key: cord- -czjjnlkf authors: moon, sanghoon; byun, yanga; han, kyungsook title: prediction of ribosomal - frameshifts in the escherichia coli k genome date: journal: computational intelligence and bioinformatics doi: . / _ sha: doc_id: cord_uid: czjjnlkf ribosomal frameshifting at a particular site can yield two protein products from one coding sequence or one protein product from two overlapping open reading frames. many organisms are known to utilize ribosomal frameshifting to express a minority of genes. however, finding ribosomal frameshift sites by a computational method is difficult because frameshift signals are diverse and dependent on the organisms and environments. there are few computer programs available for public use to identify frameshift sites from genomic sequences. we have developed a web-based application program called fsfinder for predicting frameshift sites of general type. we tested fsfinder on the escherichia coli k genome to detect potential - frameshifting genes. from the genome sequence, we identified , frameshift sites following the x xxy yyz motif. , frameshift sites out of the , sites include secondary structures. comparison with the genbank annotation produced potential frameshift sites, including known frameshift sites. the program is useful for analyzing frameshifts of various types and for discovering new genes expressed by frameshifts. ribosomes in general terminate translation at three kinds of stop codons (uag, uga and uaa), but some ribosomes continue to decode after the stop codons. this alternative translational event is called 'recoding'. recoding events include frameshifting, read-through and bypassing [ ] [ ] [ ] . in frameshifting, ribosomes shift reading frame by one or more nucleotides at a specific mrna signal between overlapping genes [ ] . frameshifts are classified into different types depending on the number of nucleotides shifted and the shifting direction. 
the most common type is a - frameshift, in which the ribosome slips a single nucleotide in the upstream direction. - frameshifting requires a frameshift cassette that consists of a slippery site, a stimulatory rna structure and a spacer. + frameshifts are much less common than - frameshifts, but have been observed in diverse organisms [ ] . no program exists to predict general types of frameshift. in addition, existing computational models predict too many false positives. in previous work we developed a program called fsfinder (frameshift signal finder) for predicting - and + frameshift sites [ ] . that program is written in microsoft c# and is executable on windows systems only. to remove these limitations and to handle frameshifts of general type, we developed a new web-based application called fsfinder . users can predict frameshift sites of any type online from any web browser and operating system. previous experimental results of testing fsfinder on ~ genomic and partial dna sequences showed that it predicted frameshift sites efficiently and with greater sensitivity and specificity than other programs, because it focused on the overlapping regions of orfs and prioritized candidate signals (for - frameshifts, sensitivity was . and specificity . ; for + frameshifts, sensitivity was . and specificity . ) [ ] [ ] [ ] . using the fsfinder web service, we analyzed the escherichia coli (e. coli) k genome sequence to find genes with a high probability of - frameshifting. from the e. coli k genome sequence, we found , frameshift sites following the x xxy yyz motif. among these sequences, , frameshift sites included a secondary structure such as a pseudoknot or stem-loop. using the genbank description we found overlapping regions of two genes with more than base. using fsfinder we found overlapping regions with more than bases. after removing redundant ones, we obtained overlapping regions and found potential frameshift sites in the overlapping regions. 
among these sites, sites including known frameshift sites were considered significant based on the gene length, shape and the length of the overlapping region. we believe that at least new candidate sites are highly likely to be genuine frameshift sites. the cassette of a - frameshift consists of three parts: slippery site, spacer and secondary structure. the slippery site is usually a heptameric sequence of the form x xxy yyz (in the incoming -frame), where x, y and z can be the same nucleotide [ ] [ ] [ ] . the spacer is a short sequence of to nucleotides separating the slippery site from the downstream secondary structure. the downstream structure is usually a pseudoknot or a simple stem-loop, as shown in fig. . to analyze frameshift sites in the e. coli genome sequence, we detected heptameric sequences followed by a secondary structure. in previous work by others [ ] [ ] [ ] , frameshift sites are subject to two constraints: in the slippery site x xxy yyz, x is any nucleotide, y is a or u, and z is a, u or c. in our work, however, any nucleotide can occupy any position of the slippery sequence, provided a secondary structure is present. fig. shows the parameters for the fsfinder web service. the web service and web application of fsfinder were implemented using xml, xslt and javascript. if the user sends a query to the fsfinder server after setting parameters or defining a new model, all the computations are done on the server side. after computation, the server sends the results to the user. a programmed - ribosomal frameshift signal with an h-type pseudoknot. stem has base pairs, stem has base pairs, and both loops of the pseudoknot have nucleotides. in particular, unlike in previous work, any nucleotide can be located in the slippery site. the user can define a new model by specifying its components and their locations. to analyze the e. coli genome, we set the selection options to other genes in bacteria, partial sequence and + strand. 
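the two variants of the slippery-site motif described above (the restricted form used in previous work and the unrestricted form used here) can be illustrated with a small python sketch. this is not the fsfinder implementation: the function name `find_slippery_sites`, the `restricted` flag and the 0-based position convention are assumptions made for illustration, and the sketch ignores the reading frame, the spacer and the downstream secondary structure.

```python
# Illustrative sketch only; names and conventions are hypothetical,
# not part of FSFinder.
VALID = set("ACGU")
RESTRICTED_Y = set("AU")    # constraint on Y reported for previous work
RESTRICTED_Z = set("AUC")   # constraint on Z reported for previous work


def find_slippery_sites(seq, restricted=False):
    """Return (position, heptamer) pairs that fit the X XXY YYZ pattern.

    position is the 0-based index of the first X. DNA input is accepted
    and converted to RNA letters. With restricted=True, Y must be A or U
    and Z must be A, U or C; otherwise any nucleotide is allowed.
    """
    seq = seq.upper().replace("T", "U")
    sites = []
    for i in range(len(seq) - 6):
        hept = seq[i:i + 7]
        if not set(hept) <= VALID:
            continue  # skip ambiguous symbols such as N
        x, y, z = hept[0], hept[3], hept[6]
        # positions 1-3 must be the same nucleotide, as must positions 4-6
        if hept[:3] != x * 3 or hept[3:6] != y * 3:
            continue
        if restricted and (y not in RESTRICTED_Y or z not in RESTRICTED_Z):
            continue
        sites.append((i, hept))
    return sites
```

with the unrestricted setting, homopolymeric runs such as a aaa aaa are reported as candidates; with `restricted=True`, a heptamer such as g ggc ccg is rejected because y is c.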
because we focused on the - frameshift model, we made a model for the general motif x xxy yyz of the - frameshift. no restriction was placed on the frameshift site, so any nucleotide can be located in the frameshift site. for the downstream secondary structure, the lengths of the stems and loops were set to their default values. to analyze the e. coli genome, we set the selection options to other genes in bacteria, partial sequence and + strand ( fig. a) . because we focused on - frameshift sites (fig. b) , we defined a signal that fits all variants of the x xxy yyz motif of the - frameshift. no restriction was placed on the frameshift site, so any nucleotide can be located in the frameshift site. we therefore set the match type to nnn (n is any nucleotide) with no exceptions to the arrangement. that is, a frameshift site can occur even at a aaa aaa or u uuu uuu. default values were used for the lengths of the stems and loops in the downstream secondary structure (fig. c ). fig. shows the xml schema for the parameters of fsfinder . when the user sets the parameters of a model in the web page of the fsfinder web service, the parameters are converted to xml. after the web service request is sent to the fsfinder server, all the computations are done on the server side. fig. shows the frameshift site of the genome of the sars coronavirus (nc_ ). orf a ( .. ) and orf b ( .. ) partially overlap each other. u uua aac is the slippery sequence and the secondary structure is a pseudoknot [ ] . in the genbank description, the overlapping region of orf a and orf b has only one nucleotide. orf b starts from codon aaa instead of a regular start codon. the start codon of orf (red triangle) lies outside the overlapping region. frameshifting can occur even in overlapping regions with nucleotide. to solve these problems, finding an overlapping region is divided into two processes: finding motifs and finding overlapping regions. the shape of an overlapping region is taken into account when finding it. 
as shown in fig. , the overlapping region is extended from the stop codon of open reading frame to the stop codon of open reading frame (green color). the minimum length of both genes should be longer than nucleotides, and the length of the overlapping region should be longer than nucleotides. in addition, the location of the start codon of orf does not matter. as in the sars coronavirus, the start codon is not always located in the overlapping region of frameshifting genes. an additional method to find overlapping regions is to use the description in the genbank file. if more than nucleotide is overlapped, we consider the two open reading frames as a candidate pair of partially overlapping genes. finally, the overlapping regions found both by fsfinder and by the genbank description are used for further analysis. the e. coli k genome sequence (nc_ ) was obtained from genbank. as shown in fig. , all the heptameric sequences that follow the x xxy yyz motif were examined. as a result of this process, , sites were found. whether these sites are located in an overlapping region was not considered at this stage. for overlapping regions of orfs, we found candidate regions that partially overlapped by more than base according to the genbank description and candidate regions that overlapped by more than bases using the fsfinder web service. after removing redundant sites, we obtained overlapping regions. from these regions, heptameric sequences that were not located in the overlapping region were filtered out. just sites remained in the overlapping regions. from the gene length, shape and the length of the overlapping region, sites including known frameshift sites were identified as significant candidates. fig. . the prediction process of frameshift sites from the escherichia coli k genome sequence. this process consists of two sub-processes: finding motifs and finding overlapping regions. the e. coli k genome sequence (nc_ ) was obtained from genbank. 
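the filtering steps described above (collect overlapping regions from two sources, remove redundant ones, then keep only motif hits inside an overlapping region) can be sketched in python as follows. this is a simplified illustration under stated assumptions, not the fsfinder procedure: regions are plain coordinate intersections rather than the stop-codon-to-stop-codon extension shown in the figure, coordinates are 1-based and inclusive, a site is kept when its first nucleotide lies in a region, and the names `overlap_region`, `merge_regions`, `sites_in_regions` and the `min_overlap` threshold are hypothetical.

```python
# Illustrative sketch only; all names and thresholds are hypothetical.

def overlap_region(orf_a, orf_b, min_overlap=1):
    """Return the (start, end) intersection of two ORFs, or None.

    ORFs are (start, end) tuples with 1-based inclusive coordinates.
    Intersections shorter than min_overlap nucleotides are discarded.
    """
    start = max(orf_a[0], orf_b[0])
    end = min(orf_a[1], orf_b[1])
    if end - start + 1 < min_overlap:
        return None
    return (start, end)


def merge_regions(*sources):
    """Pool regions from several sources and drop exact duplicates."""
    return sorted({region for source in sources for region in source})


def sites_in_regions(site_positions, regions):
    """Keep motif positions whose first nucleotide lies inside a region."""
    return [
        pos for pos in site_positions
        if any(start <= pos <= end for start, end in regions)
    ]


# Hypothetical coordinates, for demonstration only.
annotated = [(100, 120), (300, 310)]   # e.g. from an annotation file
predicted = [(100, 120), (500, 560)]   # e.g. from a prediction tool
regions = merge_regions(annotated, predicted)
candidates = sites_in_regions([90, 105, 305, 600], regions)
```

a stricter variant could require the whole heptamer to lie inside the region, but that would reject cases such as the sars coronavirus example, where the annotated overlap is a single nucleotide.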
to find motifs, all the heptameric sequences that follow the x xxy yyz motif were found. , sites were detected. according to the genbank description, there were candidate overlapping regions that partially overlapped by more than base. the fsfinder web service found candidate overlapping regions that overlapped by more than bases. after removing redundant sites, we obtained overlapping regions. after further removing heptameric sequences that were not located in the overlapping region, sites remained in the overlapping regions. considering the gene length, shape and length of the overlapping region, frameshift sites including known sites were considered significant candidates. table shows the names of overlapping genes, the locations of the slippery sites, slippery sequences and number of overlapped nucleotides from our analysis. table . the predicted frameshift sites in the e. coli k genome sequence. the first three genes marked with * symbol are those with known frameshift sites. pk represents a pseudoknot. the three genes, yi _ -yi _ , yi _ -yi _ , yi _ -yi _ , are known genes expressed via - frameshifts. these three genes and insa_ -insb_ are insertion sequences. we believe that entb-enta, teha-tehb, and xdhb-xdhc have a high probability of using frameshift events. all these seven genes have either a pseudoknot or a stem-loop as a downstream secondary structure. the remaining four genes, mhpd-mhpf, flim-flin, atos-atoc and yijc-yijd, have no downstream secondary structures and have a lower probability of frameshifting than the other genes. previous works similar to our approach exist. hammell et al. [ ] studied - ribosomal frameshift signals in large databases. using their well-established model, they found that - frameshifts occur with frequencies from two- to six-fold greater than random. they considered the nucleotides of a frameshift site, spacer, and pseudoknot. however, they focused on - frameshifts only. bekaert et al. 
[ ] performed a computational method similar to hammell's approach. from their model, they designed a model for - eukaryotic frameshifting. but these two programs are not available for public use. freqanalysis [ ] is an application program available for public use. freqanalysis was implemented in the java language and can find putative translational frameshifts by probabilistic calculation. in fact, shah et al. [ ] found putative frameshift sites in the saccharomyces cerevisiae orfs. but freqanalysis does not consider the x xxy yyz frameshift motif. thus it is hard to compare the results of freqanalysis with ours directly. fig. shows the result of fsfinder . two rectangles filled with sky blue color in the upper window represent i _ and yi _ , respectively. the i _ gene located in frame and the yi _ gene located in - frame partially overlap. when a translational frameshift occurs, translation continues past the stop codon at , until the stop codon at , . ribosomal frameshifting is an unusual event that is known to be involved in producing heterogeneous proteins and in auto-regulation. prediction of frameshift sites is very difficult. if prediction is possible, however, it is very useful for understanding the biological phenomenon and can help unveil unknown mechanisms of protein production. for this reason, we developed a web application and provide a web service for prediction of frameshift sites. using fsfinder, we analyzed the escherichia coli k genome sequence to detect potential - frameshifting genes. from the e. coli k genome sequence, we obtained , frameshift sites following the x xxy yyz motif. among these sequences, , frameshift sites included a secondary structure. by comparison with the genbank description, we obtained sequences including known frameshift sites. 
among these sequences, we believe that at least sequences have a high probability of using a frameshift event, and all these sites have a downstream secondary structure such as a pseudoknot or stem-loop. the other genes do not have a downstream structure; we believe that they have a lower, but still significant, probability of frameshifting than the genes above.
programmed translational frameshifting
recoding: dynamic reprogramming of translation
one protein from two open reading frames: mechanism of a nt translational bypass
recoding: translational bifurcations in gene expression
predicting genes expressed via - and + frameshifts
towards a computational model for - eukaryotic frameshifting sites
identification of putative programmed - ribosomal frameshift signals in large dna databases
programmed - ribosomal frameshifting in the sars coronavirus
computational identification of putative programmed translational frameshift sites. bioinformatics
key: cord- -t qxscbp authors: losvik, mary h. title: plant species diversity in an old, traditionally managed hay meadow compared to abandoned hay meadows in southwest norway date: - - journal: nord doi: . /j. - . .tb .x sha: doc_id: cord_uid: t qxscbp
a chronosequence, representing a successional series, was used for the comparison of a hay meadow site managed in an old traditional way for at least a hundred years, and hay meadow sites abandoned for about , and years, respectively. old traditional management included grazing early and late in the growing season, mowing in august and light or no fertilizing. the tree cover was the most important factor determining the composition of the vegetation. time since abandonment was not completely correlated to tree cover, as some plots had a dense canopy and others were situated in the open. the total species number decreased with number of years since abandonment in plots > . m² and < m². the highest species number in m² plots was recorded in the managed site, with species of phanerogams. 
forty-eight % of the indicators of traditional management present in the managed site were recorded in the site which had been abandoned for years. frequency-log area curves made it possible to group species according to persistence in the sward. as a result, a group of functional indicators of rare hay meadows in the region was distinguished. at small scales, temperate grasslands are the most species-rich plant communities in the world (shmida & ellner ) . grasslands were once widespread in western europe, but are now greatly reduced in extent as a result of changes in agricultural land-use practices (keymer & leach ; willems ) . the grasslands may have high species numbers, up to well above per m² (willems , ; kull & zobel ) . old, traditionally managed hay meadows are rare in western norway at present (øvstedal ; lundekvam & gauslaa ; losvik a, b, a; rosef ) . earlier they were common elements of the agricultural landscape, but most of them are now either cultivated in a modern way or abandoned (losvik a ). the few old, traditionally managed hay meadows which are left are mostly situated on phyllitic soils and have high species diversity. no investigation in the region has so far studied what happens to the species diversity and species composition when such meadows are abandoned. in this study the species have been grouped into indicators of traditional management, common (hay meadow) species and additional species; the latter group includes forest species. the species group indicators of traditional management was preliminarily defined in losvik ( b) , and later (losvik b ) was shown to comprise species occurring in less than % of plots ( - m² ) of hay meadow vegetation analysed in western norway by different authors - . these species are often low-statured, creeping, slender and/or with rosettes, and mostly stress-tolerators (grime , ) and hemicryptophytes, annuals or biennials (raunkiaer ) . 
the number of the indicators is usually low in abandoned sites (see data in losvik ; persson ; austad & skogen ; borgegaard & persson ) . modern management, including medium or heavy fertilizing, also reduces species number in general (see e. g. during & willems ; willems et al. ) , and especially the number of indicators of traditional management (losvik b, a, b) . as a consequence, indicators of traditional management are becoming increasingly rare in the region. common species are defined as being more frequent than indicators of traditional management in the agricultural landscape, and as tolerating medium quantities of fertilizer. this grouping of species is considered more useful for choosing areas for hay meadow conservation than the plain use of total number of species, as it gives direct information on the number of target species, namely the indicators of traditional management, in the sites. the changes in plant species richness during succession have been studied e. g. by nicholson & monk ( ) , bazzaz ( ) , prach ( ) and symonides ( ) . both increase and decrease in total plant species diversity have been found in these studies, and the situation may be further complicated by environmental gradients (peet ; prach ) . as the species diversity depends on the spatial scale considered (e. g. kwiatkowska & symonides ; van der maarel a) it may be better described by the species-area relationship than by single species numbers. the aim of the study was to compare the vegetation and ecology of sites which formed a chronosequence, representing a successional series from a species-rich, old, traditionally managed hay meadow to sites which had been abandoned for about , and years, respectively. 
themes of special interest were: how important are the ecological factors years since abandonment and cover of tree canopy in determining the composition of the vegetation; what differences in species diversity, as measured by species-area curves, are there between the sites; for how long are species which are characteristic of the agricultural landscape able to persist during overgrowing in such sites; and finally, what forest species and additional species are most important in the sites of this series of abandoned hay meadows. the study sites are located on a hillside at gjuvsland, varaldsøy in the hardanger fjord, western norway, generally sloping ° towards south-east, from about m a.s.l. down to the shore. the hillside was formerly common land, used as out-lying hay meadows. they were mown in august and grazed in spring and in autumn by sheep and cattle from farms at gjuvsland. in the hillside was divided between the farms into parts, each of which was fenced. four of these parts constituted the study sites. the bedrock is volcanic supracrustal rocks covered by scree material containing phyllite (foslie ; see also kolderup ; askvik ). the soil is probably enriched with ca through seepage from limestone bedrock occurring above the hay meadows. the mean precipitation at the nearest meteorological station, rosendal, is mm per year (førland ) . the mean temperature in january is . °c, in july . °c, and the mean yearly temperature is . °c (at the nearest meteorological station measuring temperature, omastrand, aune ) . traditional management includes grazing by sheep for about days in may/june and in september/october each year. until about years ago, mixed grazing by cattle and sheep was common, but later sheep alone were used. stocking rates are about sheep grazing days per ha and year (including lambs), less in dry years. after the spring grazing period, dried leftovers of manure and twigs of trees are removed by raking. 
the grass is mown in august, using a small reaper and different forms of scythes. august may be rainy (semme ) , but the farmers usually await a period of dry weather to allow drying of the grass on the ground. the grass is turned on the spot twice a day till it is dry, allowing seeds to be spread evenly throughout the grassland. usually no commercial fertilizers or winter manure are applied, but some nutrients are recycled through urine and faeces during grazing, particularly when mixed grazing, involving higher grazing intensity than at present, was practised (brelin ) . the main features of these management practices were common in western norwegian out-lying hay meadows till about (byrkjeland ) . as long as the whole hillside was managed, there was a scattered tree layer of pollards (mostly fraxinus excelsior) or coppiced trees (corylus avellana and alnus incana). these trees and characteristic forest herbs and grasses often grew near outcrops or heaps of rocks cleared away from the meadows, and functioned as centres of natural afforestation when management ceased. at the study site this has resulted in a mosaic-like pattern of patches with (1) a high, dense sward of grasses and herbs, lacking a tree layer, (2) a scattered field layer and a more or less closed canopy of trees, and (3) a low dense field layer in areas in the open which are situated close to small outcrops, narrow paths, ant-heaps and springs. site a is managed in the old, traditional way (a. a. gjuvsland, pers. comm.) , but since about the number of sheep has been reduced to about one third. this reduction has, at least partly, been compensated for by longer grazing periods. the low-lying part (about %) of site a was lightly fertilized (about kg fertilizer/ha.year, type varied) in june from about till (in the ties at most kg fertilizer/ha.year). the soil is rich in minerals (base saturation: - %), and poor in organic matter (c: . - . %), nitrogen ( . - . %) and phosphorus ( . - . 
mg/ g dry matter) with ph . - . (losvik a) . the site has a rather high species diversity (losvik , - species of phanerogams/1 m² ), compared to other grasslands in south-west norway (øvstedal ; lundekvam & gauslaa ; losvik a, b, , a; hundt & vevle ). species such as cynosurus cristatus, briza media, trifolium dubium and linum catharticum, which are rare in western norwegian grasslands at present (losvik b) , are abundant in the site. an assembly of uncommon grassland species of which usually only a few are recorded from each hay meadow site in western norway, such as leucanthemum vulgare, euphrasia stricta, rhinanthus minor, polygala vulgaris, centaurea jacea, pimpinella saxifraga, carum carvi, danthonia decumbens, lotus corniculatus and platanthera chlorantha, are here found together in an area of about ha. the traditional management has lowered the content of n and p in the soil and resulted in high species diversity, as has also been recorded in other studies (e. g. kowalsky ; van der maarel ; silvertown ; zoller & bischof ; collins & barber ; de leeuw & bakker ). at the time of the investigation, the managed site had a scattered tree layer of fraxinus excelsior, alnus incana, taxus baccata and old individuals of planted prunus domestica. sites b, c and d were abandoned , and . site c was grazed by sheep the whole summer for some years after the mowing ceased. in site d light grazing might have occurred for some years after abandonment, as sheep from neighbouring parts went through broken fences. site b was lightly fertilized in the ties. it is assumed that the vegetation was much the same over the whole hillside at the time when the management was the same, especially as the management probably had a tradition of a hundred years at least (a. a. gjuvsland, pers. comm.) . ecological factors such as aspect and bedrock were mainly equal over the whole hillside. site c is generally steeper than the other sites and slopes about °. 
according to interviews, the tree structure of the sites seems to have been quite similar, with less than % tree cover of scattered trees. the nomenclature follows lid . the size of sites a-d was , . , . and . ha respectively. the whole area of each site was divided into equally-sized subsites, in order to ensure that the data set represented the vegetation of the site well. steep, stony or otherwise inaccessible areas of the sites were excluded. within each subsite the southwestern corner of one square plot of cm² was chosen at random. four subplots of x cm, placed in the corners of this plot, were analysed in addition to a x cm plot, a x m plot, a x m plot and a x m plot, each made by extending the former plot towards south-east and north-east. thus the plots within each subsite were nested, while all the ten m plots were distinctly separated, each in its subsite. presence of all rooted phanerogamic species of the field layer was recorded in or . shade was estimated as the cover of the tree layer in each m plot. the number of years since abandonment, in addition to records of former management, was obtained by interviewing each farmer. main ecological gradients in the data set and the length of the gradients were assessed by detrended correspondence analysis, dca (hill ; hill & gauch ) applied to presence-absence data for species (species occurring in < plots were omitted) in quadrats of m each, using canoco version . (ter braak a, ). canonical correspondence analysis, cca (ter braak , b ) was used to evaluate the importance of the environmental factors: years since abandonment and cover of trees. standard options were used. a monte carlo test (ter braak ) with permutations was used to assess the significance of the first canonical axis and the overall significance. 
species-log area curves, frequency-log area curves, and the significance of differences in species richness (t-test) were constructed or calculated using excel. species were grouped according to persistence in the sites and to frequency in the different plot sizes of the frequency-log area curves. dca indicated a gradient of shade along the first axis (eigenvalue . , . sd) and of nutrients along the second axis (eigenvalue . , . sd). the indicators of traditional management and the common species were, as expected, situated in the light part of the first axis gradient, while the additional species, comprising forest species and species common along edges and roadsides, were mainly in the opposite direction (fig. a) . when the plots were classified according to tree cover, the gradient of shade along the first axis was clearly demonstrated (fig. 1b) . grouping of the plots according to sites showed small variations along the gradients in sites a, b and c compared to site d (fig. 1c) . the plots within each site were more similar to each other than to any plot outside the site along axis and . cca demonstrated that tree cover (cover) was, as expected, positively correlated with the first axis (fig. , eigenvalue . ). years since abandonment (abandon) was positively correlated to both the first and the second axis. axis accounted for . % of the species data, axis accounted for only %. significance (p) of the first axis was . , and an overall test also gave significant results. of the species recorded in sites a-d, % were absent in the plots of site d, % were absent in site a (appendix , ) . seven % of the species were only recorded in a, % only in d. the species-area curves of site a for the total number of species, indicators of traditional management and common species were nearly linear, but the b-d curves increased exponentially with log area. construction of log species-log area curves made the total species number curve of site d close to linear, but the other curves then turned into power function curves. 
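the distinction drawn above between curves that are linear against log area and curves that become linear only after species number is also log-transformed can be illustrated with a short python sketch. it fits both forms by ordinary least squares: s = a + b·log10(area) and log10(s) = log10(c) + z·log10(area). the function names and the example data are hypothetical and are not taken from the study, which used excel; the two r² values are computed on different response scales, so the comparison is only a rough guide to curve shape.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, 1.0 - ss_res / ss_tot


def compare_species_area_models(areas, species):
    """Fit species vs log area and log species vs log area."""
    log_area = [math.log10(a) for a in areas]
    _, _, semilog_r2 = linear_fit(log_area, species)
    _, z, loglog_r2 = linear_fit(log_area, [math.log10(s) for s in species])
    return {"semilog_r2": semilog_r2, "loglog_r2": loglog_r2, "z": z}


# Hypothetical nested-plot data (areas in m^2, species counts).
areas = [0.01, 0.1, 1.0, 10.0, 100.0]
homogeneous = [10, 20, 30, 40, 50]            # linear in log10(area)
power_like = [10 * a ** 0.25 for a in areas]  # linear only on log-log axes

fit_a = compare_species_area_models(areas, homogeneous)
fit_d = compare_species_area_models(areas, power_like)
```

in this toy example the first data set is fitted better by the semi-log form and the second by the log-log form, which mirrors the contrast described between the managed and the abandoned sites.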
none of the curves showed any tendency to flatten out at the investigated size ranges. the mean total species number for plot sizes > . m and < m was higher in the traditionally managed site a than in the abandoned sites (fig. a , p < . ). sites b and c had intermediate mean species numbers, while site d had a lower number of species than sites a-c in these plot sizes (p < . ). the highest species number in m plots was recorded in site a with species (mean , sd ) and in the m plots in site c with species (mean , sd ), followed by site a with species (mean , sd ). when all species in the plots of each site were added ('plot' size . ha), however, site b had the highest species number (appendix , ). the number of indicators of traditional . site a had the highest and site d the lowest mean number of common species in plot sizes from . m to m (fig. c) . the differences in mean number of additional species between sites were negligible for plot sizes up to m (fig. d) , but for larger plot sizes the mean number was lower in site a than in sites c-d (p < . ) and for the largest plots it was also lower in site a than in site b (p < . ). the species were grouped according to presence/frequency of each species in all plot sizes in sites a-d (appendix , ). group mostly comprised indicators of traditional management and common species, species which are more or less dependent on high light availability. the frequency of species of group 1a tended to be much lower in sites b-d for all plot sizes than in site a, if they were present there at all (fig. a) . species of group 1b (fig. b) persisted rather well as long as there were openings in the tree canopy. other agricultural landscape species were rather shade tolerant (group 1c, fig. c) , and some species even increased in frequency after abandonment (group 1d, fig. d ). group a comprised border species which benefit from the lack of management in addition to the high light availability in the early successional phases (fig. 
a) , and group b comprised forest species (fig. 5b) . the rest of the species occurred too scattered in the sites to be grouped. the ordination results confirmed that low tree cover was important for the indicators of traditional management and the common species, while the additional species may tolerate both abandonment and a high tree cover well, even if these species may also be present in the traditionally managed hay meadows. with the assumption that the vegetation in sites b-d was about the same as in site a when they were managed, dca indicated that open abandoned areas gradually became poorer in nutrients with time while areas with a closed tree canopy became richer (see figs 1b and 1c, plots b , d , d - ). austad & skogen ( ) recorded both rather poor betula pendula forest and rich forest types with e. g. ulmus glabra in abandoned meadows on a hillside in sogn. in losvik ( ) it is concluded that successional trends in abandoned hay meadows take different courses according to nutrient content in the soil. mown, soil analysis will be necessary to prove that the differences in vegetation resulted from differences in soil nutrients. one hypothesis would be that litter of trees, here mainly ash (fraxinus excelsior), decomposes more rapidly than the grass sward of the open areas, releasing more nutrients to the soil in the studied time span of abandonment. bearing these statements in mind, an interesting interpretation of the cca result may be that tree cover, according to the distribution of species in the diagram, seemed to be positively correlated to nutrients, while abandonment in general seemed to be negatively correlated to nutrients. in an abandoned area, tree cover, and with it the area which is in shade, will increase with time. but the increase is not evenly distributed in space, as some areas remain open for a considerable time span, and as such may be poorer in nutrients than areas with a tree canopy. 
this implies that the composition of the vegetation in an abandoned area is dependent not only on time since abandonment, but also on the local extent of the tree cover. the occurrence of seedlings of ash (fraxinus excelsior) in at least out of plots of . m in sites b-d (fig. b) indicates that the open areas of these sites are really in a transitional phase, during which the content of nutrients in the soil may be lower than in areas with a closed canopy. the rate of increase in species number with increase in plot size is an appropriate measure of richness, instead of using number of species in one plot size (kilburn ; van der maarel a; leps & stursa ). but this rate ought to be constant along the curve, and as the species-area curves differed too much in shape in this study, only differences in species numbers between sites at the different plot sizes were compared. singh et al. ( ) curves were close to linear, while in sites b-d, with a large turnover of species, the curves were exponential. the form of the curves is dependent on density of individuals and plant unit areas. in the traditionally managed hay meadow the species were mostly small, with individuals of the species quite evenly distributed. the much smaller variance in plots of site a compared to site d (fig. lc) demonstrated that in site a the vegetation was more homogeneous (van der maarel & sykes ). wherever the analysis would have started, comparatively many species would have been added by increasing the area (fig. a). in the abandoned meadows, the plant unit area was probably larger and there was a tendency for a few species to dominate in the plots. both invaders and persistent agricultural landscape species were simultaneously present, though scattered, in the plots. thus linear species-log area curves may indicate homogeneous stands, while exponential species-log area curves may indicate succession or disturbance. continuous increase of the curve at and even above the investigated plot sizes is recorded e. g.
by hopkins ( ) and barkman ( ). both tree cover and management were important in determining the species richness in the investigated chronosequence. in a wooded meadow in estonia kull & zobel ( ) also found the highest species richness where tree canopy cover was lowest (see also pausas ), and the highest species richness in sites with the most regular long-term mowing as compared to cases of cessation of mowing. for semi-natural grasslands van der maarel ( a) argued that grazing animals and mowing imply disturbance and to some extent stress, which enable more species to coexist than on a similar area without grazing or mowing. the reasons for the high mean species richness in site a compared to the other sites may be complex (see e. g. giller ), but the management itself undoubtedly plays a major role in providing light and gaps for all the species at the right time and place throughout the growing season. in the abandoned sites less light penetrated into the lowest part of the field layer and this resulted in exclusion of low-growing species (kull & zobel ). the species may either disappear after a gradual reduction in population size, or there may be a collapse in the occurrence of the species. some of the species in this study (appendix , group 1a) may have experienced such rapid (in less than years) disappearance. the species diversity decline in sites b-d in - years was % - % in plots of . m - m (table ). however, when the whole investigated area in each site was considered, the total number of species increased in all the abandoned sites (appendix , ). this clearly demonstrated that the total species number is very dependent on plot size and that it may be rather useless in choosing hay meadows for conservation. high species turnover rates during the first - years of a sere were reported e. g. by houssard et al. ( ). a much more rapid decline in species diversity, % in - years, was reported from chalk grassland by willems ( ).
in time sites b-d will turn into deciduous woodland with a closed canopy and the indicators of traditional management and common grassland species will eventually be lost. the fact that the mean total number of species in plots of site a was higher for areas > . m and < m than in the other sites, and at the same time total species number of the whole investigated area in site a was smaller than in the other sites, showed that in site a most species were represented by a large number of individuals which were quite evenly distributed over the whole area of the site. this can easily be seen from appendix , as it appears that in site a species are present in all plots of m each, while in sites b-d the numbers are , and , respectively. on the other hand, the number of indicators of traditional management was highest in site a, both when total number in investigated plots and when mean number at all investigated plot sizes were considered (appendix and , fig. b). the usefulness of the group indicators of traditional management as functional indicators in choosing hay meadows in the region for conservation is thus demonstrated (see also mcintyre & lavorel ). common grassland species are quite frequent in the studied abandoned sites. they thrive when biomass is no longer removed, resulting in increased nitrogen levels in the soil. the additional species occur very scattered in the whole chronosequence, and are thus generally infrequent in small plots. in larger plots the differences, mainly between site a and site d, become visible. in site d more forest species have established than in the other sites. as site c had been abandoned for a longer time than site b, more forest species had established there, and at the same time more light demanding species persisted (appendix , ), probably as a result of the period of grazing after cessation of mowing, and thus species richness was higher there than in site b.
similar species overlap, occurring when competitive exclusion has not yet had time to drive subordinate species to extinction, is advocated also by palmer ( ). higher species richness in grazed areas as compared to abandoned grassland was reported by regnéll ( ). the groups of species used in the present study differ from guilds sensu pianka ( ) in that no species can be a member of different groups; moreover, there is no reason to believe that species within one group interact more intensively, mainly by competition, than species from different groups (van der maarel b). the species of group differed in their ability to persist in the abandoned hay meadows. the groups 1a and 1b comprised some of the rarest hay meadow species in western norway at present. in abandoned hay meadows they are often recorded close to or at refuges like outcrops or heaps of stones on dry or shallow ground in the open, where the field layer is low and scattered. some of them are annuals which are dependent on gaps in the sward for germination of their seeds. for example, pimpinella saxifraga is known to be very sensitive to increased density of the sward (grubb ). these species will presumably not be able to survive the closing up of the canopy where they grow at present, and thus they face extinction at their sites in the near future. at sites in western norway where these species have been recorded during the last years, they will probably experience a collapse extinction, as indicated by the differences in the frequency-area curves. hay meadows which were abandoned less than years ago are becoming increasingly rare, and so are the rare species within these groups. other species in these groups are as vulnerable to abandonment as the rare species, but are generally more common because they tolerate the low or medium quantities of fertilizers used in the western norwegian small scale and part time farming system.
thus a smaller group of species, tolerating neither fertilizing nor abandonment, may be delimited and used as indicators in choosing hay meadows for conservation (appendix , : *). these species occur in less than % of analysed hay meadow plots in western norway published (losvik b), while indicators of traditional management occurred in less than % of the plots. several rare hay meadow species, not recorded in the plots of this study, such as botrychium lunaria, dianthus deltoides, galium verum, gymnadenia conopsea and lychnis viscaria, will have to be added to complete this list of indicators of rare hay meadows. generally the species which gradually decreased in frequency with time since abandonment or increased in early successional stages (groups 1c and 1d) were plants of some height and thus good light competitors in this situation. the groups comprised indicators of traditional management which are more common than species of groups 1a and 1b, as a result of their tolerance of shade and/or light fertilization. unusually long stalks, bringing them up towards the top of the field layer, were observed in individuals of potentilla erecta and lotus corniculatus in the studied abandoned sites. many border and forest species do in fact occur even in the traditionally managed hay meadow site. these species usually grow close to stone heaps and below trees. they constitute a species pool and are able to expand as soon as the hay meadow is abandoned. the existence of a forest and border line species pool in managed hay meadows must be an old feature, as the whole of these landscapes was used very intensively under the old traditional management regimes, and thus the present forest species must have had refuges somewhere in the managed landscape. the investigation shows that in order to preserve the hay meadow diversity, the continuity in management is crucial. even with short periods of abandonment there is a risk of losing species.
the longer the period of abandonment, the more species become extinct. with continuously shrinking populations and eventually increasing distance between the fragmented areas of the populations, more and more species run the risk of becoming extinct in the site. therefore it is very important to try to retain the management in areas with traditional landscape diversity and typical population structure.

frequency of species which were present in < out of plots of m² each in the investigated sites. t: indicator of traditional management, *: indicator of rare hay meadows. c: common grassland species, f: forest species. a: additional species, abb.: abbreviations of species names.

mixed grazing with sheep and cattle compared with single grazing husdyrbruket i hordaland gjennom hundrad år. -hordaland landbruksselskap effects of disturbance and diversity in mixed-grass prairie sheep-grazing with different foraging efficiencies in a dutch mixed grassland diversity models applied to a chalk grassland nedbørsnormaler, normalperiode - . -rapport dnmi / . klima kisdistriktet lve-varaldsey og bergverkdriftens historie community structure and the niche vegetation classification by reference to strategies demographic studies on the perennials of chalk grassland decorana-a fortran program for detrended correspondence analysis and reciprocal averaging. -ecology and systematics department detrended correspondence analysis: an improved ordination technique hordalands berggrunnsgeologi temperaturnormaler, normalperiode - . -rapport dnmi / .
klima restoration of a deciduous woodland in western norway formerly used for fodder production: effects on tree canopy and field layer a critical evaluation of minimum area concepts plant species diversity in old-field successional ecosystems in southern illinois calcareous grassland -a limited resource in britain analysis of the species-area relation hardanger and sogn ecology of mountain grasslands and use by elk high species richness in an estonian wooded meadow spatial distribution of species diversity indices and their correlation with species-area curve, life history strategies, and succession: a field test of relationships university of bergen, bergen. - b. phytosociology and ecology of old hay meadows in hordaland, western norway in relation to management phytosociology and ecology of mown grasslands in western norway how environmental and disturbance factors influence species composition in temperate australian grasslands microsoft corporation . microsoft excel. brukerhåndbok . -microsoft corporation plant species diversity in old-field succession in the georgia piedmont variation in species richness: towards a unification of hypotheses species richness patterns in the understorey of pyrenean pinus sylvestris forest forest vegetation of the colorado front range: patterns of species diversity prediction of man's impact on plant species diversity vegetation development after the exclusion of grazing cattle in a meadow area in the south of sweden evolutionary ecology succession of vegetation in abandoned fields in finland the life forms of plants and statistical plant geography a numerical study of successions in an abandoned, damp calcareous meadow in s sweden åpen eng og beitemark i drift, voss kommune, hordaland -vegetasjon og økologi. -cand. scient.
thesis coexistence of plant species with similar niches the dynamics of a grassland ecosystem: botanical equilibrium in the park grass experiment plant species richness and species-area relations in a shortgrass steppe in colorado floristic richness, diversity, dominance and species evenness in old-field successional ecosystems geography of norwegian agriculture. -publ. norw.univ. school of econ. & business adm. geographical series a canoco -a fortran program for canonical community ordination by (partial) (detrended) (canonical) correspondence analysis, principal components analysis and redundancy analysis (version . i). -agricultural mathematics group, wageningen. - b. the analysis of vegetation-environment relationships by canonical correspondence analysis. -vegetatio - . update notes: canoco version . . -agricultural mathematics group, wageningen. van der maarel, e. . plant species diversity in relation to management small-scale plant species turnover in a limestone grassland: the carousel model and some comments on the niche concept composition and species diversity of pine-wiregrass savannas of the green swamp, north carolina observations on north-west european limestone grassland communities. . phytosociological and ecological notes on chalk grasslands of southern england changes in chalk-grassland structure and species richness resulting from selective nutrient additions stufen der kulturintensität und ihr einfluss auf artenzahl und artengefüge der vegetation the vegetation of lindås and austrheim, western norway

acknowledgements -the project was supported by . kishorekumar for the assistance with the input of data, and to three unknown referees for comments on the manuscript.

key: cord- -ol du ph authors: kliger, yossef; gofer, eyal; wool, assaf; toporik, amir; apatoff, avihay; olshansky, moshe title: predicting proteolytic sites in extracellular proteins: only halfway there date: - - journal: bioinformatics doi: .
/bioinformatics/btn sha: doc_id: cord_uid: ol du ph motivation: many secretory proteins are synthesized as inactive precursors that must undergo post-translational proteolysis in order to mature and become active. in the current study, we address the challenge of sequence-based discovery of proteolytic sites in secreted proteins using machine learning. results: the results revealed that only half of the extracellular proteolytic sites are currently annotated, leaving over unannotated ones. furthermore, we have found that only % of the unannotated sites are similar to known proteolytic sites, whereas the remaining % do not share significant similarity with any annotated proteolytic site. the computational challenges in these two cases are very different. while the precision in detecting the former group is close to perfect, only a mere % of the latter group were detected with a precision of %. the applicability of the classifier is demonstrated through members of the fgf family, in which we verified the conservation of physiologically-relevant proteolytic sites in homologous proteins. contact: kliger@compugen.co.il; yossef.kliger@gmail.com supplementary information: supplementary data are available at bioinformatics online. many secretory proteins and peptides are initially synthesized as larger precursors, usually in the form of pre-pro-proteins. such precursor proteins undergo post-translational proteolysis: the n-terminal pre-region, known as signal peptide, is cleaved by a well-characterized signal peptidase [reviewed in (paetzel et al., ) ], while various proteases liberate the active proteins from the pro-proteins. the following examples demonstrate the importance of the latter process and its regulation: (i) the envelope (env) glycoprotein of hiv- is synthesized as a precursor polypeptide. in the trans-golgi network, env is cleaved by the cellular protease furin into two functional subunits. cleavage of env occurs at a conserved sequence. 
mutagenesis of this sequence produces non-infectious hiv- particles containing unprocessed env (earl et al., ; kowalski et al., ; mccune et al., ) . this finding establishes the importance of furin-mediated processing for virus-infectivity. accordingly, inhibitors of the host protease furin impede hiv- replication by interfering with the proteolytic processing of env, suggesting they are useful for combating hiv- (bahbouhi et al., ; hallenberger et al., ; kibler et al., ) . furthermore, inhibiting the production of peptides involved in various diseases by blocking the activity of the proteolytic enzymes is a promising approach (basak, ; bergeron et al., ; de haan et al., ) . (ii) the release of peptide hormones is subject to a complex and finely tuned regulation system. post-translational proteolysis plays a key role by specifically converting the pro-hormone precursor into biologically active products. examples of peptide hormones, whose proteolytic processing regulates their activities, are: insulin, somatostatin, parathyroid hormone, glucagon and glp- . many of these are used as therapeutic peptides for treating various disorders. the importance of identifying mature proteins fuels both experimental and computational approaches aimed at discovering and predicting proteolytic sites. experimental attempts to unveil the human plasma proteome using proteomics methods fail to detect most cytokines and protein hormones, presumably due to their low abundance [summarized in (anderson et al., ) ]. currently, most computational approaches are protease-oriented and rely on proteolytic site data of specific enzymes (blom et al., ; cai et al., ; kiemer et al., ; yang and berry, ) . however, while proteolytic sites in a protein can be experimentally identified, for example, by n-terminal sequencing of the processed protein fragments, it is much harder to find out the catalyzing protease involved. 
hence, only a limited number of experimentally verified proteolytic sites can be associated with a specific proteolytic enzyme, and therefore the data available as training sets for these methods is relatively limited. many of the proteolytic sites whose catalyzing enzymes are known are processed by members of one family of serine proteases, called pro-hormone convertases (pcs) (seidah et al., ) . all known proteolytic sites of mammalian pcs have an arginine or a lysine at the first position n-terminal to the proteolytic sites. furthermore, no other enzyme that catalyzes the processing of proteins in the secretory pathway is known to cleave immediately after these basic amino acid residues. it is therefore reasonable to assume that proteolysis after a basic residue is catalyzed by a member of the pc family. this allows data extraction of sequences of proteins, which are processed by a pc member, from databases of precursor proteins and proteolytic sites. such extracted data, together with the evolutionary relatedness between the members of the pc family, suggests that it might be possible to construct a classifier that will discriminate between pc proteolytic sites, regardless of the specific pc member, and other sites. such an approach was taken by blom and colleagues (duckert et al., ) , who extracted pc proteolytic sites based on swiss-prot (version ) annotation. herein, we describe an improved data extraction process, which considered more proteolytic sites. the extracted data was used for training classifiers, which are based on two different classification algorithms-random forest and support vector machines. the best classifier was used to provide a comprehensive list of predicted proteolytic sites in the mammalian secretome. several interesting predictions of proteolytic sites are discussed. all eukaryotic proteins were downloaded from the swiss-prot knowledgebase version . (boeckmann et al., ) . 
proteins whose first residue is not methionine were discarded, as they might not contain the full-length sequence of the precursor protein. the same holds for swiss-prot entries that include the phrase 'protein sequence', but do not include 'nucleotide sequence' in their rp annotation lines, as these entries might contain sequences of processed proteins, rather than the full-length precursor proteins. data of proteolytic sites were extracted from the post-translational modifications annotation lines (ft) of the swiss-prot knowledgebase (farriol-mathis et al., ) . two types of classifiers were tested: random forest (rf) (breiman, ) and support vector machines (svm) (vapnik and cortes, ) . for the svm classifier, we used joachims' svmlight package (joachims, ) . predicting whether a protein has an n-terminal signal sequence, was performed using the signalp . prediction tool (bendtsen et al., ) . multiple sequence alignments were computed with probcons (do et al., ) and were edited using jalview (clamp et al., ) . since the aim of the classifier was to model proteolytic processes taking place in the secretory pathway, only secreted proteins and extracellular parts of membranal proteins (secretome) were considered. thus, only proteins annotated as containing a signal peptide or a transmembrane domain in the feature table (ft) lines of the swiss-prot annotation record, or annotated as being secreted or extracellular in the comment (cc) lines of the swiss-prot annotation record were selected. in the case of integral membrane proteins, cytoplasmic domains were not considered. the membrane topology information, i.e. the location of the membrane-spanning regions and their orientation, was extracted from the topology annotation lines of the swiss-prot entry (ft topo_dom and ft transmem). 
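The two sequence-level filters described above (the first-residue check and the rp-line check) can be sketched as follows. This is a minimal illustration, assuming each Swiss-Prot entry has already been parsed into a plain dictionary; the field names `sequence` and `rp_lines` and the function `keep_entry` are hypothetical and do not belong to any Swiss-Prot tool.

```python
def keep_entry(entry):
    """Return True if the entry plausibly holds a full-length precursor.

    `entry` is a hypothetical dict with:
      - "sequence": the protein sequence as a string
      - "rp_lines": a list of RP annotation strings
    """
    seq = entry["sequence"]
    # Discard sequences that do not start with methionine.
    if not seq or seq[0] != "M":
        return False
    rp_text = " ".join(entry.get("rp_lines", [])).lower()
    # Discard entries sequenced only at the protein level, since they may
    # describe a processed protein rather than the precursor.
    if "protein sequence" in rp_text and "nucleotide sequence" not in rp_text:
        return False
    return True
```

An entry that mentions both 'protein sequence' and 'nucleotide sequence' in its rp lines is kept, which mirrors the wording of the filter in the text.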
when these lines do not span the full length of the protein, we completed the full topology of the protein according to the annotated signal peptide, transmembrane domains, extracellular domains and cytoplasmic domains. this process was performed twice: once by starting from the most n-terminal topology annotation, and once by starting from the most c-terminal topology annotation. whenever discrepancies between the two completion processes were found, the swiss-prot entry was discarded. such discrepancies point to mistakes in the topology annotation of multi-span proteins. ideally, the extracted proteolytic sites should be divided into sites that are catalyzed by enzymes working in the secretory pathway, the extracellular matrix, the cytoplasm, the digestive system or in extracellular fluids. when available, annotation of the identity of the proteolytic enzyme was extracted from the ft annotation lines (following the phrase 'removed by' in the description of propeps lines, or following 'by' in the description of 'site. . .cleavage' lines). as the aim of this study is to model the processes that take place in the secretory pathways, proteolysis processed by enzymes that are known to act outside the secretory pathway were discarded. the list of enzymes known to act outside the secretory pathway that appear in the annotation of swiss-prot entries of the proteins they cleave includes: adam , aggrecanase, alpha-secretase, beta-secretase, caspase- , cathepsin g, arginine-specific endoprotease, c convertase, chymosin, collagenase, dipeptidase, dipeptidylpeptidase, dpp , easter, elastase, kallikrein and kallikrein-like serine protease, mmps ( , , and ), coagulation factors (i, viia, ixa, xa and xia), plasmin, procollagen c-endopeptidase, procollagen n-endopeptidase, rennin, thrombin, trypsin and u-pa. blom and colleagues (duckert et al., ) extracted pc proteolytic sites based on swiss-prot annotation. 
they screened for precursor proteins that are annotated to have a signal peptide, followed by a propep that ends with an arginine or a lysine, and then followed by a peptide or a chain. they were then able to construct an artificial neural network-based classifier for predicting proteolytic sites catalyzed by members of the pro-hormone convertase family of proteases (duckert et al., ) . however, this procedure is too strict for part of the proteolytic sites. for example, human insulin (swiss-prot id: ins_human) is composed of a signal peptide, followed by a peptide, a propep and then another peptide. these two well-characterized proteolytic sites were ignored by the conservative extraction, because insulin has no propep immediately after the signal peptide. therefore, due to the scarcity of data, we used a less strict data extraction procedure as described below. this study focuses on proteolytic sites of enzymes that cut immediately after lysines or arginines. such enzymes are often classified as members of the pro-hormone convertase family. therefore, only sites with a lysine or arginine at the first position n-terminal to the proteolytic site were considered. we extracted all -mers of the secretome, arranged symmetrically around a potential proteolytic site after a basic residue, and designated them as follows: (i) experimentally-validated proteolytic sites, which are annotated by a swiss-prot ft annotation line according to the word template 'site. . .cleavage', were marked validated. (ii) experimentally-validated proteolytic sites, whose existence is indicated by the annotation of the two protein segments right before and immediately after the proteolytic site, were also marked validated. the annotation for protein segments is in the form of swiss-prot ft annotation lines having the word template 'peptide (or propeptide or chain) [first residue] [last residue]', and the two segments of the protein should be consecutive, i.e. 
the first residue of the second segment immediately follows the last residue of the first segment. we do allow for a short linker section in between the two segments, provided that it is likely to be removed by exopeptidase e after the processing of the protein precursor by a pro-hormone convertase (day et al., ; friis-hansen et al., ). we consider linker sections consisting of k, r, kk, kr, rk, rr, or successive ks and/or rs followed by a classical furin proteolytic site (rxkr or rxrr, where x is any natural amino acid) as likely to be cut by exopeptidase e. we also allow for a glycine immediately upstream of the basic residue/s at the c-terminus of the first peptide, propeptide or chain, as it is likely that these peptides are substrates for c-terminal alpha-amidating enzymes that convert the peptides to the corresponding desglycine peptide amide, where glycine is the amide donor (bradbury et al., ). the ambiguous sites (after each of the residues located in-between the two annotation lines) are marked ambg. (iii) when only one peptide, propeptide or chain annotation line suggests the existence of a proteolytic site, our confidence in the proteolytic site is reduced and the site is marked potential. (iv) when comments like 'probable', 'by similarity' or 'potential' (farriol-mathis et al., ; junker et al., ) appear in the description of the ft lines in the cases described in (i) and (ii), the proteolytic site is designated as potential. (v) when the distance between two proteolytic sites does not exceed four residues, the reliability of both sites is reduced. such proteolytic sites are marked potential unless there is strong support for their reliability. strong support for one or both of the two proteolytic sites is assumed if a proteolytic site is marked validated according to the criterion in (i). strong support for one or both of the two proteolytic sites is also assumed if a proteolytic site is marked potential according to the site. .
.cleavage annotation line, and also marked validated according to the criterion in (ii). (vi) all other positions were marked non (table si). ideally, data would be separated into distinct training, test and validation sets. however, the relative scarcity of cleavage sites, and their different levels of reliability, present a challenge when preparing datasets for classification, and necessitate a different approach. a validation set consisting of a random quarter of the data was held out and used for parameter optimization. the rest of the data were used, once optimal parameters were chosen, in cross-validation to evaluate performance. when training, only the most reliable proteolytic sites, namely, sites that were marked validated, were used as positive examples, while a subset of the sites marked non was used as negative examples. for the purpose of performance evaluation, on the other hand, it is important to use a set representative of all data. thus, in the parts of the data used for testing, proteolytic sites that were marked validated or potential were labeled positive, while those marked non or ambg were labeled negative. homologous sequences raise special difficulties due to the relationship between redundancy and information. it is therefore essential to handle them with care. one approach is to discard some of the protein sequences, in a way that maximizes coverage and minimizes redundancy (hobohm et al., ). the weakness of this approach is that it prevents learning from the subtle changes that exist between very similar sequences. for this reason, and due to the scarcity of annotated data, we, like others, decided to use all available data. this approach requires special precautions in order to minimize the risk of overestimating the predictive performance owing to training set and test set similarities.
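The asymmetric use of the site designations described earlier in this section (strict labels for training, inclusive labels for evaluation) can be summarized in a short sketch. The strings follow the designations used in the text; the function names and the `None` return value for unused sites are illustrative assumptions, not the authors' code.

```python
def training_label(mark):
    """Label used when training.

    Only validated sites are positives; non sites are candidate negatives
    (the text uses a subset of them). Everything else is left out.
    """
    if mark == "validated":
        return 1
    if mark == "non":
        return 0
    return None  # potential and ambg sites are not used for training


def evaluation_label(mark):
    """Label used when testing: validated and potential count as positive,
    non and ambg count as negative."""
    if mark in ("validated", "potential"):
        return 1
    if mark in ("non", "ambg"):
        return 0
    raise ValueError(f"unknown site designation: {mark}")
```

Keeping the two mappings separate makes it explicit that a potential site never contributes to training but still counts as a positive when performance is measured.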
one way to avoid training and testing on homologous data is to divide the data into several partitions based on a phylogenetic tree, and then calculate the performance by cross-validation (duckert et al., ). we used a different approach, which is described in what follows. we argue that the task of classifying a site is naturally divided into two cases, depending on whether or not this site is similar (to a degree, homologous) to a known proteolytic site, i.e. a proteolytic site present in the training set. classifying 'seen before' sites and 'new' sites are tasks that are different in nature, and have a different level of difficulty. this implies the need for two methods of classification, and, more importantly, for separate performance evaluation for the two tasks. in order to discriminate between the classification tasks, we analyzed -mers, arranged symmetrically around a potential proteolytic site, which were marked as validated or potential. each -mer was compared to its most similar known proteolytic site, and the number of identical residues was counted. our analysis confirmed that -mer sites that share more than nine residues with a known proteolytic site are most likely to be proteolytic sites themselves (figure s ). we chose this threshold for dividing the data into 'new' and 'seen before' sites. the number of identical residues to the closest known proteolytic site was also used as an additional input feature for the classifier. this feature improves the classification results (figure s ). figure reveals that, as expected, the tasks of classifying 'seen before' sites and classifying 'new' sites are different in nature, and confirms the need for two separate performance evaluations. in addition, a classifier trained to identify 'new' sites was more successful at identifying 'new' sites than a classifier trained to identify 'seen before' sites (figure b).
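The split into 'seen before' and 'new' sites can be sketched as below. This is a minimal illustration, assuming that identity is counted position by position between ungapped windows of equal length; the function names are hypothetical, and only the "more than nine identical residues" threshold is taken from the text.

```python
def identities(a, b):
    """Number of positions at which two equal-length windows share a residue."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x == y for x, y in zip(a, b))


def max_identity(window, known_sites):
    """Identity count to the most similar known proteolytic site (0 if none)."""
    return max((identities(window, k) for k in known_sites), default=0)


def is_seen_before(window, known_sites, threshold=9):
    """A window is 'seen before' if it shares more than `threshold` residues
    with some known site; otherwise it is treated as 'new'."""
    return max_identity(window, known_sites) > threshold
```

In a cross-validation setting, `known_sites` would hold only the proteolytic sites of the current training set, so the same window can be 'new' in one fold and 'seen before' in another. The value returned by `max_identity` is also the quantity that the text feeds to the classifier as an extra feature.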
parameter tuning a quarter of the data was picked out at random to serve only for tuning parameters, while the rest was used at the tuning stage for training. the held out set was divided into 'seen before' and 'new' sites, based on the maximal similarity to known sites in the training set. the two classifiers, for 'seen before' sites and for 'new' sites, were then, separately, optimized by evaluating precision vs. recall graphs based on the raw score output of random forest (rf). the inputs to the classifier were (i) a symmetrical window around the site, and (ii) the maximal identity to a known cleavage site, divided by the window size. for the classifier specialized in 'seen before' sites, we used a symmetrical window of residues surrounding each site, a negative set times larger than the positive set, and the internal weighting mechanism of rf was set to give a weight of to the positive set, and to the negative set. mtry was set to , and trees were found to be sufficient. for the classifier aimed at identifying 'new' proteolytic sites, we used a symmetrical window of residues around each site, a negative set times larger than the positive set, and the internal weighting was set to for the positive set and for the negative set. mtry was set to and trees were again found to be sufficient. for the svm classifier, we tried different polynomial kernels. the best degrees were found to be and for the 'seen before' and 'new' classifiers, respectively. the vectors fed to the svms were in sparse representation (qian and sejnowski, ) . the maximal identity value was used with the svm the same way as with the rf classifier. the data that was not used as testing data in the parameter optimization step (three quarters of the data) was used for -fold stratified cross-validation. specifically, at each step of the cross-validation, nine-tenths of the data were used for training. 
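The 'seen before' / 'new' split applied to the held-out set and to each cross-validation fold can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical and the window length is left to the caller, since the k-mer size is not legible in the text. The only value taken from the text is the threshold of more than nine identical residues.

```python
from typing import Iterable, List, Tuple


def identity(a: str, b: str) -> int:
    """Count positions at which two equal-length windows carry the same residue."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(x == y for x, y in zip(a, b))


def max_identity(window: str, known_sites: Iterable[str]) -> int:
    """Identity count to the most similar known proteolytic site (0 if none are known)."""
    return max((identity(window, k) for k in known_sites), default=0)


def identity_feature(window: str, known_sites: Iterable[str]) -> float:
    """Maximal identity divided by the window size, as used as a classifier input."""
    return max_identity(window, known_sites) / len(window)


def split_sites(
    windows: Iterable[str], known_sites: Iterable[str], threshold: int = 9
) -> Tuple[List[str], List[str]]:
    """Partition windows into 'seen before' (identity > threshold) and 'new'."""
    known = list(known_sites)  # known sites must come from the current training set only
    seen, new = [], []
    for w in windows:
        (seen if max_identity(w, known) > threshold else new).append(w)
    return seen, new
```

Recomputing the split against the training portion of every fold, rather than once for the whole dataset, keeps test sites from being labeled 'seen before' on the basis of other test sites.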
the remaining tenth was used for testing after being divided into 'seen before' and 'new' sets with respect to the current training set. by 'stratified' we mean that each tenth of the data contained the same proportion of validated, potential, etc. sites. the parameters used were those found to be optimal in the parameter tuning step. as explained above, all the data that were not used for parameter tuning were used for testing, in order to reflect the heterogeneity of the data as much as possible. however, the label of any site that is not validated is uncertain. to a large degree, we trust sites designated potential to be real proteolytic sites; manual review of many of the potential sites suggests that this assumption is reasonable. we assume that most ambg and non sites are not proteolytic sites. still, as yet undiscovered proteolytic sites are expected to be hidden among the sites marked non or ambg. the sheer volume of non sites raises the suspicion that there are even more unknown proteolytic sites labeled non than known proteolytic sites. this may distort performance evaluation statistics. we present below a calculation that attempts to tackle this problem.
where tp_i denotes instances in the positive set, correctly classified as positive, tp_o represents mislabeled instances in the negative set, correctly classified as positive, t_i denotes instances in the positive set, t_o represents mislabeled instances in the negative set, and p_o denotes instances in the negative set, classified as positive. it is now easy to see that calculated precision values are always underestimates. the reason is that while the denominator in equation ( ) is the same as in equation ( ), the numerator does not include tp_o, which may be even larger than tp_i. we now proceed under the assumption that negative data is a mixture of two statistical types of data: mislabeled positives (a fraction α of the negative data) and real negatives. mislabeled positives are assumed to have the same statistical nature as positive data. let f_i (f_o) be the cumulative distribution function of the score for positive (negative) data. let n_i (n_o) be the number of positive (negative) instances. let t be a threshold for the score. note that the real recall is independent of α, and is therefore equal to the ordinary recall calculated without assuming any mislabeling. the real precision is the ordinary precision multiplied by a correction factor: (1 + α n_o/n_i). therefore, for α = 0 we recover the ordinary precision. to summarize, mislabeling leaves the recall unchanged, while the precision is enhanced by the factor (1 + α n_o/n_i).
fig. . the effect of creating two specialized classifiers. it is clear that the performance of classifiers for 'seen before' and 'new' sites should be evaluated separately. furthermore, the figure shows that it is worth training specialized classifiers: (a) identification of 'seen before' sites. the classifier trained to identify 'seen before' sites is somewhat better at identifying such sites than the classifier trained to identify 'new' sites. (b) identification of 'new' sites. the classifier trained to identify 'new' sites performs better than the classifier trained to identify 'seen before' sites at identifying 'new' sites.
for furin proteolysis, we can obtain a reasonable estimate of this factor, because furin sites have an easily detectable consensus (nakayama, ) . we extrapolate from furin to proteolytic sites of other members of the pro-hormone convertase family, in an attempt to reflect the curation level of proteolysis annotation in the swiss-prot knowledgebase. we look for the furin proteolysis consensus site, after rxkr or after rxrr, in the positive and negative sets. the instances in the positive set are real positives, whereas the ones in the negative set are a mixture of proteolytic and non-proteolytic sites.
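The statement that mislabeling leaves recall unchanged while rescaling precision can be checked with a short derivation. The block below is a reconstruction from the definitions given in the text, not a copy of the authors' equations. Here α denotes the fraction of the negative set that consists of mislabeled positives, so that T_o = α n_o and T_i = n_i, and the mislabeled positives are assumed to follow the positive score distribution F_i.

```latex
\begin{align*}
\mathrm{TP}_i &= n_i\,\bigl[1 - F_i(t)\bigr], \qquad
\mathrm{TP}_o = \alpha\, n_o\,\bigl[1 - F_i(t)\bigr], \qquad
P_o = n_o\,\bigl[1 - F_o(t)\bigr] \\[4pt]
\text{recall}_{\text{real}}
  &= \frac{\mathrm{TP}_i + \mathrm{TP}_o}{T_i + T_o}
   = \frac{(n_i + \alpha n_o)\,\bigl[1 - F_i(t)\bigr]}{n_i + \alpha n_o}
   = 1 - F_i(t)
   = \frac{\mathrm{TP}_i}{T_i} \\[4pt]
\text{precision}_{\text{real}}
  &= \frac{\mathrm{TP}_i + \mathrm{TP}_o}{\mathrm{TP}_i + P_o}
   = \frac{(n_i + \alpha n_o)\,\bigl[1 - F_i(t)\bigr]}{\mathrm{TP}_i + P_o}
   = \Bigl(1 + \alpha\,\frac{n_o}{n_i}\Bigr)\,
     \frac{\mathrm{TP}_i}{\mathrm{TP}_i + P_o}
\end{align*}
```

The denominator of the precision is the same in both versions because every instance scored above the threshold is counted as a predicted positive regardless of its true label; only the numerator gains the term TP_o.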
there is evidence that a lysine located two positions after the putative proteolytic site prevents cleavage, so such instances were excluded. in addition, we observed which residues are most frequent immediately after the proteolytic site in the positive set. our method for finding the ratio t o /t i was to look for the same subfamily of sites in both positive and negative sets: instances of a furin consensus followed by one of the most frequent residues (as found in the positive set), excluding lysine in the second post-cleavage position. the calculated furin correction factor was found to be . for the 'seen before' classifier, and . for the 'new' classifier. note that because of the inaccuracy of this correction procedure, corrected precision values may exceed . it must be emphasized that the furin correction factor is based on the assumptions that the ratio of annotated proteolytic sites to unannotated sites is equal for furin and other pc sites, and that classifier score distributions are mixtures as described above. both these assumptions are very rough approximations. still, we believe this correction gives a better evaluation of classifier performance. a comparison between the performance of rf and svm classifiers specialized in 'new' sites is shown in figure . the rf classifier performs better in the high precision/low recall area, while svm performs better in the high recall/low precision area. figure also shows the effect of the furin correction factor on the raw score output of the rf and svm classifiers. the performance of both the rf and svm 'seen before' classifiers is almost perfect (figure s ), as expected, and becomes perfect when applying correction (data not shown). the classification procedure described above was repeated, but this time, no holdout set was removed, and -fold stratified cross-validation was applied to the whole eukaryotic secretome. for each classifier, scores were replaced by their corresponding precision values. 
each site was given a single score: a 'seen before' site was given its score according to the 'seen before' classifier, and a 'new' site was given its score according to the 'new' classifier. fig. . comparison between rf and svm classifiers specialized in 'new' sites, and the effect of the furin correction factor. validated and potential data are treated as positive for testing, the rest as negative. the furin correction is a way to compensate for the fact that some of the data we treated as negative for cleavage is actually mislabeled (unknown proteolytic sites). (a) raw score output of the rf and svm classifiers; (b) precision is multiplied by . , which is the calculated furin correction factor. it should be remarked that because of the imperfection of the correction procedure, corrected precision values may exceed . precision values that exceed are set to . for 'new' sites, there are validated and potential sites, and non and ambg sites, and the furin correction factor is . . for 'seen before' sites, there are validated and potential sites, and non and ambg sites, and the furin correction factor is . . based on our data extraction, performance evaluation, and the furin correction factor, we estimate that the eukaryotic secretome is comprised of about proteolytic sites, of which ( * . ) are 'seen before', i.e. quite similar to known proteolytic sites, and ( * . ) are 'new', i.e. do not share significant sequence similarity to any annotated proteolytic site. the furin correction factor also allows us to estimate the fraction of unannotated proteolysis for 'seen before' and 'new' sites. our results reveal that only . % ( . / . ) of 'seen before' sites are still unannotated, while % ( . / . ) of 'new' sites are yet to be discovered. 
furthermore, the rf classifier specialized in 'seen before' sites predicts apparently all 'seen before' sites with a precision greater than %, while the rf classifier specialized in 'new' sites predicts about % of the unknown 'new' sites with a precision of %, and % with a precision of % (fig. ) . swiss-prot . does not include annotation for proteolytic sites in any of the members of the fibroblast growth factor (fgf) family. yet, our prediction method suggests several proteolytic sites in some of the proteins in this family, resulting in a classification of the fgf proteins into three groups of orthologs: fgfs that have conserved n-terminal proteolytic sites, fgfs that have conserved c-terminal proteolytic sites and all others (table sii) . a literature search confirmed some of our predictions. functional proteolytic sites are expected to be conserved among close species. our classifier revealed that the proteolytic site in fgf is indeed conserved in all available fgf orthologs (fig. ) . the c-terminal proteolytic site of fgf is important for normal activity of the protein. several groups reported proteolysis in fgf between arg and ser , and mutations in proximity to this site (r w, r q and r q) were identified in patients with autosomal-dominant hypophosphatemic rickets (adhr) (bowe et al., ; shimada et al., ; white et al., white et al., , . the authors suggested that the proteolysis causes protein inactivation, and that these mutations created a polypeptide less sensitive to proteolysis, thus leading to elevated concentrations of fgf , and to phosphate wasting in adhr patients. our prediction method revealed that these mutated forms of fgf do not undergo c-terminal proteolysis (fig. ) . furthermore, our predictions of proteolytic sites in the c-terminus of the other fgf family members might also imply their deactivation by proteolysis processing. another known case is the n-terminal proteolytic of fgf . 
the amino-terminal region downstream of the signal peptide of the protein is involved in its retention in the golgi apparatus and the regulation of its secretion (kiefer et al., ) . we predicted proteolytic sites in the n-terminus of human, mouse, zebrafish, chicken and xenopus fgf . indeed, in xenopus, proteolysis between arg and asp is essential for fgf 's biological activity (antoine et al., ) . we suggest that proteolysis of - n-terminal amino acids occurs during the maturation of other fgfs, and may be important for their biological activity. the multiple sequence alignment in figure confirms that the n-terminal proteolytic site is conserved between some fgf family members and lies in proximity to an upstream variable region. it is worth noting that the proteolytic site is conserved even among remote homologs. some of these homologs possess an n-terminal signal peptide and are secreted via the classical secretory pathway, while others do not possess a signal peptide and are secreted via an alternative pathway (nickel, ) .
this study revealed a large potential for proteolytic site predictors, because most proteolytic sites are currently still unannotated. furthermore, the furin correction factor gives an estimate of the total number of proteolytic sites. we estimate the eukaryotic secretome to comprise about ( × . + × . ) proteolytic sites, which means that about . % of r/k in the secretome are proteolytic sites ( /( + + + ) = . ). an important conclusion is that currently only about half of the proteolytic sites are annotated [( + )/ = . ], meaning that predictors of proteolytic sites are of great value.
fig. . proteolytic site predictions for fgf of human, three mutant forms from adhr patients, and three vertebrate orthologs. sequences of fgf of human, mouse, rat and pufferfish were aligned together with r w, r q and r q human fgf mutants (mutations are highlighted in dark grey). high score cleavage predictions were assigned to the true cleavage sites (highlighted in light grey). in normal fgf , cleavage is known to take place between the two amino acids in light grey.
fig. . fgf and other fgf family members that undergo proteolysis in their n-terminal region. proteolysis of the n-terminal region of fgf is important for regulating its activity. fgf to were also assigned high score n-terminal cleavage site predictions, although they do not have a leading signal peptide. removing the signal peptides of fgf members allows alignment of the n-terminal proteolytic sites. the high conservation of the proteolytic site signatures, in contrast to the variability of the flanking sequences, confirms the importance of the proteolytic processing, which, as in fgf , may be involved in the regulation of protein activity.
another important issue raised in this article is performance evaluation when some of the data is mislabeled. in our case this mislabeling is a result of missing annotation, and the mislabeled sites are often unknown proteolytic sites. we showed that such mislabeling leaves the recall unchanged, while the precision is reduced by a factor that can be estimated. furthermore, by relying on a well-characterized subgroup, namely furin sites, we were able to estimate the degree of mislabeling. as mislabeling is common in much current biological data, we believe that our calculation is relevant for performance evaluation in other biological classification problems. many sites are currently not annotated as proteolytic sites, but are predicted by our classifier with high precision. these include sites in currently developed therapeutic proteins, and in a few cases, the exact boundaries of peptides identified experimentally as minimal sequences required for functionality. we demonstrate the prediction capability of the novel classifier in an analysis of members of the fibroblast growth factor (fgf) family.
we were able to discriminate real proteolysis sites from non-cleaving sites of mutant fgf proteins of adhr patients. additionally, the predictor was able to identify cleavage sites in remote homologs, suggesting a regulatory role for the predicted cleavages by annotation transfer. in summary, proteolysis has a great influence on the biological function of proteins, and therefore the accurate prediction of proteolytic sites is important for basic research and biotechnological applications. it allows identification of biologically active peptides from non-active precursors. in addition, it allows identification of mutations and polymorphisms that influence the generation of active peptides and proteins.
the human plasma proteome: a nonredundant list developed by combination of four separate sources
nh -terminal cleavage of xenopus fibroblast growth factor is necessary for optimal biological activity and receptor binding
effects of l-and d-rekr amino acid-containing peptides on hiv and siv envelope glycoprotein precursor maturation and hiv and siv replication
inhibitors of proprotein convertases
improved prediction of signal peptides: signalp .
implication of proprotein convertases in the processing and spread of severe acute respiratory syndrome coronavirus
cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks
the swiss-prot protein knowledgebase and its supplement trembl in
fgf- inhibits renal tubular phosphate transport and is a phex substrate
mechanism of c-terminal amide formation by pituitary enzymes
random forests
artificial neural network method for predicting hiv protease cleavage sites in protein
the jalview java alignment editor
prodynorphin processing by proprotein convertase . cleavage at single basic residues and enhanced processing in the presence of carboxypeptidase activity
cleavage inhibition of the murine coronavirus spike protein by a furin-like enzyme affects cell-cell but not virus-cell fusion
probcons: probabilistic consistency-based multiple sequence alignment
prediction of proprotein convertase cleavage sites
biological and immunological properties of human immunodeficiency virus type envelope glycoprotein: analysis of proteins with truncations and deletions expressed by recombinant vaccinia viruses
annotation of post-translational modifications in the swiss-prot knowledge base
attenuated processing of proglucagon and glucagonlike peptide- in carboxypeptidase e-deficient mice
inhibition of furin-mediated cleavage activation of hiv- glycoprotein gp
selection of representative protein data sets
making large-scale support vector machine learning practical
representation of functional information in the swiss-prot data bank
polyarginine inhibits gp processing by furin and suppresses productive human immunodeficiency virus type infection
retention of fibroblast growth factor in the golgi complex may regulate its export from cells
coronavirus clpro proteinase cleavage sites: possible relevance to sars virus pathology
functional regions of the envelope glycoprotein of human immunodeficiency virus type
endoproteolytic cleavage of gp is required for the activation of human immunodeficiency virus
furin: a mammalian subtilisin/kex p-like endoprotease involved in processing of a wide variety of precursor proteins
the mystery of nonclassical protein secretion. a current view on cargo proteins and potential export routes
signal peptidases
predicting the secondary structure of globular proteins using neural network models
precursor convertases: an evolutionary ancient, cell-specific, combinatorial mechanism yielding diverse bioactive peptides and proteins
mutant fgf- responsible for autosomal dominant hypophosphatemic rickets is resistant to proteolytic cleavage and causes hypophosphatemia in vivo
support vector networks
autosomal-dominant hypophosphatemic rickets (adhr) mutations stabilize fgf-
autosomal dominant hypophosphataemic rickets is associated with mutations in fgf
reduced bio-basis function neural networks for protease cleavage site prediction
conflict of interest: none declared.
key: cord- - ugdxbmy authors: laskar, rezwanuzzaman; ali, safdar title: mutational analysis and assessment of its impact on proteins of sars-cov- genomes from india date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: ugdxbmy
the ongoing global pandemic of sars-cov- implies a corresponding accumulation of mutations. herein the mutational status of genomes from india, along with their impact on proteins, was ascertained. after excluding gaps and ambiguous sequences, a total of variable sites ( parsimony informative and singleton) were observed. the most prevalent reference nucleotide was c ( ) and the most prevalent substituted one was t ( ). nsp had the highest incidence of sites followed by s protein ( sites), nsp b ( sites) and orf a ( sites). the average number of mutations per sample for males and females was . and . respectively, suggesting a higher contribution of mutations from females. the non-uniform geographical distribution of mutations implied by odisha ( samples, mutations) and tamil nadu ( samples, mutations) suggests that sequences in some regions are mutating faster than others. there were mutations ( ‘neutral’ and ‘disease’) affecting amino acid sequence.
nsp has a maximum of ‘disease’ variants followed by s protein and orf a with each. further, the proportion of ‘disease’ mutations in genomes from asymptomatic people was a mere % whereas that in genomes from deceased patients was over threefold higher at %, indicating a contribution of these mutations to the pathophysiology of sars-cov- . the ongoing covid- global pandemic began in wuhan, china and has devastated millions of lives, economies and even nations as a whole. the first reported case was in created a scare as they had a relatively higher mortality rate. however, sars-cov- is by far the most contagious one [ ] [ ] [ ] [ ] . the higher incidence of viral infections would imply a faster evolution process for sars-cov- [ ] . this is because the more the virus replicates, the higher the chances of it accumulating mutations, which may alter the dynamics of its virulence, pathogenesis and interactions with the host. the changes may not necessarily favor the virus; however, the unpredictability demands caution. the sars-cov- genome encodes non-structural proteins in addition to the replicase polyprotein, the spike (s) glycoprotein, envelope (e), membrane (m), nucleocapsid (n) and other accessory proteins [ ] . the impact of mutations in all the regions of the genome needs to be assessed to understand viral evolution. with a definite possibility of india becoming the country most affected by sars-cov- in the near future, and given the demographic burden involved, it is pertinent to analyze the accumulating variations in the genome, accounting for possible changes in proteins and their potential to alter the virus in any manner. on th june, we retrieved fasta sequence congregations from india along with their metadata from the gisaid epicov server to construct the phylo-geo-network and analyze the haplogroups along with their geographical distribution across different states of india [ ] .
herein we extend our study using the same congregation of sequences to analyze the nature and composition of the observed mutations and their impact on proteins of sars-cov- . gisaid epicov is an open access repository of genomic and epidemiologic information about novel coronaviruses from across the world, from which sequences were extracted and aligned as previously reported [ ] . briefly, fasta sequence congregations were retrieved, along with their metadata, from the gisaid epicov (www.epicov.org) server. for mutational profile analysis with clinical correlation, we selected genomes of deceased patients from the existing congregation. however, there were just two genomes from asymptomatic patients in the congregation. so, on . . , we downloaded fasta sequences with patient status from the same server and selected genomes from asymptomatic patients. as the data filter for genome extraction, we used hcov- as the virus name, human as the host, india as the location and complete sequence with high coverage. details of the asymptomatic samples are given in supplementary file . the sequences thus extracted were analyzed with nc_ . from wuhan, china as reference. mega (v. ) is a multithreaded tool for molecular and evolutionary analysis. the multiple sequence alignment (msa) of the extracted sequences [ ] was initially visualized with this software, and the variable sites were then exported into spreadsheets, with or without missing/ambiguous and gap sites, along with their respective positions [ ] . using this software, we estimated the mcl (maximum composite likelihood) nucleotide substitution pattern and performed tajima's neutrality test to understand transition/transversion bias and nucleotide diversity [ , ] . piro iglsf is a matlab-based simulation software; we used it to identify the location of each mutated nucleotide position on a specific gene [ ] . the coronavirus typing tool of genome detective (v. . ) and the covid- genome annotator of coronapp are web tools for the analysis of protein and nucleotide mutations [ , ] . we used these tools for annotation, identification and classification of mutated proteins, followed by verification and validation of the positions against the mutated nucleotide sites in the output of mega. the nucleotide similarity percentage was validated by ncbi blast (blast.ncbi.nlm.nih.gov) to investigate the sequence diversity. sift, provean and ws-snps&go are prediction tools that report a positive or negative impact of variants on protein phenotype. the assessments are based on scores computed by several algorithms. a sift score of < . is classed as disease ("affect protein function"), and a score of > . as neutral ("tolerated"). a provean score of < − . is classed as disease ("deleterious"), and a score of > − . as neutral. for the phd-snp method of ws-snps&go, a disease probability of > . is classed as disease, and a probability of < . as neutral [ ] [ ] [ ] . composition and distribution of variable sites (table ) and its negative value indicated the significance of these variable sites. however, excluding the gaps and ambiguous sequences reduced this percentage cover to . % encompassing variable sites which we have used for subsequent analyses reported in this study. this included parsimony informative (pi) sites and singleton sites (snp: single nucleotide polymorphism). the pi sites are those whose incidence was observed in multiple samples whereas singleton sites had a restricted single sample incidence. the distribution of these sites according to various substitutions, protein localizations and impact therein has been summarized in figure , supplementary file . as evident therein, c→t ( sites) forms the most prevalent mutation in both pi and singleton sites and g→t ( sites) comes a distant second. the common aspect of the two most prevalent mutations is "t" being the substituted nucleotide.
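The distinction between parsimony-informative and singleton sites can be illustrated on a single alignment column. The sketch below assumes the usual definitions applied by alignment tools such as MEGA. A site is parsimony informative when at least two nucleotide types each occur in two or more sequences. It is a singleton when it is variable but at most one type occurs more than once. The function name is hypothetical, and gaps and ambiguous symbols are simply ignored, which is an assumption rather than the authors' stated handling.

```python
from collections import Counter


def classify_site(column: str) -> str:
    """Classify one alignment column as constant, singleton or parsimony-informative."""
    # Keep only unambiguous nucleotides; gaps ('-') and codes such as 'N' are skipped.
    counts = Counter(base for base in column.upper() if base in "ACGT")
    if len(counts) < 2:
        return "constant"
    # Number of nucleotide types that appear in two or more sequences.
    repeated = sum(1 for n in counts.values() if n >= 2)
    return "parsimony-informative" if repeated >= 2 else "singleton"
```

Applied to every column of the multiple sequence alignment, this yields the PI and singleton counts of the kind reported above.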
further, there were two multi-variable (mv) sites each in pi and singleton category wherein two separate mutations were observed at the same site in different samples. the details of observed mv sites have been summarized in table . the distribution of the variable sites across proteins of sars-cov- in a non-uniform manner is reflective of the differential contributions of proteins in evolution. as per our data, nsp had the maximum of variable sites followed by s protein ( sites), nsp b ( sites) and orf a ( sites) ( figure ; supplementary file ). these four proteins account for over half of the total variable sites of the genome and may be considered as drivers of genomic evolution for sars-cov- . the mutations of s protein have been the focus for multiple research groups owing to its plausible impact on viral entry to the host cell but the mutations elsewhere may be equally relevant as the viral genome is known to harbor only what's essential [ ] [ ] [ ] . we believe a holistic approach is required to understand the evolution as more often than not the selection advantage being offered by any mutation is a chance event and can be from any part of the genome. in terms of the impact of these variable sites on amino acid sequence of the viral proteins we classified them into four categories. first, the sites located in the extragenic region and hence no influence on the coding proteins. there were such variable sites localized to the utr regions ( in 'utr and in 'utr). secondly, snp-silent included those variable sites wherein the nucleotide change was leaving the amino acid sequenced unaltered. a total of such sites were distributed across the genome. thirdly, the variable sites which were leading to the introduction of a stop codon were referred to as snp-stop and there were such sites in our study. lastly, the variable sites which were affecting the protein sequence are referred as snp in the study and there were such sites (supplementary file ). 
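The four categories described above (extragenic, SNP-silent, SNP-stop and SNP) follow directly from translating the reference and mutated codons. The sketch below uses the standard genetic code. The function name and its arguments are illustrative assumptions, not part of the authors' pipeline, which relied on the annotation tools named earlier.

```python
from typing import Optional

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon: Optional[str], offset: int, alt_base: str) -> str:
    """Classify a substitution by its effect on the encoded amino acid.

    ref_codon: reference codon (DNA alphabet), or None for a site outside any gene.
    offset:    0-based position of the substituted base within the codon.
    alt_base:  the substituted nucleotide.
    """
    if ref_codon is None:
        return "extragenic"
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "SNP-silent"   # nucleotide changes, amino acid does not
    if alt_aa == "*":
        return "SNP-stop"     # substitution introduces a stop codon
    return "SNP"              # amino acid sequence is altered
```

For example, a C→T change at the first position of CAG (glutamine) gives TAG and is therefore an SNP-stop.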
the prevalence and distribution of these sites has been summarized in figure and results of the prediction of their impact on protein has been discussed later. in order to understand the underlying dynamics of substitutions, we performed the maximum composite likelihood estimate of nucleotide substitution as shown in table we thereon looked at these variations in combination with their prevalence across samples. the most prevalent nucleotide at the variable sites in reference sequence was c ( ) followed by g ( ) whereas t was by far the predominantly substituted nucleotide ( , %). also, the other three nucleotides had an almost equal representation in substitutions (a- , g- , t- ). this biased prevalence was not restricted to the alignment but was also getting translated to population incidence. there was a total of mutations with c as reference nucleotide and mutations with t as substituted nucleotide across studied genomes. the composition of variable sites, their substitutions and prevalence across samples has been summarized in figure and supplementary file . evidently, any particular mutation may be incident across multiple samples and a single sample can harbor multiple mutations. a cumulative number for the same has been referred to as "sum of mutation incidence" herein and thereafter in this study. we subsequently analyzed the patient's dataset with reference to age and gender for the incidence of mutations. however, since patients' data wasn't cumulatively available, the data for this aspect isn't exhaustive but representative for samples ( females and males). the patients whose genomes were used in the study and age was known were classified into seven categories from infancy to over years. the maximum number of patients for both males and females belonged to mature adulthood category of to years with and samples respectively (figure , supplementary file ) . 
this adheres to the fact that the older population is at a greater risk for infection owing to a possibly weaker immune system and other physiological conditions. the simple question of whether or not age and gender are associated with accumulation of genome variations has a not so simple answer. the overall average number of mutations per sample was . and the corresponding values for males and females separately was . and . respectively. thus, women were contributing more to the mutational accumulation as compared to males. the individual mutational load for different age groups in males and females has been represented in figure . evidently, women are contributing more to the mutational load except for three age groups; - years, - years and - years. the highest difference on the basis of gender is for - years ( . ) but since there was just one female sample in that age group, it can't be emphasized much in isolation but the overall pattern does seem relevant. this is more so because, in terms of incidence, males are almost . times of the females but in terms of variations, fewer females are contributing more to the mutational load. possibly, the virus is behaving differently depending on gender. the mutational distribution across different states of india was subsequently ascertained. generally speaking, more the virus replicates more should be the accumulated variations. the fact that the samples used in the study aren't uniformly distributed across states provides for an intriguing template for analysis. the number of samples and the mutations therein for different states has been summarized in figure however, we can surely say that some sequences are mutating more than the others but whether the geographical location is playing a role needs to be ascertained. a total of snps which were present which were altering the amino acid sequence. their details and positions have been summarized in table and supplementary file . 
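The "sum of mutation incidence" and the per-group averages discussed above reduce to simple counting once each sample is mapped to the set of mutations it carries. The following sketch assumes that mapping is available as a dictionary. The function names and data layout are hypothetical and are not taken from the authors' spreadsheets.

```python
from collections import defaultdict
from typing import Dict, Set


def sum_of_mutation_incidence(mutations_by_sample: Dict[str, Set[str]]) -> int:
    """Total number of (sample, mutation) occurrences across all samples."""
    return sum(len(muts) for muts in mutations_by_sample.values())


def mean_mutations_per_group(
    mutations_by_sample: Dict[str, Set[str]], group_of: Dict[str, str]
) -> Dict[str, float]:
    """Average number of mutations per sample within each group (e.g. gender or state)."""
    totals: Dict[str, int] = defaultdict(int)
    sizes: Dict[str, int] = defaultdict(int)
    for sample, muts in mutations_by_sample.items():
        group = group_of.get(sample)
        if group is None:
            continue  # samples without metadata are left out, as in the text
        totals[group] += len(muts)
        sizes[group] += 1
    return {group: totals[group] / sizes[group] for group in totals}
```

Because the averages are taken per sample, a group with few samples (such as an age band with a single female patient) can show a large value that rests on very little data.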
we also ascertained the prevalence of these variants across samples. the most frequent variant, q h localized to orf a, was present in samples, followed by a d in nsp , present in samples. amongst the silent snps, y y in m protein was present in samples, followed by d d in s protein with incidences. the overall data for variants present in genomes or more are summarized in figure a . conversely, we also assessed the accumulation of variations in a given genome, as summarized in figure b . interestingly, one sample (genome id ) had the highest incidence of mutations, while samples harbored just a single mutation. there were samples with no mutations and with more than one mutation. to account for these, the sum of mutation incidence has been used in this study, as explained above. the impact of mutations on proteins was predicted with three different tools, sift, provean and ws-snps&go, which classified the mutations as "neutral/tolerated" or "disease/affect protein function/deleterious". for simplicity, we refer to the results from all tools as neutral and disease. although the prediction outcomes of the three tools did not agree for all sites, the outcomes were classified along similar lines, so the results can be represented in a binary manner with four categories. the first two categories cover cases in which the three tools give the same prediction, all predicting a site to be either "neutral" or "disease". the other two categories represent disagreement between prediction outcomes: "disease by one, neutral by two" and "disease by two, neutral by one". for comparison between variants, any mutation predicted as disease by two or three tools is considered disease, and any mutation predicted as neutral by two or three tools is considered neutral. the distribution of disease and neutral variants across the different genes of sars-cov- is shown in table and supplementary file . these can be analyzed in three aspects.
first, in terms of overall incidence, the maximum number of variants affecting protein sequence was present in nsp ( ), followed by spike (s) protein with variants. secondly, if we focus only on variants with the predicted outcome "disease", then nsp has a maximum of such variants, followed by s protein and orf a with variants each. thirdly, we looked at those proteins which had more disease variants than neutral variants. there were five such proteins, namely: nsp , nsp , nsp , orf a, e, orf a. of these, nsp had just two variants, and both were predicted as disease by all three tools. the others had a varying bias towards disease variants. thus, some regions of the genome have more variations that are mostly neutral, while others with fewer variations are predicted to be more impactful because they carry more disease variants. conversely, mutations in some proteins may be relatively better tolerated by the viral genome. the overall protein prediction outcomes of the genomes are summarized in figure . there were a total of mutations ( %) and mutations ( %) predicted to be neutral and disease, respectively, by at least two tools. these predictions suggest that even though mutations are accumulating in sars-cov- , they are predominantly neutral. this is a possible reason why no major virulence or physiological deviations have been observed so far. to further assess the impact of these variations, we compared their prevalence in samples from asymptomatic patients with those from patients who died. the idea was that, if the predictions are true, asymptomatic samples should have more neutral mutations, whereas deceased samples should have more disease mutations. the present set of samples in the study had just asymptomatic samples and deceased. thereon, we included mutations with those of deceased samples. their comparative data are shown in table .
the p value therein represents the probability that a given variant chosen at random is neutral or disease. taking agreement between at least two tools as the threshold, the data give interesting insights. as shown and previously mentioned, for the original set of samples, % mutations were neutral (p value . ) and % were disease.

the mutational accumulation in sars-cov- genomes is a multifactorial event, with some areas of the genome more prone to mutations, selective mutations being more prevalent, nonlinear assimilation of mutations across various states and a differential correlation between mutational impact on proteins and physiological state. an age- and gender-specific bias in the incidence of mutations was observed. the asymptomatic samples had a higher occurrence of neutral variants, while deceased samples had a relatively higher incidence of disease variants. cross-linking mutational dynamics with patient history will provide better correlation and understanding of the variations in sars-cov- genomes.
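the rule used in this study to label a variant as neutral or disease (agreement of at least two of the three prediction tools, with four agreement categories) can be sketched as follows. this is an illustrative sketch only: the function name and the input format are assumptions, and the outputs of sift, provean and ws-snps&go are taken as already reduced to the two labels "neutral" and "disease".

```python
def consensus_call(predictions):
    """Combine three binary tool predictions into a category and a final label.

    predictions: sequence of exactly three labels, each "neutral" or "disease".
    Returns (agreement_category, final_label).
    """
    if len(predictions) != 3:
        raise ValueError("exactly three tool predictions are expected")
    if any(p not in ("neutral", "disease") for p in predictions):
        raise ValueError("labels must be 'neutral' or 'disease'")

    disease_votes = sum(1 for p in predictions if p == "disease")
    categories = {
        3: "disease by all three",
        2: "disease by two, neutral by one",
        1: "disease by one, neutral by two",
        0: "neutral by all three",
    }
    # Two or three disease votes give "disease"; otherwise at least two tools
    # said "neutral", so the variant is labelled "neutral".
    final_label = "disease" if disease_votes >= 2 else "neutral"
    return categories[disease_votes], final_label


example = consensus_call(["disease", "neutral", "disease"])
```

because there are three tools and two labels, a tie is impossible, so every variant receives exactly one final label.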
; d= ) a t, t i, y c, e d, p s, g v, s i, p l, m i, h y, a t, s i, t k, t i, m i, v a, s g, t i, g s, t i, t a, t i, n t, k r, s f, k e, g e, p s, l f, a v, v i, t i, t i, l v, p l, h y, a v, s f,
; d= ) l f, n y, e d, a s, s f, g s, q r, t i, t i, e q, a s, t i, e d, t i, v i, q h, a s, t s, g v, t i, a s, i k, s p, d y, a v h q, p l t i, f c, l p, t k, v f, d y, c f g c a t a v s f i n c f a s, t i, h y, v l, g v, k r, k n, g v, q k
orf a (n= ; d= ) v l, g v, s a, v f, t i l f, s f, s l, t i i t, l f, t i i t, l f, l f, q h

the authors thank the department of biological sciences, aliah university, kolkata, india for all the financial and infrastructural support provided. the authors acknowledge all the authors associated with the originating and submitting laboratories of the sequences from gisaid's epiflu™ (www.gisaid.org) database on which this research is based. the authors declare they have no competing interests. not applicable. all data pertaining to the study have been provided as supplementary material of the manuscript. rl: methodology, investigation, formal analysis and validation. sa: conceptualization, supervision and writing.

key: cord- -oxahu nz authors: lawes, roger a.; murphy, helen t.; grice, anthony c. title: comparing agglomerative clustering and three weed classification frameworks to assess the invasiveness of alien species across spatial scales date: - - journal: divers distrib doi: . /j. - . . .x sha: doc_id: cord_uid: oxahu nz

to prioritize weed management at the catchment scale, information is required on the species present, their relative frequency, abundance, and likely spread and impact. the objective of this study was to classify the invasiveness of alien species that have invaded the upper burdekin catchment in queensland, australia, at three spatial scales. a combination of three published weed classification frameworks and multivariate techniques was employed to classify species based on their frequency and cover at a range of spatial scales. we surveyed the upper burdekin catchment for alien species, and for each species determined the following distribution indices: site frequency, total cover, transect frequency per site frequency, quadrat frequency per site frequency, cover per quadrat when present, cover per transect when present, and cover per site when present.
these indices capture the effect of species abundance and frequency between sites (site frequency and total cover), within sites (transect frequency per site and cover per transect when present), and within transects (quadrat frequency per site frequency and cover per site). they were used to classify the species into seven groups using a hierarchical cluster analysis. the relationship between the indices was explored to determine how effective the small scale, site‐specific indices were at predicting the broader, landscape‐scale patterns. strong correlations were observed between transect frequency per site and frequency (r ( ) = . ) and cover per transect when present and total cover (r ( ) = . ). this suggests that if a weed is abundant at the site level, it has the potential to occupy large areas of the catchment. the species groupings derived from the application of the three published weed classification frameworks were compared graphically to the groupings derived from the cluster analysis. one of the frameworks classified species into three groups. the other two frameworks classified species into four groups. there was a high degree of subjectivity in applying the frameworks to the survey data. some of the data were of no relevance to the classification frameworks and were therefore ignored. we suggest that the weed classification frameworks should be used in conjunction with existing multivariate techniques to ensure that classifications capture important natural variations in observed data that may reflect invasion processes. the combined use of the frameworks and multivariate techniques enabled us to aggregate species into categories appropriate for management. new species entering a landscape are subject to community-level processes and may or may not become invasive. this fate is governed by the characteristics of the species and habitat it has entered (williamson & fitter, ; richardson & py s ek, ) . 
the ecological definition of invasiveness has been discussed at length by williamson ( ) , lonsdale ( ) , davis & thompson ( ) , rejmánek et al . ( ) , and colautti & macisaac ( ) . despite these discussions, ecologists have struggled to classify successful invasions, partly because successful invasions are quite rare. according to the 'tens rules' (williamson & fitter, ) , a species has approximately a − probability of becoming a pest once introduced to a novel ecosystem. initially, ecologists focused on classifying species based on important plant characteristics (e.g. mack, ) culminating in models of plant invasiveness with a view to predicting which species are likely to become invasive (e.g. perrins et al ., ; rejmánek, ) . biosecurity agencies adopted key concepts from these studies to develop weed risk assessment protocols (pheloung et al ., ) , although recent reviews of these protocols found intentional human dispersal of propagules to be more important than plant species' ecological traits such as dispersal mechanism or seed size (caley & kuhnert, ) . richardson et al . ( ) , davis & thompson ( ) , and colautti & macisaac ( ) all developed frameworks to enable researchers to either describe a species as a 'type' of invader (davis & thompson, ; richardson et al ., ) or classify the invasion itself as a series of stages (colautti & macisaac, ) (table ). the ability to classify an invasion is necessary to prioritize weed management. the framework of richardson et al . ( ) comprised three components: introduction, naturalization, and invasion. introduction implies that the species has been transported by humans across major geographical boundaries. naturalization occurs when the species has overcome abiotic and biotic barriers to survival and can cope with environmental stochasticity. invasion occurs when the species produces reproductive offspring distant from the parent and culminates in new self-perpetuating populations. 
six key barriers that limit the spread of introduced plants were defined to enable researchers to classify species into one of the three categories (table ). pyšek et al . ( ) recently discussed further details on the use and application of the framework of richardson et al . ( ). davis & thompson ( ) proposed a scheme modeled after rabinowitz's ( ) classification of rarity, to distinguish 'invaders' as distinct from 'successional colonizers' and 'non-invasive colonizers'. the authors defined eight different types of 'colonizers' based on three distinct characteristics: dispersal distance (short/long), uniqueness (novel/common to the region colonized), and impact on the new environment (small/great) (table ). the interpretation of dispersal distance and the origin of the colonizer are scale dependent, while impact may be assessed from either a productivity or an ecosystem function perspective.

table . data requirements to implement the weed classification frameworks devised by richardson et al. ( ), colautti & macisaac ( ), and davis & thompson ( ).

colautti & macisaac ( ) recently proposed a framework that attempted to eliminate the need for universal definitions of current terms by using operational terms with no a priori meaning (i.e. as one of five possible stages) in a process-based model. the first stage ( ) begins when a propagule enters the habitat of interest. a species proceeds through stage , dispersal and survival in the habitat, and stage , survival and reproduction. if the weed satisfies these criteria, it may then be classified into separate stages that describe the frequency and abundance of a species in an environment. stage species are considered localized and numerically rare. stage a species are widespread but rare, stage b species are localized but dominant in those localities, and stage species are considered widespread and dominant (table ). again, the environment can be any region of interest.
the definitions of widespread and abundant are also open to interpretation. in essence, all three frameworks put forward a series of hypotheses that can be applied to data to classify the invasive characteristics of a species. the data requirements necessary to implement the frameworks vary, although each draws heavily on frequency and abundance information to classify the invasive characteristics of a species. the strengths and weaknesses of these frameworks were recently evaluated by murphy et al . ( ), who used existing data on invasive species in canada and tested whether the frameworks were operational given a relatively rich species data set, and whether the different frameworks identified the same sets of species. these authors had difficulty defining the rate of spread, required by the scheme of richardson et al . ( ), and impact, as required by that of davis & thompson ( ), and concluded that both schemes had limited potential in classifying regional alien floras. in this study, we surveyed the riparian zone of the burdekin river system for alien species. this river system, the third largest in australia, extends for over km and connects an array of pastoral properties. alien plant species such as cryptostegia grandiflora and ziziphus mauritiana are present in the catchment and affect the productive capacity of its pastoral properties (grice et al ., ). however, the extent of their geographical distribution and possible impact along the riparian zone remains unknown. other alien grass species, such as andropogon gayanus, have dramatically altered ecosystem function in other australian rangeland systems and may be present in this catchment (rossiter et al ., ). it was therefore necessary to document the weed status of this ecosystem to assist managers and state agencies in formulating a regional weed management strategy. the frequency and abundance of each species were recorded at coarse and fine scales.
we apply a series of multivariate techniques and the frameworks devised by richardson et al . ( ) , davis & thompson ( ) , and colautti & macisaac ( ) to each species objectively, using the data derived from the survey. the frameworks present a series of hypotheses about invasion and apply these to data. we propose evaluating weed invasions as part of a larger, existing plant community where invasive species are classified based on their relative abundance, frequency, and cover at local and landscape scales. this approach differs from those proposed by the authors of previous frameworks as species are classified using all available data acquired from an extensive, multiscale landscape survey of species frequency and cover. the patterns derived from multivariate analyses may help formulate hypotheses about invasion processes in the landscape of interest and can be applied to any data set. once critical invasive species have been identified, management strategies can be formulated to target important species in the ecosystem. surveys were conducted on the upper burdekin catchment, situated in north queensland, australia (fig. ). the burdekin river and associated tributaries extend for km. eighty sites were surveyed at intervals of approximately km in the riparian zone of this catchment. at each site, a transect consisting of contiguous m × m quadrats was surveyed on the riverbank, on the mid-slope of the riparian zone and on the upper bank of the riparian zone. within each site, this design was replicated three times, with each replicate separated by m. in all, quadrats were surveyed throughout a site. at each quadrat, trained observers visually estimated and recorded the percentage canopy cover of each alien species. from these observations, the presence or absence of each species was determined at the quadrat, transect, and site levels. 
as part of the survey, canopy cover, c, in the ith site, jth transect, and kth quadrat was recorded as a percentage of the quadrat. these were transformed into a portion by dividing by to allow the portions in individual quadrats to be summed. seven indices were derived from the surveys to determine what portion of area the weed occupies, when present, at the quadrat level, transect level, and site level. if cover in the quadrat was greater than zero, then the presence, p, of a weed in the ith site, jth transect, and kth quadrat was recorded as one; otherwise it was zero. for each species, four measures of cover were used: total cover, c; cover per site, c_s; cover per transect, c_t; and cover per quadrat, c_q (table ). total cover was the cumulative sum of the proportion of cover measured in each quadrat. cover per quadrat, cover per transect, and cover per site provided a measure of cover occupied by the weed at each scale. as total cover was the cumulative sum of the portion of cover in all quadrats, it was necessary to scale lower level frequency measures to match this estimate of total cover. therefore, transect frequency was multiplied by , as there are quadrats in each transect. site frequency was multiplied by and as there were nine transects in each site and quadrats in each transect. these indices differ from estimates of mean cover because absences, at different scales, are deliberately ignored in order to give an indication of the proportion of land a weed occupies once it has invaded.

total cover (portion): cumulative sum of portion of cover in each quadrat
cover per quadrat, c_q: c_q = total cover / frequency of quadrats where weed was present
cover per transect, c_t: c_t = total cover / frequency of transects where weed was present
cover per site, c_s: c_s = total cover / frequency of sites where weed was present
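the four cover indices defined above can be sketched for a single species as follows. this is a minimal illustration under stated assumptions: the function name, the record layout and the parameter names are hypothetical; the number of quadrats per transect and of transects per site are passed as arguments because the scaling values are not legible in the text; and the percentage is converted to a portion by dividing by 100.

```python
def cover_indices(records, quadrats_per_transect, transects_per_site):
    """Compute cover indices for one species, ignoring absences.

    records: iterable of (site, transect, quadrat, cover_percent) tuples.
    quadrats_per_transect, transects_per_site: survey design constants used to
        scale transect and site frequencies to quadrat units (assumed inputs).
    Returns a dict of indices, or None if the species was never present.
    """
    total_cover = 0.0
    quadrats, transects, sites = set(), set(), set()
    for site, transect, quadrat, cover_percent in records:
        if cover_percent <= 0:
            continue  # absences are deliberately ignored at every scale
        total_cover += cover_percent / 100.0  # percentage -> portion
        quadrats.add((site, transect, quadrat))
        transects.add((site, transect))
        sites.add(site)
    if not quadrats:
        return None

    # Transect and site frequencies are scaled to the number of quadrats they
    # contain so that each index is a portion of the occupied area.
    scaled_transects = len(transects) * quadrats_per_transect
    scaled_sites = len(sites) * transects_per_site * quadrats_per_transect
    return {
        "total_cover": total_cover,
        "cover_per_quadrat": total_cover / len(quadrats),
        "cover_per_transect": total_cover / scaled_transects,
        "cover_per_site": total_cover / scaled_sites,
    }


# Hypothetical toy survey: 2 quadrats per transect, 2 transects per site.
toy = [("s1", "t1", 1, 50), ("s1", "t1", 2, 0), ("s1", "t2", 1, 25), ("s2", "t1", 1, 25)]
indices = cover_indices(toy, quadrats_per_transect=2, transects_per_site=2)
```

in the toy survey the species occupies three quadrats, three transects and two sites, so the same total cover yields a progressively smaller index as the scale grows from quadrat to site.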
three measures of frequency were also defined: site frequency, transect frequency per site frequency, and quadrat frequency per site frequency. these were derived from the three frequency indices presented in table . site frequency was the number of sites where the species was present. transect frequency per site frequency provides a measure of the mean number of transects occupied by the species when present at the site level. nine transects were surveyed at each site, and this was the maximum possible value. quadrat frequency per site frequency provides an indication of the mean number of quadrats occupied by the weed when present. the maximum possible value was . the two relative frequencies indicate the extent to which a species occupies a given site. for each species, these indices collectively capture the variation in frequency and cover at three spatial scales. site frequency and total cover provide information on these effects at the landscape scale. transect frequency per site and cover per transect explain the effects of a species along a m transect within a site. quadrat frequency per site and quadrat cover per site capture details of species effects at a quadrat level within the site. prior to analysis, data were centred by subtracting the attribute mean and standardized by dividing by the centred attribute's root mean square, enabling all attributes to be considered with equal weight. data were analyzed using non-metric multidimensional scaling to summarize the relationship between the seven indices for the species identified in the survey. the dissimilarities between species cover values were calculated using a euclidean distance measure. data were further summarized using an agglomerative hierarchical cluster analysis based on the same distance matrix as the non-metric multidimensional scaling (gordon, ).
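the standardization and distance steps described above can be sketched as follows. this is an illustrative sketch, not the authors' code: the function names are hypothetical, and the root mean square of the centred attribute is computed with an n − 1 denominator (which makes it the sample standard deviation); the text does not state which denominator was used.

```python
import math


def standardize(column):
    """Centre an attribute on its mean and divide by the centred RMS."""
    n = len(column)
    mean = sum(column) / n
    centred = [x - mean for x in column]
    # Assumption: n - 1 denominator, i.e. the sample standard deviation.
    rms = math.sqrt(sum(c * c for c in centred) / (n - 1))
    if rms == 0:
        raise ValueError("attribute has no variation and cannot be standardized")
    return [c / rms for c in centred]


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(rows):
    """Pairwise distances between species (rows) over standardized attributes."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    scaled = [list(r) for r in zip(*columns)]
    return [[euclidean(a, b) for b in scaled] for a in scaled]


# Hypothetical toy data: three species described by two indices.
d = distance_matrix([[1, 10], [2, 20], [3, 30]])
```

without the standardization step, an index measured on a large scale (for example site frequency) would dominate the euclidean distance; after it, each attribute contributes with equal weight. the resulting matrix could be passed to both the ordination and the cluster analysis.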
species were agglomerated using ward's minimum variance method that seeks to minimize the within group variation (see gordon, for discussion). small groups of species were agglomerated into larger groups using the same method and this iterative process continued until all species formed a single group. in this way, group membership was optimized at each stage of the agglomeration, and was essentially a local, or single stage optimization (gordon, ) . results of this process were displayed in a dendrogram, arbitrarily cut at a height of seven. it is realized that trade-offs exist between a parsimonious interpretation of data and the need to preserve and identify the inherent variation that exists between species groups. weed classification frameworks were applied to every species. the criteria used for each of the three frameworks are shown in table . davis & thompson ( ) required information on dispersal distance, uniqueness to the region, and impact on the new environment. dispersal distance (short or long) was derived from site frequency, not from an understanding of the species dispersal mechanism. plants with high site frequency (> sites out of surveyed, or %) were considered long-distance dispersers, whereas those occupying fewer sites were considered short-distance dispersers. uniqueness to the region, whether common or novel, was difficult to classify from the data alone. davis & thompson ( ) argue that species introduced into north america during colonization should now be considered common as they are unlikely to ever be eradicated. nearly all the species surveyed here satisfy that criterion, with most present in australia for over years. therefore, all cluster groups were considered common. impact, either little or great, was determined as cover per quadrat. species occupying more than % of a quadrat when present were considered to have a great impact, and those that occupied less than this were considered to have a small impact. 
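the way the survey indices were mapped onto the davis & thompson criteria can be sketched as a simple threshold rule. this is a hedged illustration: the function name is hypothetical, and both thresholds are parameters because the values used in the study are not legible in the text; the numbers in the example are placeholders, not the study's thresholds.

```python
def davis_thompson_class(site_frequency, cover_per_quadrat,
                         site_threshold, cover_threshold):
    """Classify a species on dispersal distance, uniqueness and impact.

    site_frequency: number of sites where the species was present.
    cover_per_quadrat: portion of a quadrat occupied when present.
    site_threshold, cover_threshold: cut-offs chosen by the analyst (assumed).
    """
    # Dispersal distance is inferred from site frequency, not from knowledge
    # of the dispersal mechanism.
    dispersal = "long-distance" if site_frequency > site_threshold else "short-distance"
    # All species in the survey were treated as common to the region.
    uniqueness = "common"
    # Impact is assessed at the very local (quadrat) scale.
    impact = "great" if cover_per_quadrat > cover_threshold else "small"
    return dispersal, uniqueness, impact


# Placeholder thresholds for illustration only.
label = davis_thompson_class(site_frequency=35, cover_per_quadrat=0.30,
                             site_threshold=20, cover_threshold=0.10)
```

because uniqueness is fixed at "common", only four of the eight possible colonizer types can be produced by this rule, and the outcome depends entirely on where the two thresholds are placed.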
impact was assessed at a very local scale. site and transect level impacts were excluded from the assessment for parsimony, although the three measures were strongly correlated (fig. b ). colautti & macisaac ( )'s classification system is largely devised around concepts of frequency and abundance and it was a relatively simple process to classify species based on the survey data, where cover provided a surrogate measure of abundance. species were all able to reproduce and survive, and by default, had reached stage . species with low cover per quadrat (< %) and very low site frequency (< sites) were considered stage species (localized and rare). species with high cover per quadrat (> %) and low to moderate site frequency (< sites) were considered stage b species (localized but dominant). species present at more than sites but with low quadrat cover were considered stage a species (widespread but rare). all other species, with moderate to high quadrat cover and a high site frequency were considered stage species (widespread and dominant). it was more difficult to apply the survey data to richardson et al . ( ) 's framework. all species had overcome major geographical barriers, environmental barriers at introduction, reproduction barriers, and local/regional dispersal barriers. therefore, all species were naturalized. we decided that a species should be present at more than sites before being considered invasive and those that occupied more than sites and had moderate or high quadrat cover were considered transformer species. data were not available to assess whether high levels of quadrat cover actually transformed the ecosystem, as suggested by richardson et al . ( ) . we adopted a threshold approach to determine when a species moved from one category to the next. the application of frequency and cover data to the frameworks was complicated by the subjectivity associated with determining when this transition should occur. 
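the threshold approach described above, applied to the colautti & macisaac categories, can be sketched as follows. this is a simplified illustration: it assumes a single site-frequency cut-off and a single cover cut-off, whereas the study may have used different cut-offs for different stages; the function name and the example thresholds are hypothetical, and the categories are returned by their descriptions rather than by stage number.

```python
def colautti_macisaac_category(site_frequency, cover_per_quadrat,
                               site_threshold, cover_threshold):
    """Assign a frequency/abundance category to an established species.

    site_frequency: number of sites where the species was present.
    cover_per_quadrat: portion of a quadrat occupied when present.
    site_threshold, cover_threshold: analyst-chosen cut-offs (assumed values).
    """
    widespread = site_frequency > site_threshold
    dominant = cover_per_quadrat > cover_threshold
    if widespread and dominant:
        return "widespread and dominant"
    if widespread:
        return "widespread but rare"
    if dominant:
        return "localized but dominant"
    return "localized and rare"


# Placeholder thresholds, used only to show how sensitive the result is.
strict = colautti_macisaac_category(25, 0.30, site_threshold=30, cover_threshold=0.10)
lenient = colautti_macisaac_category(25, 0.30, site_threshold=20, cover_threshold=0.10)
```

the two calls classify the same species differently ("localized but dominant" versus "widespread and dominant"), which illustrates the subjectivity introduced by the choice of threshold.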
this is an inherent weakness of all three classification systems. as such, many of the thresholds chosen were arbitrary. it was arguably easier to apply data to davis & thompson ( ) 's and colautti & macisaac ( ) 's frameworks. nevertheless, subjective elements of the classifications varied between the frameworks and this complicated our ability to arrive at seemingly analogous classifications across the three frameworks. this may be because none of the frameworks was designed to be used with frequency and cover data, such as those collected here. it was therefore difficult to implement the frameworks as their authors intended. processes such as impact and dispersal were not explicitly captured and therefore inferred from the available data. the ecological impact of invasive species in the burdekin catchment has not been quantified, further limiting our ability to implement the frameworks. groupings derived from cluster analysis and the three frameworks were evaluated graphically by superimposing all four groupings onto the ordination to assess the performance of each framework in reduced multivariate space. species were classified into three of the four possible categories using richardson et al . ( ) 's framework, where were 'naturalized' , were 'invasive' , and were 'transformer' species (table ) . species were classified into four of the six groups using davis & thompson ( ) 's framework (table ) . five were considered long-distance dispersal, common with great impact. twelve were evaluated as long-distance dispersal, common with small impact, and nine were short-distance dispersal, common with great impact (table ). the remaining species were short dispersers, common, with small impact. similarly, colautti & macisaac ( ) 's framework also classified the species into four groups. forty-five species were classified as stage , localized, but numerically rare, as stage a, widespread but numerically rare, four as stage b, localized and abundant. 
five species were classified as stage , widespread and abundant (table ) . analogous classifications were obtained for the most abundant and widespread species, bothriochloa pertusa, cenchrus ciliaris, c. grandiflora, parthenium hysterophorous , and urochloa mosambicensis . these were classified as transformer (richardson et al ., ) , long-distance/common/great (davis & thompson, (colautti & macisaac, ) . similarly, minor species were all classified as naturalized (richardson et al., ) , short-distance/common/great (davis & thompson, ) , or stage species (colautti & macisaac, (richardson et al., ) , a short-disperser, common with great impact (scg) (davis & thompson, ) and at stage b (colautti & macisaac, ) . in contrast, jatropha gossypifolia was classified as naturalized (richardson et al., ) , while classified as a short-disperser, common with great impact scg (davis & thompson, ) and at stage b (colautti & macisaac, ) (table ) . the first two dimensions accounted for . % of the variation with a stress of . . the first dimension primarily separated species on total cover, where the score was negatively correlated with total cover, cover per site, cover per transect and cover per quadrat (r = - . , − . , − . and − . , respectively). urochloa mosambicensis was the most abundant species. other species with negative scores for the first dimension included the woody species, ziziphus mauritiana, c. grandiflora and lantana camara. the grasses panicum maximum, c. ciliaris and b. pertusa all had negative scores for the first dimension, as did the forb parthenium hysterophorus. the small forbs, sida cordifolia, alternanthera bettzikiana, xanthium strumarium and a native perennial shrub carissa ovata were the only other species with negative scores for the first dimension. the second dimension separated species on site frequency. ziziphus mauritiana and panicum maximum had low scores for the second component, suggesting they occupied relatively few sites. conversely, s. 
cordifolia, c. grandiflora and c. ovata had high scores on the second dimension, corresponding to a high site frequency (fig. a,b) . the first and second dimensions captured the main effects present in the data. minor species had small positive scores on the first axis and small negative scores on the second. it is apparent from the correlations between the attributes and the axis scores that these species had low total cover and low site frequency. seven groups were identified by the agglomerative hierarchical cluster analysis (table ). the groups agglomerated species based on their total cover and site frequency and the groupings concur with the output derived from the non-metric multidimensional scaling (figs c & ) . detailed characteristics and species membership of each group are shown in table . the first three groups, with , , and species had high total cover and were present in , , and sites, respectively. all three groups were widely dispersed throughout the site. group species or u. mosambicensis, occupied quadrats per site when present, while groups and species occupied quadrats per site when present. groups and also occupied approximately % of the quadrat when present, while species in group occupied % of the quadrat. the remaining four groups, with , , , and species, respectively, occupied negligible proportions of the catchment, although group species were widespread and present in sites. all four groups were rare, even when present at the site, as species were found in few quadrats per site, and groups and species were generally found in just one of the nine transects surveyed per site. two species, u. mosambicensis and chloris babata, were outliers in this survey, because of their very high site and quadrat per site frequency (u. mosambicensis) and very high cover per quadrat but low site frequency (c. babata). as a result, they were excluded from all other groups. 
categorical data, such as life form, were ignored in this study and grouped species often had different ecological characteristics. for example, grasses, forbs, vines, and trees were all found in cluster groups , , , and (table ) . other information, such as time since introduction, was either unavailable or not applicable to this catchment as species were usually first introduced elsewhere in australia. we did not consider it necessary to further separate the few species present at a large number of sites and therefore discounted information on the spatial aggregation of individual species. the two site level measures of frequency, transect frequency per site and quadrat frequency per site, were moderately correlated with the landscape scale measure, site frequency (r = . and . , respectively). similarly, two of the three site-level measures of cover, cover per transect and cover per site, were highly correlated with total cover (r = . and . , respectively). cover per quadrat was only moderately correlated with total cover (r = . ). the relationship between attributes and the dimensions from the multivariate analysis are presented in fig. (b) . in general, there was an orthogonal relationship between site frequency and cover per quadrat, cover per transect, and cover per site. this suggests that the area occupied by a species, captured by the various measures of cover, was unrelated to the processes governing its dispersal at the catchment scale, as represented by site frequency. however, dispersal within site, measured by quadrat per site and transect per site, was correlated with cover per site and cover per transect. collectively, these relationships suggest a dichotomy between the processes involved in long-distance dispersal between sites and shorter dispersal distances for within site measures. 
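the attribute relationships described above rest on pairwise correlation coefficients. the following is a minimal sketch, assuming a pearson coefficient (the text reports r values but does not name the statistic) and using made-up numbers rather than survey data. the function name `pearson_r` and the example vectors are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-species values (NOT data from the survey).
site_frequency = [1, 2, 3, 4]
cover_per_quadrat = [1, -1, -1, 1]  # centred scores, unrelated to frequency
print(pearson_r(site_frequency, cover_per_quadrat))
```

a coefficient near zero, as in this toy example, corresponds to the orthogonal relationship described between site frequency and the cover measures, whereas values near one correspond to the strongly related pairs such as cover per transect and total cover.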
weed classification schemes are designed to provide clarity and consistency to the field of invasion ecology through the use of broad, generally applicable terminology and concepts for species invasions. these frameworks put forward a series of hypotheses about the process of invasion and are intended to enable ecologists to qualitatively assess characteristics of invasive species and group them concordantly. the three frameworks require different criteria to perform the classifications. application of the frameworks is further complicated by the subjective interpretation of classification criteria and implementation of the classification systems. taxonomic classification systems, such as soil classifications (e.g. isbell, ) or land systems (e.g. tongway & hindley, ) , traditionally use a key that can be repeatedly applied to the soil or to a landscape. the classification systems employed in the present study have not been developed to the extent of soil or landscape classification systems, possibly because the data necessary to evaluate their performance was unavailable. their application is invariably subjective. for example, if we had considered species occupying more than sites to be 'widespread' (colautti & macisaac, ) and 'long-distance dispersers' (davis & thompson, ) , then the number of species categorized as stage widespread and abundant (colautti & macisaac, ) and longdistance, common with great impact, would have increased from five to eight species. every application of these frameworks is likely to be different and arguably site and scale specific, with different interpretations of what is frequent, abundant, local, common, high-impact, or long-distance dispersal. the frameworks were also unable to make use of additional available data, such as the cross-scale indices used in this study. given these limitations, it seems unlikely that widespread application of these frameworks, in their current form, can advance the field of invasion ecology. 
this may be because each framework has a discrete set of hypotheses about what constitutes a serious invader. the suitability and application of these hypotheses to a localized problem, with unique or inappropriate data, may not always be relevant for managerial purposes. there are elements of subjectivity in application of the clustering algorithms to the frequency and abundance data. this subjectivity exists at two levels. first, the decision to cut the dendrogram at a height of seven is arbitrary. cutting at a height of would have resulted in just three groups and yielded groupings synonymous with those derived from the classification frameworks. for example, groups and would have been amalgamated, even though group species were present, on average in more sites but occupied just % of the quadrat when present. in contrast, group species occupied % of the quadrat. we consider it productive to preserve this variation in the data since it potentially reflects key differences in modes of invasive spread that should be considered in management. thus, group species, while having the capability to become regionally common, probably through a greater capacity for long-distance spread, rarely become locally dominant. in contrast, group species appear to spread well locally and become dominant where they occur but at present have not dispersed as widely. in practical terms, management strategies could target group species because they have yet to disperse widely. they may also pose a greater threat to the ecosystem simply because they occupy a larger portion of the quadrat and transect when present. alternatively, group species have demonstrated proficiency for dispersal. if this trend continued, these species could threaten a much larger area, but with slightly lower levels of cover than group species. group species displayed a similar penchant for dispersal, but occupied very minor portions of the transect or quadrat. 
these are arguably the hardest species to classify with confidence. because they are successful dispersers, they have the potential to threaten the ecosystem, although at this point in time their effect is minor. our interpretations of the effect of species from group and group are also based on the frequency and abundance data at this point in time. their presence and overall effect in the catchment are minor. however, our analyses and interpretations are limited by the data available. additional information, such as time since introduction into the catchment, may alter the assessment, but these data were unavailable for most species. some species may still be adapting, or may not have had the opportunity to establish a sufficiently large population to create a problem. individual species in groups , , and may indeed be sleeper weeds and become a greater problem than they currently are. this concept was recently reviewed by grice & ainsworth ( ) . in addition, the cluster analysis detected two extreme outlier species, group , u. mosambicensis, and group , c. babata. urochloa mosambicensis was introduced as a pasture grass in the s and the observed frequency and abundance reflect its importance to the grazing industry as it would have been deliberately planted by graziers since its introduction. chloris babata was only found in three sites, but, on average, occupied % of the quadrat and % of the transect, suggesting that this species could have a dramatic impact on the landscape. the other element of subjectivity lies with the algorithms used to assess the dissimilarity between species attributes and the algorithms used to combine species based on their dissimilarities (e.g. pielou, ) . in this study, euclidean metric scaling was applied to normalized ordinal data, ensuring that each attribute would contribute equally to the group structure. alternative distance measures would be required if data were categorical. 
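the two subjective choices discussed above, the dissimilarity measure and the height at which the dendrogram is cut, can be made concrete with a small sketch. it assumes euclidean distance on standardized attributes and average linkage; the linkage rule is an assumption, since the text does not state which one was used, and the data are invented. the names `standardize`, `average_linkage` and `cluster_at_height` are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def standardize(rows):
    """Scale each attribute (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def average_linkage(a, b, pts):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    return mean(dist(pts[i], pts[j]) for i in a for j in b)

def cluster_at_height(rows, height):
    """Agglomerate until the next merge would exceed the cut height."""
    pts = standardize(rows)
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y by construction).
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], pts),
        )
        if average_linkage(clusters[x], clusters[y], pts) > height:
            break  # equivalent to cutting the dendrogram at `height`
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical species rows: [total cover, site frequency] (NOT survey data).
species = [[50, 40], [48, 42], [2, 3], [1, 2], [3, 30]]
for h in (1.0, 2.0, 10.0):
    print(h, cluster_at_height(species, h))
```

raising the cut height in this toy example merges the widespread but sparse species (row index 4) into the group of rare species, which is the kind of information loss the text associates with a coarser cut.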
clustering algorithms have been widely used and, although elements of subjectivity remain, they can be applied to a weed data set and discriminate between species at some level. they use whatever data are available and help formulate hypotheses about the invasion in the landscape of interest. we used the classification frameworks in the first instance to provide baseline parameters for a mixture method of clustering using the software developed by fraley & raftery ( ) . data were not multivariate normal and the resulting classifications were influenced by the initial group structure and often identified local optima (fraley & raftery, ) . this approach may be adopted in the future if the methods are modified to cope with poisson distributions, such as those encountered here. the extensive survey data used to derive the landscape indices may be time consuming to collect in some landscapes. the similarities between the seven variables imply that meaningful classifications of invasive species can be achieved with a reduced sampling effort. the simple site level indices derived in this study also provide insights into how invasive a species may become at a larger scale. the particularly strong relationship between cover per transect and total cover in the landscape suggests that, if a species consistently occupies more than % of a transect in a specific habitat, then it has the potential to impact and invade that habitat. the link between these scales is worthy of further study to determine the relationship between local and regional impact by species at a certain stage in the invasion process. it may be argued that a species that occupied even a small portion of the landscape in one location has overcome many of the barriers to colonization identified in richardson et al.'s framework and has the potential to invade similar habitats, provided a dispersal mechanism exists. 
both quantitative and qualitative forms of classification are developed to enable researchers and managers to classify an invasion and determine which species have dispersed widely and had the greatest impact on the ecosystem. both have limitations. the classification techniques provide a means for classifying an invasion based on a series of hypotheses that describe an invasion. in contrast, the multivariate techniques enabled us to partition species into meaningful groups, based on observed patterns within the data set. this data set contained information on species frequency and cover at a range of spatial scales. we suggest that the subjective, qualitative approaches developed by richardson et al. ( ), davis & thompson ( ) and colautti & macisaac ( ) be used in conjunction with the multivariate techniques developed here to derive classifications that are data driven, but spatially and temporally specific. a classification scheme should ensure that species within a group have attributes that are more similar than those in an adjacent group. this concept may be extended further, where the species in the same group should be subjected to similar levels of management. the subjective use of thresholds that often separate species using a single attribute produced nonsensical classifications when all the data were considered and viewed in multivariate space (fig. c-f). this may be because the thresholds chosen were inappropriate, or the frameworks lacked the capacity to make the best use of the data available. the frameworks had a tendency to under-classify the data. the other purpose of these analyses was to assess the characteristics of the data to determine which species have similar levels of frequency and abundance across a range of spatial scales. from this information, management strategies could be developed for as few or many groups as necessary. outliers may emerge in this analysis and warrant preferential treatment. 
our use of a simple cluster analysis to objectively group species with similar population-level distributional characteristics has several advantages over classification using any of the three individual frameworks. most importantly, it is repeatable, using data alone to drive the classification and assigning species to like groups independently of preconceived notions of their relative invasiveness. furthermore, it is regionally and temporally specific, lending itself well to a management application. a strong advantage also lies in the potential for integration of the outcomes of our analyses with current frameworks for assessing regional-scale dynamics of native species. application and evaluation of classification trees for screening unwanted plants a neutral terminology to define invasive species eight ways to be a colonizer, two ways to be an invader: a proposed nomenclature scheme for invasion ecology mclust: software for modelbased clustering, density estimation and discriminant analysis sleeper weeds -a useful concept regional and landscape-scale patterns of shrub invasion in tropical savannas the australian soil classification global patterns of plant invasions and the concept of invasibility predicting the identity and fate of plant invaders: emergent and emerging approaches. biological conservation invasiveness in exotic plants: immigration and naturalization in an ecological continuum do annual weeds have predictable characteristics? acta oecologia weed risk assessment. a weed risk assessment model for use as a biosecurity tool evaluating plant introductions the interpretation of ecological data alien plants in checklists and floras: towards better communication between taxonomists and ecologists the biology aspects of rare plant conservation what makes a species invasive? 
plant invasions -general aspects and special problems biological invasions: politics and the discontinuity of ecological terminology plant invasions: merging the concepts of species invasiveness and community invasibility naturalization and invasion of alien plants: concepts and definitions testing the grass-fire cycle: alien grass invasion in the tropical savannas of northern australia manual for assessment of soil condition of tropical grasslands. csiro division of wildlife and ecology the characters of successful invaders we thank lindsay whiteman, mike nicholas and brett abbott for assisting with fieldwork and professor iain gordon for constructive comments on earlier versions of the manuscript. we also thank the crc for australian weed management and crc for tropical savannas for funding various components of the study. key: cord- - g tb authors: rhodin, michael h. j.; dinman, jonathan d. title: a flexible loop in yeast ribosomal protein l coordinates p-site trna binding date: - - journal: nucleic acids res doi: . /nar/gkq sha: doc_id: cord_uid: g tb high-resolution structures reveal that yeast ribosomal protein l and its bacterial/archaeal homologs called l contain a highly conserved, basically charged internal loop that interacts with the peptidyl-transfer rna (trna) t-loop. we call this the l ‘p-site loop’. chemical protection of wild-type ribosome shows that the p-site loop is inherently flexible, i.e. it is extended into the ribosomal p-site when this is unoccupied by trna, while it is retracted into the terminal loop of s rrna helix when the p-site is occupied. to further analyze the function of this structure, a series of mutants within the p-site loop were created and analyzed. a mutant that favors interaction of the p-site loop with the terminal loop of helix promoted increased affinity for peptidyl-trna, while another that favors its extension into the ribosomal p-site had the opposite effect. 
the two mutants also had opposing effects on binding of aa-trna to the ribosomal a-site, and downstream functional effects were observed on translational fidelity, drug resistance/hypersensitivity, virus maintenance and overall cell growth. these analyses suggest that the l p-site loop normally helps to optimize ribosome function by monitoring the occupancy status of the ribosomal p-site. over the past decade, atomic resolution ribosome structures have revealed the locations of critical elements. however, these static images do not reveal the dynamic movements within this complex macromolecule. the ribosome must coordinate multiple activities between spatially and functionally different sites in two subunits. these include three transfer rna (trna)-binding sites, the peptidyltransferase and decoding centers and the elongation factor interacting regions. events occurring in these regions must be carefully coordinated to assure rapid and accurate decoding of messenger rnas (mrnas). current efforts in the field are focusing on determining the mechanisms by which these functional centers synchronize their actions and communicate with each other. the eukaryotic ribosome contains nearly intrinsic proteins. the high degree of similarity across species, from the primary amino acid sequences to their tertiary structures, suggests conserved functional roles beyond serving as mere scaffolding for the rrnas. ribosomal protein l of saccharomyces cerevisiae is an essential, highly conserved component of the s subunit (in bacteria and archaea, the homologous protein is named l ; the yeast nomenclature is used throughout this text to minimize confusion). at the primary amino acid sequence level, l is well conserved among eukaryotes (~ - % identity), while bacterial and archaeal l proteins are less well conserved ( - % identical) (supplementary figure s a ). 
l is uniquely positioned at the interface between the large subunit central protuberance ( figure a and b) and the head of the small subunit ( figure a ) ( ) ( ) ( ) ( ) ( ) . in the small subunit, the head region undergoes significant rotational movement relative to the central protuberance between the pre-and post-translocational states ( ) , and the protein-protein interactions between l and s (s in bacteria and archaea) on the small subunit (the b b and b c intersubunit bridges) undergo the largest intersubunit structural rearrangements between these two states ( ) ( ) ( ) . these observations suggest that l may play a central role as an informational conduit between the two subunits. detailed analysis of x-ray crystallographic and cryo-em structures ( figure b ) reveals that the concave surface of the b-sheet portion of l interacts with specific nucleotides in the minor groove of s rrna helix ( ) . l also makes contacts with the helix iii and loop c regions of s rrna; these connections have been hypothesized to help stabilize s rrna interactions and may participate in an information signal transmission network linking functional centers within the ribosome ( , ) . importantly, the b b and b c intersubunit bridges with s are the only protein-protein interactions between the two subunits ( , , , ) . analyses of these structures indicate that contacts involving l and s through the b b and b c bridges break and rearrange after eef- binding and ribosome ratcheting, controlled in part by differentially charged amino acid side chains between the two proteins ( , , , , , ) . an internal loop of l that we denote the 'l p-site loop', which is roughly formed by amino acid residues - , also directly contacts the t-loop of the peptidyl-trna in the p-site through trna nucleotide ( ) ( ) ( ) . 
at the level of primary amino acid sequence, the p-site loop is highly conserved among eukaryotes ( - % identity), while it is less well conserved among bacteria and archaea ( - % identity) (supplementary figure s b) . at the biochemical level, however, the p-site loop is significantly more homogeneous, containing a large number of well-aligned charged and aromatic amino acids. in particular, a , f , r and i (yeast numbering) are universally conserved. an alignment of the p-site loop structures from yeast, haloarcula marismortui, thermus thermophilus and escherichia coli reveals that the p-site loop is extremely well conserved at the structural level (supplementary figure s c) . in yeast, l is encoded by the paralogous genes rpl a and rpl b located on chromosomes and , respectively ( ) . the -kda proteins are amino acids long and are identical except for an alanine (l a) to threonine (l b) difference at the third amino acid position. analysis of l in the late s (a.k.a. l ) showed that expression of either isoform was sufficient for cell viability ( ) . however, when expressed as the sole form of l , rpl a mrna transcripts accumulated to only - % of wild-type levels as compared to cells expressing both isogenes, while rpl b mrnas accumulated to - %. expression of either isogene alone also affected s subunit assembly: a strain expressing only l b grew at wild-type rates but synthesized fewer s subunits than wild-type cells (although apparently not below a threshold necessary for wild-type growth rates), while strains expressing only l a grew more slowly than wild-type, and synthesized only - % of wild-type levels of total l and s subunits ( ) . a random mutagenesis screen of rpl b for cold-sensitive mutants identified alleles that promoted s pre-rrna processing and initiation defects ( ) . the specific mutants identified in that study were s p, s p, s f, a v, s c and g d. 
in arabidopsis, divergent untranslated regions (utrs) between the two isogenes were found to result in differential expression among plant tissues ( ) . in addition to its function as a ribosomal protein, l has been implicated in p activation through its interactions with hdm in the nucleus of human fibroblast cells ( ) , and mutant forms of l have been linked to diamond-blackfan anemia in humans ( ) . although the structural information suggests that l should play a significant role in translation, functional analyses of the protein in this role have not been performed. in this report, a series of mutants were generated using a reverse genetics approach to parse the role of the l p-site loop. detailed biochemical and structural analyses focused on two multi-amino acid mutants with opposing effects on rrna structure and trna binding. we propose that prior to peptidyltransfer, the presence of peptidyl-trna in the large subunit p-site positions the l p-site loop to interact with the helix of the large subunit rrna. after peptidyltransfer, spontaneous translocation of the deacylated trna to the large subunit e-site allows the l p-site loop to extend into the p-site, breaking contact with helix . by this model, we hypothesize that the l p-site loop functions locally as a sensor of the occupancy status of the ribosomal p-site. restriction enzymes were obtained from promega (madison, wi, usa), mbi fermentas (vilnius, lithuania) and roche applied science (indianapolis, in, usa). the quikchange xl ii site-directed mutagenesis kit was purchased from stratagene (la jolla, ca, usa). dna sequencing was performed by genewiz (germantown, md, usa). escherichia coli dh a was used to amplify plasmid dna. transformations of yeast and e. coli were performed as previously described ( ) . ypad, sd and . mb plates for testing the killer phenotype were as previously reported ( ) . plasmids for expression of dual luciferase reporters were described previously ( ) . 
saccharomyces cerevisiae strain psy (mat rpl a::his rpl b::his ura - leu d trp d his d + ycpl b ura ), an rpl a/rpl b gene deletion strain in which l is supplied by a ura -cen based rpl b clone, was a generous gift from dr. pamela silver ( ) . the l-a and m viruses were introduced into psy by cytoplasmic mixing (cytoduction) through nonproductive mating with jd [mata kar - arg (l-ahn m )] to produce the killer + strain jd as previously described ( ) . wild-type rpl b was isolated from yeast strain psy plasmid (pycp l b ura ). using flanking bamhi restriction sites, a . -kb fragment of dna containing both the -bp wild-type rpl b orf plus the native and utr regions ( bp and bp, respectively) was purified by agarose gel electrophoresis. this -bp fragment was ligated into bamhi digested prs , a low copy trp -selectable plasmid (purchased from atcc, manassas, va, usa) ( ) to create prs l b-trp . this plasmid served as the template for generation of rpl b mutants by site directed mutagenesis using the primers listed in supplementary table s . wild-type and mutant prs l b-trp clones were transformed into jd , selected for growth on -trp medium, and cells having lost the ura -based plasmid were identified by their ability to grow in the presence of -fluoroorotic acid ( -foa) ( ) . the effects of temperature and translational inhibitors were assessed by standard -fold dilution spot assays. yeast were grown in h-tryptophan synthetic deletion (sd) media (-trp) to mid log phase. od values were obtained, and cells were serially diluted -fold from to cfu per . ml and spotted on -trp plates. growth was monitored at c, c and c, and pharmacogenetic assays utilized mg/ml paromomycin, mg/ml anisomycin or mg/ml sparsomycin incubated at c for - days. killer virus assays were performed as previously described ( ) . 
the dual luciferase reporter plasmids pydl-control, pydl-la, pydl-ty , pydl-uaa ( ) and pydl-agc ( ) were employed to quantitatively monitor programmed − ribosomal frameshifting, programmed + ribosomal frameshifting, suppression of a uaa codon and suppression of an agc serine codon in place of an aga arginine codon in the firefly luciferase catalytic site, respectively. in this study, the reporters were housed in leu -based reporters: the frame dual luciferase reporter was pjd , the l-a dsrna virus − prf containing reporter was pjd and the ty containing + prf reporter was pjd . cells were grown overnight in -ml volumes of -leu synthetic depletion media to mid log phase (a = . - . ). cells were washed, resuspended in lysis buffer ( x pbs ph . , mm pmsf) and lysed using . -mm glass beads with a vortex mixer for - min at c. lysates were clarified by centrifugation for min. at r.p.m. at c. samples were maintained on ice, and ml of clarified lysate was added to ml of pre-aliquoted promega larii reagent, mixed by pipetting, and read in a td / luminometer. immediately upon completion of this read, ml of promega stop and glo buffer was added to the tube, pipetted to mix and read again. this was repeated - times per strain per reporter depending on the consistency of the data. frameshifting rates were determined by taking the ratio of firefly to renilla luciferases for each sample, and then taking the ratio of the average ratios of the frame samples to that of test reporter ratios to obtain the rates for both − and + prf. these results were then analyzed by t-test to determine statistical significance compared to wild-type levels as previously described ( ) . prior to determining rates of uaa readthrough (nonsense suppression), strains were cured of the endogenous yeast prion [psi + ] by daily serial passage of cells in -trp liquid media containing mm guanidine hydrochloride for days. 
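the ratio-of-ratios calculation described above for the dual luciferase readings can be sketched as follows. this is an illustration only: it assumes the common convention of dividing the mean firefly/renilla ratio of the test reporter by that of the in-frame control and expressing the result as a percentage, and the readings are invented. the function names `luciferase_ratios` and `recoding_efficiency` are hypothetical.

```python
from statistics import mean

def luciferase_ratios(readings):
    """Firefly/Renilla ratio for each (firefly, renilla) reading pair."""
    return [firefly / renilla for firefly, renilla in readings]

def recoding_efficiency(test_readings, control_readings):
    """Percent recoding: mean test ratio over mean in-frame control ratio.

    Assumption: test reporter in the numerator, in-frame control in the
    denominator; this is the usual convention, not a quote of the methods.
    """
    test = mean(luciferase_ratios(test_readings))
    control = mean(luciferase_ratios(control_readings))
    return 100.0 * test / control

# Hypothetical luminometer readings (arbitrary light units, NOT real data).
control = [(1000.0, 500.0), (2000.0, 1000.0)]   # in-frame control reporter
test = [(100.0, 1000.0), (200.0, 2000.0)]       # frameshift reporter
print(recoding_efficiency(test, control))
```

normalizing firefly to renilla within each sample corrects for differences in lysate concentration and reporter abundance before the test and control reporters are compared. the significance testing against wild type mentioned in the text is not covered by this sketch.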
rates of nonsense suppression were determined as previously described ( ) using the leu selectable -frame control pjd and in-frame uaa containing reporter pjd . missense reporters were based on ura plasmids previously described for the sense reporter ( ) and for the firefly luciferase arginine codon (aga) to serine (agc) missense reporter plasmid pydl-agc ( ) . methodologies were the same as those for other dual luciferase assays described above. cells were grown overnight in a c shaker in ml of ypad media to mid-log phase (od . - . ), cooled to c for h to allow ribosomes to run off of transcripts while remaining tightly coupled. cells were harvested by centrifugation and washed three times with ml . % kcl solution. cell pellets were stored at − c until needed, at which time they were thawed and resuspended in ml binding buffer ( mm tris-hcl ph . , mm mgcl , mm nh cl, mm dtt, mm pmsf) per gram of cells. cells were lysed with a : vol of zirconia beads (biospec, bartlesville, ok, usa) and disrupted using two -min pulses of a minibead beater. lysates were clarified by centrifugation at r.p.m. ( g) using an msl- rotor at c for min. ribosomes were chromatographically purified using sulfolink beads (pierce, rockford, il, usa) as previously described ( ) , and eluted from the resin in ml of elution buffer ( mm tris-hcl ph . , mm mgcl , mm kcl, mm dtt, . mg/ml heparin). eluted ribosomes were treated with mm puromycin and mm gtp for min at c and were layered on top of a -ml glycerol cushion [ mm hepes-koh ph . , mm mg(ch coo) , mm nh cl, mm dtt, % glycerol] and pelleted by centrifugation at r.p.m. at c for - h. pellets were washed with ml of storage buffer [ mm hepes-koh ph . , mm mg(ch coo) , mm nh cl, mm dtt, % glycerol], and resuspended in - ml of storage buffer. concentrations were determined spectrophotometrically ( od = pmol ribosomes). the salt-washed ribosomes were aliquoted and stored at − c for up to months. ribosomal rrna quality was checked on . 
% agarose gels and rrna to protein ratios were monitored by determining od to od ratios. polysome profiles were obtained by sucrose density gradient centrifugation as previously described ( ) . samples were split, and ml of dimethyl sulfoxide (dmso) was added to half of the samples, while ml of mm m was added to the other half. samples were incubated at c for min. ribosomes were precipitated by the addition of ml of ice-cold % ethanol and stored at − c for - h. ribosomes were pelleted by centrifugation at r.p.m. for min and resuspended in lysis buffer and rrnas were isolated using an ambion (austin, tx, usa) rnaqueous-micro rna isolation kit. optical densities were taken at nm and nm to monitor the quantity and quality of rna, and samples were resuspended at a concentration of mg rrna/ ml in pure water. hplc purified oligonucleotide primers purchased from idt (coralville, ia, usa) are listed in supplementary table . oligonucleotides were resuspended to pmol/ml, end labeled with g[ p]atp with t polynucleotide kinase (roche, indianapolis, in, usa), and purified from free radiolabeled nucleotide by passage through a microspin g- column (ge healthcare, piscataway, nj, usa). annealing reactions utilized mg of modified rrnas and ml of labeled oligonucleotide heated at c for min, followed by a - -min incubation at - c below the t m of each oligonucleotide. annealed rrna/ primers ( ml each) were added to ml of cold enzyme mix [ . ml mm dntp, . ml mm dtt, ml x superscript iii buffer, . ml superscript iii (invitrogen life technologies, carlsbad, ca, usa), . ml h o]. for sequencing samples, an additional ml of each ddntp was added to each c, t, a, g, sample, respectively. primer extension reactions were performed at c for min, with potential -min-long extensions preceding the c at lower temperatures depending on the individual t m values of the primers. denaturing rna loading dye ( ml) was added to each sample, heated to c for . 
min, and samples were resolved through % urea-acrylamide denaturing gels. gels were dried and radiolabeled samples were visualized by phosphorimagery. the published structures for the s ribosome from e. coli [pdb accession numbers: avy, aw ; ( )], as well as yeast s structures [ s i, s h, jyv, jyw, jyx; ( , )], were used in the analysis for this work and the generation of figures. published t. thermophilus s subunits containing a-site, p-site and e-site phe-trna were also employed [ g x; ( )]. all structures were visualized and manipulated using macpymol software ( ) . the visualization of a single salient loop of l interacting with peptidyl-trna indicated that it might play a vital role in sensing peptidyl-trna occupancy status and transmitting this information to other functional centers of the ribosome. as cells expressing rpl b alone were healthier than those solely expressing rpl a, genetic manipulations began with the yeast rpl ad rpl bd double knockout strain jd expressing wild-type (wt) rpl b from a low-copy, ura -selectable episomal plasmid (prpl b-ura ). oligonucleotide site-directed mutagenesis was used to construct a series of mutants, each containing changes of , or sequential amino acids ( figure c ). stretches of amino acids from arginine to arginine in the l p-site loop were targeted for site-directed mutagenesis expressed from a low-copy, trp -selectable episomal plasmid under control of the endogenous rpl b promoter (prpl b-trp ). after transformation and selection on sd medium lacking tryptophan (-trp), cells expressing only mutant rpl b alleles were identified by their ability to grow on sd-trp medium containing -fluoroorotic acid ( -foa). three of the multiple substitution mutants were inviable as the sole forms of l b. these were r ytvrtfgir → alanine (i.e. - a); deletion of residues - ( - Δ); and f gir → alanine ( - a). 
viable mutants, r ytv → alanine ( - a), v rtf → alanine ( - a), r a, y * (mutations including Δ, a, r, e, s, i, q, n, h and f), f a and r a were rescued from yeast into e. coli, and the mutations were confirmed by dna sequencing. the l p-site loop mutants confer temperature- and drug-specific growth phenotypes displayed roughly wild-type growth. cold sensitivity was assessed at c and both mutants grew at wild-type rates. - a showed enhanced growth at c relative to itself at c, while mutant - a was similar to wild type. r a grew at wild-type rates at c and c but showed enhanced growth at c. the y * mutants displayed mutant-specific effects on growth rates at c, but did not confer significant phenotypes at either c or c. f a had wild-type growth rates at all temperatures, while r a showed depressed growth at c, which was rescued at c. small molecule inhibitors of protein translation are useful probes for identifying changes in ribosome function. this study utilized three such molecules: paromomycin, anisomycin and sparsomycin. the effects of all three drugs were monitored using dilution spot assays at c on sd-trp media containing various drug concentrations. paromomycin is an aminoglycoside antibiotic that increases translational error rates by artificially stabilizing codon:anticodon interactions at the decoding center in the small ribosomal subunit ( ) . as compared to their intrinsic growth in the absence of drug, both the - a and - a mutants were slightly hypersensitive to mg/ml paromomycin, as were y Δ, y n and y h. in contrast, r a, f a and r a were all paromomycin resistant ( figure b ). anisomycin competes with the end of the aa-trna for binding to the a-site pocket of the ribosome ( , ) . both - a and - a showed anisomycin resistance at mg/ml, as did several y * mutants, and r a ( figure b ). sparsomycin binds to the p-site and interferes with peptidyl-trna binding and peptidyl transfer ( , ) . 
- a and - a mutants were hypersensitive to mg/ml sparsomycin, as were most of the y * mutants, with the exception of y f, which conferred slight resistance to this drug ( figure b ). the yeast 'killer' system is composed of the l-a helper and m satellite dsrna viruses ( ) . the l-a dsrna viral genome encodes a capsid protein (gag), and an rna-dependent rna polymerase (pol) that is synthesized as a gag-pol fusion protein consequent to a − programmed ribosomal frameshifting (prf) event ( ) . the m satellite dsrna is encapsidated and replicated in l-a encoded viral particles, and the m (+) strand encodes a secreted toxin that kills uninfected yeast through its interactions with the gpi-anchored kre p cell wall assembly protein ( ) . changes in − prf efficiency alter the ratio of gag to gag-pol, and inhibit the ability of cells to maintain m ( ) . to monitor the effects of the mutants on killer virus maintenance, colonies of jd cells expressing either wild-type or mutant rpl b alleles were spotted onto a lawn of diploid, killer − indicator cells. cells expressing wild-type rpl b were killer + as demonstrated by their ability to inhibit growth of the indicator cells ( figure c ). in contrast, isogenic cells expressing the - a, - a and f a mutants were killer − . a weak killer phenotype, defined by decreased zones of growth inhibition, was observed in mutants y e, y n, y h and f a. the rpl b mutants affect translational fidelity 'translational fidelity' is generically used to describe the accuracy of protein synthesis. a series of bicistronic reporter plasmids were used to quantitatively monitor the effects of the l b mutants on four aspects of translational fidelity: − prf, + prf, suppression of a uaa nonsense codon and incorporation of a missense near-cognate amino acid. in jd cells expressing wild-type rpl b, − prf directed by the l-a dsrna viral signal was . % ± . %. 
this compares favorably with other 'wild-type' strains in our laboratory [normal range from % to % ( , )]. the - a mutant promoted increased - prf ( . ± . -fold relative to wild type), while - a trended in the opposite direction ( . ± . -fold relative to wild type) ( figure , and table ). both these values were statistically significant and correlate well with the killer - phenotypes. y Δ, y n, y e and y h mutants also showed increased rates of - prf, with statistically significant rates ranging from y h at . -fold wild type to y e at . -fold wild type. y a, y s, y q and y f all had wild-type rates of - prf. while both - and + prf are kinetically driven events, the substrates for the slippage are distinct: - prf requires that both the ribosomal a- and p-sites are occupied by trnas, while + prf occurs while the a-site is empty ( ). rates of + prf were monitored using a cis-acting signal derived from the ty retrotransposable element using pydl-ty . baseline + prf efficiencies in cells expressing wild-type rpl b were . % ± . %. - a had no effects on + prf, while - a promoted a small but statistically significant increase ( . ± . -fold of wild type; figure ). significant changes in + prf were also observed in the y a, y s, y n, y e, y h and y f mutants. mrna decoding occurs in the small subunit decoding center, and changes in termination codon recognition (nonsense suppression) are another indicator of altered translational fidelity. pydl-uaa ( ), which contains an in-frame termination codon immediately upstream of the firefly luciferase gene, was used to monitor this parameter. the baseline rate of nonsense suppression in cells expressing rpl b was . % ± . %. the - a mutant slightly improved this aspect of translational fidelity, with nonsense suppression levels decreasing to . ± . -fold of wild-type levels. - a did not affect uaa recognition (figure ). y Δ, y a, y s, y n, y e and y h all promoted increased rates of nonsense suppression ranging from . - to . -fold wild type. 
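the recoding measurements above are reported both as percentages and as fold of wild type. the excerpt does not spell out the arithmetic, so the following is only a minimal sketch under a common convention for bicistronic dual-luciferase reporters: the firefly/renilla ratio from the test reporter is divided by the same ratio from an in-frame control reporter. the function names and all readings are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_ratio(readings):
    """Mean firefly/Renilla ratio over replicate (firefly, renilla) readings."""
    return mean(firefly / renilla for firefly, renilla in readings)


def recoding_efficiency(test_readings, control_readings):
    """Percent recoding (assumed convention, not stated in the text):
    the test-reporter ratio normalized to the in-frame control ratio."""
    return 100.0 * mean_ratio(test_readings) / mean_ratio(control_readings)


def fold_wild_type(mutant_percent, wild_type_percent):
    """Express a mutant's recoding efficiency as fold of the wild-type value."""
    return mutant_percent / wild_type_percent


# Illustrative, made-up luminescence readings: (firefly, renilla) per replicate.
wt_test = [(80.0, 1000.0), (120.0, 1000.0)]
wt_control = [(1000.0, 1000.0), (1000.0, 1000.0)]
mutant_test = [(160.0, 1000.0), (240.0, 1000.0)]
mutant_control = [(1000.0, 1000.0), (1000.0, 1000.0)]

wt_percent = recoding_efficiency(wt_test, wt_control)
mutant_percent = recoding_efficiency(mutant_test, mutant_control)
fold = fold_wild_type(mutant_percent, wild_type_percent=wt_percent)

print(f"wild type: {wt_percent:.2f}%  mutant: {mutant_percent:.2f}%  fold: {fold:.2f}")
```

normalizing to the control reporter cancels differences in mrna abundance and overall translation between strains, which is why a fold-of-wild-type value can be compared across the - prf, + prf and nonsense suppression reporters.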
pydl-agc tests missense suppression levels by monitoring rates of incorporation of an arginine (aga) near-cognate amino acid instead of a cognate serine (agc) at the catalytic codon within the firefly luciferase gene as previously described ( ) . thus, in this assay, mis-utilization of near-cognate trna arg at the ser agc codon restores firefly luciferase activity. wild-type missense levels were measured at . % ± . , comparable to previous studies ( ) . mutant - a had significantly higher levels of missense suppression (measured at . ± . -fold wild-type), while - a did not significantly affect this phenomenon ( . ± . fold wild type) ( figure ). missense suppression was not assayed for the single amino acid mutants. the mutant rpl b alleles promote opposing effects on trna binding to the ribosomal a-and p-sites sucrose gradient analyses were employed to fractionate cycloheximide arrested elongating ribosomes on mrnas in lysates generated from jd cells expressing wild-type l b, - a, and - a. in all strains the s peak was smaller than that of the s fraction which can be attributed to the presence of only a single copy of rpl b, which has previously been shown to effectively reduce the number of s subunits produced by the cell to - % of true wild-type levels while having no visible phenotypic effect on growth ( ) . no significant differences were observed among the samples (data not shown). phenotypic variation in prf and in the presence of anisomycin and sparsomycin are indicative of altered interactions between the ribosome and trnas. p-site trna k d values were determined in vitro by binding -fold serial dilutions of n-acetylated-[ c]phe-trna to ribosomes until saturation was achieved ( figure a ), and the resulting data were used to determine steady-state single site binding k d values ( figure b ). wild-type ribosomes bound this p-site substrate with a k d of . ± . nm. the - a mutants promoted a slight increase in affinity for p-site substrate (k d = . ± . 
nm), while - a had the opposite effect (k d = . ± . nm). given the physical interaction between the l p-site loop and peptidyl-trna, it was imperative to determine whether the observed small changes in p-site affinities promoted by the mutants were biochemically significant. to this end, multiple turnover puromycin reactions were performed. in these experiments, puromycin was added to ribosomes pre-incubated with excess p-site substrate, i.e. ac-[ c]phe-trna phe , and accumulation of the peptidylpuromycin product was monitored over time. in these reactions, the first round of peptidylpuromycin synthesis is very rapid. next, in a slow step, the ribosome intrinsically translocates the deacylated trna phe into the e-site ( ), followed by the slow diffusion of ac-[ c]phe-trna phe into the p-site where it can react with puromycin. repetition of this cycle results in slow multiple rounds of product synthesis ( figure c ). assuming that the l mutants do not affect either rates of intrinsic translocation or of ac-[ c]phe-trna phe diffusion into the p-site, changes in product accumulation, i.e. k obs , should be due to differences in binding affinities for the p-site substrate. consistent with this model - a promoted . ± . -fold increased k obs relative to wild-type ribosomes, while - a decreased k obs to

figure . the l b mutants promote defects in translational fidelity. isogenic yeast cells expressing either wild-type or mutant forms of l b were transformed with dual luciferase reporters and control plasmids and rates of translational recoding were determined. all results are graphed as fold wild type. - prf was measured using the yeast l-a virus frameshift signal. + prf was directed by the frameshift signal derived from the ty retrotransposable element. nonsense suppression denotes the percentage of ribosomes able to suppress an in-frame uaa termination codon positioned between the renilla and firefly luciferase reporter genes. missense suppression rates were evaluated by incorporation of an arginine (aga) near-cognate amino acid instead of a cognate serine (agc) at the catalytic codon within the firefly luciferase gene. error bars denote standard error. p-values are indicated above samples showing statistically significant changes.

(figure e and f).

the p-site loop is flexible depending on the occupancy status of the p-site

the highly basic nature of the p-site loop, its interaction with peptidyl-trna, and its proximity to s rrna helix (h ) suggested that it might interact with either of these two rna components depending on the occupancy status of the p-site. changes in interactions between the p-site loop and local rrna structures may in turn propagate outward to more distant regions of the ribosome. to test this, shape ( ) ( ) ( ) was employed to probe for structural alterations in selected regions of the s, s and s rrnas due to either the l b mutants or in wild-type ribosomes with occupied or unoccupied p-sites. due to the large size and complex three-dimensional structure of the ribosome, the entire rrna content was not examined. rather, approximately one-third of the rrna bases were interrogated, focusing on those bases closest to l , the a- and p-sites, and the decoding center. in the first series of experiments, salt-washed wild-type and - a, - a, y q and y f mutant ribosomes (chosen for structural analyses because they had the most pronounced genetic phenotypes) were treated with m , an electrophile that adds an adduct onto the oh groups of solvent exposed base sugars. modifications were performed on salt-washed ribosomes because they represent the thermodynamic 'ground state' of the ribosome. thus, the structural changes observed are indicative of changes in the full 'dynamic potential' of the ribosome as opposed to conformations locked in by e.g. occupation of binding sites by trnas or ribosome-associated factors. 
rrnas were extracted and hybridized with [ p]-labeled oligonucleotide primers, and reverse transcriptase primer extension reactions were performed. the products were separated through urea-acrylamide denaturing gels, and visualized using a phosphorimager. -oh ribose modification results in a strong stop -nt of modified bases, and the intensity of the stops is proportional to the solvent accessibility and flexibility of riboses. comparison of the protection patterns between wild-type and mutant ribosomes enables identification of specific bases that became protected or deprotected relative to wt. in all areas examined, rpl b ribosomes y q and y f matched the wild-type rrna base modification profile (data not shown), while - a and - a ribosomes revealed consistently reproducible differences. the most significant changes in rrna structure were observed in bases c -a (e. coli numbering: c - ) located in the terminal loop of s rrna h ( figure a and e). the two mutants promoted opposing patterns of base protection/deprotection in this structure. specifically, as compared to wild-type ribosomes, - a promoted enhanced protection of this loop, while the loop was deprotected in the - a mutants. analysis of the recent cryo-em yeast ribosome structure ( ) revealed that these h loop bases are located within Å of the stretches of amino acids changed to alanines in both the - a and - a mutants ( figure b ). these findings suggested that the two mutants had the effect of displacing the p-site loop into two opposing conformational states: extended toward the p-site ( - a), or retracted into h ( - a). to test whether these two states are naturally dependent on p-site occupancy, the experiments were repeated with wild-type and mutant ribosomes with or without trna phe in their p-sites. consistent with this model, addition of trna to the p-site of wild-type ribosomes resulted in slightly enhanced protection of the h terminal loop bases closest to the p-site loop (a -a ). 
interestingly, c showed significant deprotection when the p-site was occupied by trna. this base is on the far side of the terminal end of h from the p-site loop, suggesting that h itself alters its conformation upon trna occupancy of the p-site ( figure c ). - a's h bases were unchanged between p-site bound and unbound ribosomes, consistent with the p-site loop positioned in the 'retracted' state in this mutant, although small differences in the protection patterns suggest that the p-site loop is in a slightly different orientation in this mutant. in contrast, while - a ribosomes, i.e. the p-site loop 'extended' state, showed deprotection at all bases (c -a ) for both p-site bound and salt washed ribosomes, bases a -a were less deprotected when trna was in the p-site and c was even more reactive, consistent with the notion that the p-site loop interacts with h when peptidyl-trna is in the p-site. although no other shape-specific changes were observed, several other phosphodiester bonds of specific s rrna bases were reproducibly more, or less, intrinsically labile as compared to wild type ( figure d ). in both mutants, g and g located in expansion segment (es ) were more stable than in wild-type ribosomes as evidenced by reduced intensity of strong reverse transcriptase stops -nt of these bases. additionally, bases a -a (e. coli a -u ) located in the terminal loop of helix were hyper-labile in - a mutant ribosomes as compared to wt, as shown by the presence of strong stops with increased intensity -nt of these bases. these are mapped onto the two-dimensional structure of yeast s rrna ( figure e ). the l p-site loop is largely comprised of polar amino acids and carries a net positive charge, making it ideal for interactions with the phosphate backbones of nucleic acids, e.g. rrna and trna. positioned between h and the peptidyl-trna t-loop, several of its amino acids are within h-bonding distance of h ($ . Å ), while c of the peptidyl trna t-loop comes within . 
Å of g in the l p-site loop ( , ), suggesting that the l p-site loop can directly interact with both of the rna-based structures. while x-ray crystal structures are currently unavailable for ratchet-state ribosomes, a recently published examination of trna movement through the e. coli ribosome using large-scale analysis of cryo-em images implicates the p-site loop as a dynamic arm interacting with and moving in relation to trnas passing across the p-site ( ). although these studies were performed at resolutions of - Å, leaving considerable ambiguity regarding the precise residues involved, they clearly reveal highly dynamic interactions between the p-site loop and both p-site and e-site trnas. although death is not a phenotype per se, the inviable mutants are informative nonetheless insofar as they demonstrate that the amino acids f gir are absolutely required for viability. while f is universally conserved, it does not appear to be essential on its own for viability, as witnessed in the mild phenotypes of the f a mutant. similarly, all single amino acid changes explored here resulted in viable cells, suggesting a certain degree of biochemical/biophysical redundancy within this essential loop. in support of this notion, the strongest growth phenotypes observed across a range of temperatures and small molecule translational inhibitors were concentrated in the multiple alanine substitutions, i.e. - a and - a, thus directing the bulk of the biochemical and structural analyses to these two mutants. analysis of the results of the assays performed on the viable multiple alanine substitution mutants (summarized in table ) provokes the hypothesis that the l p-site loop may dynamically function to help the ribosome sense the occupancy status of the large ribosomal subunit p-site. this is modeled in figure . when the p-site is unoccupied, the p-site loop can extend into this space, moving away from the terminal loop of h . 
upon occupation of the p-site, the peptidyl-trna t-loop displaces the l p-site loop, causing its retraction into h . by this model, the rrna shape analyses depicting increased protection of helix by the - a mutant show that this mutant drives the l p-site loop equilibrium toward the 'retracted' state. conversely, increased deprotection of helix in the - a mutant suggests that this mutant more closely mimics the p-site unoccupied state, i.e. the 'extended' p-site loop state. this analysis directly explains the p-site binding data. retraction of the p-site loop from the p-site results in - a ribosomes having higher intrinsic affinity for this substrate, while extension of this structure into the p-site creates a steric clash with the peptidyl-trna t-loop, resulting in decreased affinity for this substrate. that neither mutant conferred optimal peptidyl-trna p-site occupancy may account for their hypersensitivity to sparsomycin, especially for - a in which the p-site loop is already competing with the trna for the p-site. mutants - a, - a and - Δ appear to disrupt the normal function of the p-site loop to a lethal level. in addition, the observation that trna binding to the p-site results in deprotection of c implicates h itself as a structurally dynamic unit. the functional consequences of this are not clear, although it is tempting to speculate that this conformational change may play a role in the structural rearrangements of the b b and b c bridges between the pre- and post-translocational states. the lack of rrna structural changes in the a-site or in the decoding center suggests that the biochemical and phenotypic effects observed are indirectly due to the changes described above. the reciprocal effects between ac-aa-trna binding with the p-site and aa-trna interactions with the a-site are intriguing. in the aa-trna binding reactions, the ribosomal p-sites were occupied with deacylated trna. 
we suggest that in the - a mutant, the p-site ligand is more 'locked' into a suboptimal conformation, which in turn feeds back to the a-site, resulting in decreased affinity for its ligand. conversely, the lessened ability of - a mutant ribosomes to lock p-site ligand in a suboptimal conformation may account for the increased affinity of these ribosomes for a-site ligand. anisomycin resistance by both mutants also followed the reciprocal p-site/a-site pattern, i.e. both mutants were sparsomycin hypersensitive. paromomycin interacts with the decoding center in the small subunit, where it promotes misreading of near-cognate codons in the a-site by stabilizing codon-anticodon interactions ( ) . this sensitivity may be attributable to an observed increase in missense incorporation of a near cognate arginine (aga) over that of the sense serine codon (agc) in mutant - a. intriguingly, - a had wild-type levels of missense incorporation suggesting that its sensitivity to paromomycin was indirect. the reciprocal anisomycin/paromomycin phenotypes of the l mutants demonstrate the effects of this protein on a-site ligand based ribosomal functions over very long distances. similar phenotypic patterns were previously observed with mutants of other large subunit components ( , ) . the observed effects on - prf are consistent with a recent kinetic analysis demonstrating that aa-trna slippage is the most highly weighted parameter in determining the rate at which this process occurs (liao,p.y. et al., submitted for publication). here, increased affinity for aa-trna by the large subunit suggests that the - a ribosomes stabilize the frameshifted (i.e. near-cognate) trnas, reducing their ability to be proofread, thus promoting increased rates of À prf. this is consistent with the observed increased rates of missense decoding in this mutant. 
conversely, post-slippage a-site trnas are even less stable in the - a mutants, leading these to be more efficiently proofread, and thus promoting decreased - prf efficiency. in both cases, altering - prf from the optimum 'golden mean' precludes these cells from maintaining the yeast killer virus ( , ). programmed + frameshifting is completely dependent on peptidyl-trna slippage. increased + prf in the - a mutant is consistent with decreased affinity for this substrate. the failure to observe decreased + prf in the - a mutant, despite its increased affinity for peptidyl-trna, is not entirely clear, although this may be due to the inability of these ribosomes to achieve a threshold beyond which + prf effects can be observed.

figure . model: the p-site loop acts as a sensor of the occupancy status of the p-site. (left) when the large subunit p-site is unoccupied by trna, the l p-site loop is able to extend into this space leaving the distal loop of h partially deprotected from chemical attack. this conformation is favored by the - a mutant of l b. (right panel) occupation of the p-site by peptidyl-trna displaces the l p-site loop, causing it to tightly retract from the p-site and interact with h , resulting in increased protection of the h terminal loop from chemical attack. h likely moves toward the p-site loop slightly, increasing the exposure of c to the surrounding solvent. this conformation is favored by the l b - a mutant.

the changes in rrna stability observed in the terminal loop of helix and in es are intriguing. chemical protection experiments revealed the terminal loop of helix is involved in a kissing loop interaction with the terminal loop of helix , and this interaction is apparent in the x-ray crystal and cryo-em structures ( , ). increased lability at a and a was previously observed in the y c mutant of ribosomal protein l (homolog of e. 
coli l ) located at the base of the aa-trna accommodation corridor, and in the É c (e. coli u ) s rrna mutant located in the peptidyltransferase center ( , ) . the observation that mutations located in three very different and topologically distinct regions of the large subunit conferred similar structural effects suggest that this kissing loop interaction plays an important role in ribosome function. its location on the cytoplasmic face of the ribosome where deacylated trna leaves the molecule implies that the interaction between the terminal loops of helices and may be involved in gating this deacylated trna exit corridor open and closed. this is consistent with the model of allosteric coordination between the a-and e-sites ( , ) , which would indicate that the defects conferred by all of these mutants on aa-trna binding might impair this e-site gating function. the decreased lability of c and g in es is similarly intriguing, raising more questions than answers. no function is currently associated with this expansion segment, but recent cryo-em analysis shows it to be located on a solvent accessible surface of the large subunit ( ) . perhaps this site is also involved in a-site/ e-site coordination. alternatively, it may be a site for recognition of defective ribosomes by the nonfunctional ribosome decay apparatus. the complete atomic structure of the large ribosomal subunit at . a resolution structures of the bacterial ribosome at . a resolution structure of the s ribosome from saccharomyces cerevisiae-trna-ribosome and subunit-subunit interactions comprehensive molecular structure of the eukaryotic ribosome crystal structure of the ribosome at . Å resolution structures of the ribosome in intermediate states of ratcheting the process of mrna-trna translocation the conserved a-site finger of the s rrna: just one of the intersubunit bridges or a part of the allosteric communication pathway? 
locking and unlocking of ribosomal motions the roles of ribosomal proteins in the structure assembly, and evolution of the large ribosomal subunit s rrna: structure and function from head to toe domain movements of elongation factor eef and the eukaryotic s ribosome facilitate trna translocation the primary structure of the gene encoding yeast ribosomal protein l depletion of saccharomyces cerevisiae ribosomal protein l causes a decrease in s ribosomal subunits and formation of half-mer polyribosomes assembly of s ribosomal subunits is perturbed in temperature-sensitive yeast mutants defective in ribosomal protein l developmental regulation of ribosomal protein l genes in arabidopsis thaliana essential role of ribosomal protein l in mediating growth inhibition-induced p activation ribosomal protein l and l mutations are associated with cleft palate and abnormal thumbs in diamond-blackfan anemia patients ribosomal frameshifting efficiency and gag/gag-pol ratio are critical for yeast m double-stranded rna virus propagation an in vivo dual-luciferase assay system for studying translational recoding in the yeast saccharomyces cerevisiae factors affecting nuclear export of the s ribosomal subunit in vivo a system of shuttle vectors and yeast host strains designed for efficient manipulation of dna in saccharomyces cerevisiae methods in yeast genetics differentiating between near-and non-cognate codons in saccharomyces cerevisiae systematic analysis of bicistronic reporter assay data enhanced purity, activity and structural integrity of yeast ribosomes purified using a general chromatographic method gcd , a translational repressor of the gcn gene, has a general function in the initiation of protein synthesis in saccharomyces cerevisiae experimental prerequisites for determination of trna binding to ribosomes from escherichia coli yeast ribosomal protein l affects the kinetics of protein synthesis and ribosomal protein l improves translational accuracy, while mutants 
lacking both remain viable the pymol molecular graphics system functional insights from the structure of the s ribosomal subunit and its interactions with antibiotics inhibitors of protein biosynthesis. ii. mode of action of anisomycin structures of five antibiotics bound at the peptidyl transferase center of the large ribosomal subunit structural basis for the interaction of antibiotics with the peptidyl transferase centre in eubacteria double-stranded rna viruses of saccharomyces cerevisiae a - ribosomal frameshift in a double-stranded rna virus forms a gag-pol fusion protein kre p, the plasma membrane receptor for the yeast k viral toxin an 'integrated model' of programmed ribosomal frameshifting and post-transcriptional surveillance evidence against a direct role for the upf proteins in frameshfiting or nonsense codon readthrough translocation mechanism of ribosomes rna structure analysis at single nucleotide resolution by selective -hydroxyl acylation and primer extension (shape) a fast-acting reagent for accurate analysis of rna secondary and tertiary structure by shape chemistry selective -hydroxyl acylation analyzed by primer extension (shape): quantitative rna structure analysis at single nucleotide resolution ribosome dynamics and trna movement by time-resolved electron cryomicroscopy selection of trna by the ribosome requires a transition from an open to a closed form structure/ function analysis of yeast ribosomal protein l an arc of unpaired ''hinge bases'' facilitates information exchange among functional centers of the ribosome achieving a golden mean: mechanisms by which coronaviruses ensure synthesis of the correct stoichiometric ratios of viral proteins evolutionary relationships amongst archaebacteria. 
a comparative study of s ribosomal rnas of a sulphur-dependent extreme thermophile, an extreme halophile and a thermophilic methanogen yeast ribosomal protein l helps coordinate trna movement through the large subunit rrna mutants in the yeast peptidyltransferase center reveal allosteric information networks and mechanisms of drug resistance features of s mammalian ribosome and its subunits deacylated trna is released from the e site upon a site occupation but before gtp is hydrolyzed by ef-tu

we would like to thank dr. rasa rakauskaite for assistance training, offering technical support and advice above and beyond the call of duty. further thanks as well to dr. arturas meskauskas, dr. karen jack, hamid-reza shahshahan, ashton trey belew, dr. jonathan leshin and the rest of our laboratory for help and support. we thank dr. pamela silver for providing us with strain psy .

supplementary data are available at nar online.

conflict of interest statement. none declared.

key: cord- -sxh mq q authors: milne, d. j.; armstrong, m.; fisher, a.; flores, t.; pavey, c. r. title: structure and environmental relationships of insectivorous bat assemblages in tropical australian savannas date: - - journal: austral ecol doi: . /j. - . . .x sha: doc_id: cord_uid: sxh mq q

abstract patterns in the composition of assemblages of microbat species sampled during the late dry season (the ‘build‐up’) in north australian savannas were assessed against a range of environmental factors as well as four a priori defined habitat types (riparian, escarpments, coastal and woodlands). distinct species assemblages were most strongly associated with topographic and climatic variables. there were also limited associations with vegetation structure, fire and local roost potential but no associations with insects or water availability. total species diversity at sample sites was associated with distance to rivers and rainfall. 
in general, species assemblages were not clearly defined and the number of significant environmental associations was relatively small. we compare these associations with those reported for bat assemblages elsewhere in australia.

understanding of the diversity and evolutionary ecology of australia's mammal fauna has not been uniform across orders. in particular, most detailed tests of evolutionary hypotheses (e.g. johnson ; fisher et al. ) omit bats (order chiroptera). assessments of population trends and extinction proneness have also excluded chiroptera (e.g. johnson ). this is a significant shortcoming as bats represent over % of australia's mammal species, many of which are endemic. although australian mammal diversity peaks in the tropical forests of eastern queensland including cape york peninsula, significant diversity also occurs in the savannas of north-western australia where species are known. an assessment of the response of mammals within this region to environmental variables revealed that a single environmental gradient (of substrate and disturbance) described the distribution of all species, excluding bats. rock-inhabiting mammals are a significant component of this fauna; however, diversity of this assemblage decreases with decreasing outcrop size and increasing isolation. identified three other trends. first, that the mammal fauna of eucalypt open forest/woodland habitats of north-western australia is characterized by extensive distributions of its component species. second, that monsoon forests support a depauperate mammal fauna. last, that the mammal fauna of this region undergoes substantial latitudinal change associated with a steep north-south rainfall gradient. did not include systematic sampling of bats, preventing a rigorous examination of the response of the bat fauna to environmental measures. 
however, data from captures (mist netting, harp trapping, roost searches) indicated that most bat species were present across the environmental range sampled ). here we revisit the issue of the response of bats to environmental variables in the tropical savanna of the northern territory and north-west queensland using a more rigorous data set. our data collection incorporated the use of ultrasonic detectors to sample bats and geographic information system (gis) derived variables to represent environmental conditions. the study region supports a rich microbat fauna ( of australia's species, of australia's genera), including one endemic species ( taphozous kapalgensis ), and both of australia's monotypic genera ( rhinonicteris , macroderma ). we assessed environmental factors at two levels, first at the landscape scale, using data available from a gis, and second at a local scale where information was collected on the physical environment and food resource availability (insects) at individual sampling sites. we predicted that the high vagility of bats would result in species responding broadly to environmental variables. however, specific responses to a number of environmental variables were expected. in particular, we predicted that the distribution, composition and segregation of bat assemblages would respond to geographical patterns in annual rainfall, presence of rocky escarpments, water bodies and canopy cover. although a relationship with insect abundance and composition was examined, we predicted that this relationship would not be significant given the generalist feeding ecology of most insect-eating bats (fenton ) . the bat assemblages of tropical australian savannas are also compared with assemblages elsewhere in australia. 
specifically, we compared our results with community composition and environmental association studies in north queensland rainforest (crome & richards ), mangroves in north-western western australia (mckenzie & muir ), and open forest/woodland in victoria (kutt ; lumsden & bennett ; herr ), south-east new south wales (nsw) (law et al. ) and tasmania (taylor & o'neill ).

the study area, called the top end of australia, included the tropical savanna of the northern territory and north-west queensland, north of ° s, but excluding offshore islands (fig. ). across this area, maximum mean weekly temperature ranges between ° c and ° c and mean annual rainfall between mm and mm (houlder ; fig. ). rainfall is highly seasonal with almost all precipitation occurring from november to april. topographic relief is relatively low. the maximum elevation is m on the arnhem land plateau, with the main areas of topographic relief being the kakadu escarpment and the eastern edge of the kimberley region in the southwest of the study area. eucalypt woodlands and forests dominate % of the study area (fox et al. ). other notable environments include monsoon rainforests and floodplains dominated by sedgelands and grasslands. on average, over half ( %) of the top end is burnt every year (a. edwards, pers. comm. ).

a total of sampling sites were located across the top end (fig. ). ... seasonal variation on species composition. each site was a circular plot of m radius. plots were primarily selected to cover a large geographical area and to sample four broad habitat types:

1. riparian: adjacent to perennial rivers, creeks or permanent waterholes ( sites).
2. escarpments: sandstone cliffs ( sites).
3. coastal: coastal and near coastal environments (excluding estuaries and mangroves) (eight sites).
4. woodlands: continuous areas of eucalypt woodlands or open forests not associated with the other habitat types ( sites).

fig. . map of the study area showing the location of sampling sites and site labels as well as average annual rainfall isohyets (in millimetres). sites are symbolized according to bat assemblage (diamond = group , square = group , cross = group , triangle = group , circle = group ).

habitat types were chosen a priori and were based on information gleaned from species' distribution maps and descriptions of microbat habitat preferences (strahan ; churchill ) that suggested these habitats may contain distinctive species assemblages. two sites were usually sampled at a time on the same nights. with one exception, no two sites within a sampling pair sampled the same habitat type. distances between sampling pairs ranged between km and km (mean km).

at each site we used a range of sampling techniques to maximize the likelihood of obtaining a full inventory of bat species (kuenzi & morrison ; murray et al. ; o'farrell & gannon ). bats were sampled using two ( sites) or three ( sites) harp traps over two consecutive nights as well as one night of shot sampling for a -h period after dusk. harp traps were usually placed across 'flyways' (tracks, streams or other gaps within the vegetation where bats are more likely to be trapped) and were either positioned side by side or spaced between and m apart. we also conducted active searches of caves, road culverts and any other features potentially used as diurnal roosts by bats within m of the centre of the sampling site. in addition, bat calls were recorded at every site with ultrasonic bat-detectors (anabat ii, titley electronics, ballina, australia) using two methods. the first method involved placing a detector on the ground, elevated to approximately ° , and operated from dusk for a cumulative total of at least six recording hours over two consecutive nights (maximum h, mean h). this time period has been shown to sample % of species calls at a given site (milne et al. ).
detectors were connected to either an anabat ii delay switch with output recorded to -min cassette tape (sony chrome ux, tokyo, japan) via tape recorder (optimus ctr- d, fort worth, usa) ( sites) or an anabat v zcaim and computer (toshiba portégé ct or toshiba tecra ct d, tokyo, japan) running anabat software in monitor mode ( sites). there are no differences in the species detected between these two recording techniques (milne et al . ) . for the second method, an anabat detector was held in the hand and manually activated on detection of a bat-call and actively pointed in the direction of the call. calls were recorded via tape-recorder and cassette tape, for h after dusk for one night. all recorded calls were identified according to milne ( ) . at several sites ( ), shot sampling was not permitted. instead we trapped bats using mist-nets at these sites. it is likely therefore, that some 'high-flying' bat species that are readily detected using shot sampling, may not have been trapped at these sites. however, we expect this will have a negligible effect on our results as shot sampling at all other sites, used in conjunction with anabat detectors, enabled us to collect an extensive reference call library for 'high-flying' bat species for the entire study area (milne ) . anabat detectors were systematically used at all sites and will normally detect 'high-flying' species that are not readily trapped (o'farrell & gannon ) . we collated environmental data for each site from field habitat measurements, analysis of spatial data and insect sampling. at the centre of each site we measured tree basal area, canopy cover and stem count, m either side of a -m transect ( . ha) in an area of undisturbed vegetation usually adjacent and parallel to flyways where harp traps were set. on escarpment sites, the transects either traversed the escarpment or were situated at the base of the escarpment. 
basal area and stem counts were derived by measuring diameter at breast height (d.b.h.) of every tree along the transect, whereas canopy cover was measured using a spherical densiometer at m, m and m along the transect. for the entire site ( . ha), we measured slope, maximum canopy height, crown cover, rock cover, distance to water and local roost potential. crown cover in three height classes ( - m, - m and > m) was estimated using crown separation ratios (mcdonald et al. ). ... mostly large trees and rock outcrop with large cracks and holes. we chose to assess whole trees and rock outcrops rather than count individual hollows because small microbats ( < g) can roost in hollows equivalent to their own body diameter (pers. obs. ). entrances to these hollows are very small and would regularly be overlooked if we attempted to count hollows directly. large trees have been shown to contain more tree hollows than smaller trees (whitford ) and are preferred roost sites for many bat species (lunney et al. ; herr & klomp ; law & anderson ; lumsden et al. ).

several variables were derived using gis from a s ( c . m) digital elevation model (dem, provided by the department of defence) including elevation, ruggedness index (the range in cell values of the dem within a × cell neighbourhood), and distance to and m 'escarpments' (defined here as any adjacent dem cells having an altitude difference of or m). climate variables (annual mean temperature, minimum monthly temperature and annual rainfall) were derived using bioclim (houlder ). other gis data included fire frequency (number of years in which the site was burnt over the preceding years) and years since last fire (data sets provided by the bushfires council of the northern territory), distance to perennial rivers, ndvi (normalized difference vegetation index, which is a measure of vegetation 'greenness' derived from satellite imagery) and projective foliage cover (meakin et al. ).
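the ruggedness index defined above (the range in dem cell values within a square neighbourhood) can be illustrated with a short sketch. this is not the authors' gis workflow: the neighbourhood size is not recoverable from the text, so the `size=3` default, the function name `ruggedness_index`, the handling of edge cells and the toy elevation grid are all assumptions made for illustration.

```python
def ruggedness_index(dem, size=3):
    """Range (max - min) of elevation in a size x size window around each cell.

    `dem` is a rectangular list of lists of elevations. Edge cells use only
    the part of the window that falls inside the grid (an assumption; a GIS
    package may treat edges differently).
    """
    half = size // 2
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                dem[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = max(window) - min(window)
    return out


# Hypothetical 3 x 3 elevation grid (metres), not data from the study.
dem = [
    [10, 12, 15],
    [11, 30, 14],
    [9, 13, 16],
]
rugged = ruggedness_index(dem, size=3)
for row in rugged:
    print(row)
```

a site's value would then be read from the cell containing the site centre; cells near an isolated high point receive large values even when they are themselves low, which is the behaviour a range-based index is meant to capture.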
at each site we trapped flying nocturnal insects for one night concurrently with bat sampling. the insect trap was constructed from a white cotton sheet ( . m × . m), suspended off the ground by strings tied to the corners to form a funnel, one end higher than the other. at the bottom of the funnel a hole was cut in the sheet and a plastic jar ( mm diameter × mm depth) partially filled with % ethanol was attached to hang underneath. a -v fluorescent light ('col-light' brand col-lite, maleny, australia) was hung from the higher end of the sheet to attract insects. the trap was positioned approximately m from the anabat detector so as not to disturb bats from natural flight habits in the vicinity of the detector, and left unattended for the entire night. insects that fell into the jar were collected the following morning. in the laboratory, insect samples were filtered through a -mm sieve to remove the smallest insects (mostly < mm in length) and then identified to order and assigned to four size (head-body length) classes: < mm, - mm, - mm and > mm. the choice of size classes was based on the range of body sizes found to be prey items of bats in tasmania (o'neill & taylor ) . analysis of bat communities was based on species presence-absence at each site derived from the combination of all sampling methods. anabat calls for the following combinations cannot be reliably separated in the top end: (i) chalinolobus nigrogriseus , scotorepens greyii and scotorepens sanborni ; (ii) miniopterus schreibersii and pipistrellus westralis ; and (iii) nyctophilus arnhemensis , nyctophilus bifax and nyctophilus geoffroyi (milne ) . anabat call sequences that were attributed to these species combinations were therefore excluded from the analysis, although species within these combinations were included if identified using one of the physical sampling methods. the one exception was s. greyii and s. 
sanborni which cannot be readily separated in the field (churchill ) and were treated here as a single species although in some areas their distributions are allopatric (mckenzie & muir ) . species assemblages were assessed using patn software (belbin ) . similarities in species composition between sites were calculated using the bray-curtis association measure. cluster analysis (unweighted pair group mean) was used to define assemblages (groups of sites) following visual inspection of the dendrogram. anosim (clarke & green ) was used to test whether bat species composition differed significantly between the defined assemblages as well as the four a priori habitat types. the relationship between sites was also portrayed by ordination (multidimensional scaling) of sites by their bat species composition. in both analyses, only sites with at least three species were included. all environmental variables (table ) were continuous or rank ordered. variables were initially compared using the spearman rank correlation test. where pairs of variables had a correlation coefficient greater than . , one of the pair was excluded from further analysis. the mean of each environmental variable was calculated for each group of sites derived from the cluster analysis and the significance of differences between bat assemblage groups was tested using kruskal-wallis anova . the relationship between environmental variables and the arrangement of sites in the ordination space was also tested using vector fitting (kantvilas & minchin ) . finally, generalized linear modelling (glm; crawley ) was used to develop a predictive habitat model for total site species richness. a poisson error distribution and log link function was used and a backward stepwise procedure was adopted to generate the minimum adequate model with only those variables having a significant correlation in the vector fitting included in the model development. 
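the first two steps of the assemblage analysis (bray-curtis association between sites, then unweighted pair group mean clustering) can be sketched in a few lines. this is an illustrative sketch only, not the patn implementation. it relies on the fact that, for presence-absence data, bray-curtis dissimilarity reduces to 1 - 2 × shared / (n_a + n_b). the site names, species labels and function names below are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two sets of species present at a site.

    Assumes neither set is empty (the study kept only sites with at least
    three species).
    """
    shared = len(a & b)
    return 1.0 - 2.0 * shared / (len(a) + len(b))


def mean_dissimilarity(x, y, sites):
    """Average dissimilarity over all site pairs drawn from clusters x and y."""
    pairs = [(i, j) for i in x for j in y]
    return sum(bray_curtis(sites[i], sites[j]) for i, j in pairs) / len(pairs)


def upgma(sites):
    """Average-linkage (UPGMA) clustering.

    Returns the merge history as (cluster_x, cluster_y, height) tuples,
    where each cluster is a tuple of site names.
    """
    clusters = [(name,) for name in sites]
    merges = []
    while len(clusters) > 1:
        x, y = min(
            combinations(clusters, 2),
            key=lambda pair: mean_dissimilarity(pair[0], pair[1], sites),
        )
        merges.append((x, y, mean_dissimilarity(x, y, sites)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges


# Hypothetical presence-absence records, not data from the study.
sites = {
    "site_a": {"sp1", "sp2", "sp3"},
    "site_b": {"sp1", "sp2", "sp3", "sp4"},
    "site_c": {"sp5", "sp6", "sp7"},
    "site_d": {"sp3", "sp5", "sp6"},
}
merges = upgma(sites)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

the merge heights are what a dendrogram would display; groups of sites are then defined by choosing where to cut it, which in the study was done by visual inspection.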
a total of microbat species were identified from the sites, representing over % of the species recorded from north-western australia . two species known to occur in the top end , macroderma gigas (ghost bat) and saccolaimus saccolaimus were not detected in this study. we identified five groups from the classification of all sites by their species composition (fig. ) . the initial classification divided the sites into four groups. we subdivided the largest of these groups into two and assigned two outlying sites to group based on the relative position of these sites in the ordination. anosim analysis confirmed that the groups differed significantly in composition ( r = . , p < . ) and that there was a significant difference between each pair of groups ( p < . or better). the occurrence of bat species within the derived groups and habitat types is summarized in table and the geographical distribution of sites (classified according to group) is shown in fig. . four species were ubiquitous throughout the groups and habitats ( chaerephon jobensis , pipistrellus adamsi, mormopterus loriae and saccolaimus flaviventris ) while three species were each detected at single sites only ( hipposideros diadema , hipposideros stenotis and m. schreibersii ). the distribution of sites in ordination space and the relationship with environmental vectors is shown in fig. . a total of environmental variables were significantly correlated with variation in species composition between sites (table ) . a summary of mean values for these variables for each group is provided in table . a description of species composition and the environmental characteristics for each group is provided below. species that were detected most often in this group include chalinolobus gouldii (present in all sites), c. jobensis, s. flaviventris, s. greyii/s. sanborni and p. adamsi. chalinolobus gouldii was strongly associated with this group (i.e. tended to occur in group more than the other four groups). 
group had the highest total species richness and mean site species richness of all groups. sites were characterized by high percentage canopy cover, frequent burning and high annual rainfall and were located in the north and west of the top end.

the greatest number of sites occurred in this group ( ). species that were detected most often include vespadelus caurinus (present at all sites), s. flaviventris, taphozous georgianus and c. jobensis. v. caurinus was strongly associated with this group. sites were characterized by rugged, steep rocky slopes, high elevations and short distances to escarpments and rivers. minimum temperatures were cool and annual rainfall low. sites were widely distributed across the top end, except the coastal zone.

species detected most often in this group include s. flaviventris, n. arnhemensis, myotis macropus and s. greyii/s. sanborni. n. arnhemensis was strongly associated with this group. group was not clearly associated with any of the environmental variables measured and occupied an intermediate value on most environmental gradients. however, this group displayed the highest mean values for minimum temperature.

figures represent percentage of sites within each group or habitat in which each species was detected. differences in proportions were tested using the χ² statistic and differences in mean species richness were tested using kruskal-wallis anova (ns, not significant; *p < . ; **p < . ; ***p < . ).

this group had the equal-fewest sites (five) and had the lowest total and mean site species richness. species detected most often include c. jobensis (present at all sites), s. flaviventris, m. loriae and c. nigrogriseus. there were no strong species associations, although c. nigrogriseus occurred at proportionately more sites in this group than any other. sites were characterized by lower mean annual temperatures, long distances to rivers and no rock cover.

species detected most often include s. flaviventris, p.
westralis (both present at all sites), p. adamsi, m. loriae and t. kapalgensis. p. westralis and t. kapalgensis were strongly associated with this group. group also had relatively few sites and low species richness, but was associated with the minima or maxima of several environmental variables including long distances to escarpments, flat terrain at low elevations with no rock, low local roost potential, high annual temperatures and low fire frequency. all five sites were located near the coast (fig. ) . there was a significant difference in species composition between habitat types (anosim, r = . , p < . ) as well as between all pairwise combinations of habitats except between 'woodland' and 'riverine' (r = . , p = . ). v. caurinus and t. georgianus were detected most often in 'escarpment' habitat. both of these species, as well as rhinonicteris aurantius and c. nigrogriseus, were absent from 'coastal' habitat. p. westralis was strongly associated with 'coastal' habitat and absent from both 'escarpment' and 'riparian' habitat. n. arnhemensis was also absent from 'escarpment' habitat. all habitats had similar total and site species richness, with slightly lower species richness in the 'coastal' habitat. the relationship between groups and habitats is summarized in table . the habitat type of each site was not independent of group classification (χ = . , p < . ). most of the 'escarpment' sites occurred in group (steep, rocky, rugged sites) with two further sites in group . 'coastal' sites mainly occurred in group (few environmental correlates) and group (flat, low elevation), whereas 'riparian' sites occurred across four of the groups and 'woodland' sites were evenly represented across all five groups. 
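the test of independence between habitat type and group classification is a standard χ² test on a contingency table of site counts. the sketch below shows how the statistic and its degrees of freedom are obtained. the 2 × 2 table is hypothetical (the study's table has four habitats and five groups, and its counts are not reproduced here), and the function name is an assumption.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. Every row and column total
    must be positive so that all expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two habitat types, columns are two groups.
observed = [
    [8, 2],
    [2, 8],
]
stat, dof = chi_square_independence(observed)
print(round(stat, 3), dof)
```

the p-value follows from comparing the statistic with a χ² distribution on the returned degrees of freedom. with many sparse cells, as is likely for a table built from a small number of sites, the approximation is weak and an exact or permutation test may be preferable.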
we found no significant associations between bat species assemblages and various measures of insect availability including total number of insects, total number of insect orders, total number of insects in various size classes, proportion of insects in various size classes or total number of insects in each order.

values are the mean for sites in each group, h-values refer to the kruskal-wallis statistic (ns, not significant; *p < . ; **p < . ; ***p < . ).

habitat modelling identified distance to perennial rivers and annual rainfall as the major predictors for site species richness (table ). the minimum adequate model was only moderately robust with % of the deviance captured. this suggests that there was considerable 'noise' in the data or that some important explanatory variables were not quantified.

as predicted, the insectivorous bat fauna of north-western australia responded broadly to most environmental variables. the main environmental feature associated with the distribution of microbat assemblages in the study area was topography (variation in elevation, slope, topographic ruggedness and distance to escarpments). not surprisingly, therefore, species considered to be obligate cave roosters (hipposideros ater, h. diadema, h. stenotis, m. macropus, m. schreibersii, r. aurantius, t. georgianus, v. caurinus, vespadelus finlaysoni) mainly occurred in (but were not restricted to) groups and which were associated with high values for the topographic variables. although we expected a relationship between microbat assemblages and distance to escarpments, the significant effect of elevation was not predicted. elevation generally increased away from the coast and was auto-correlated with 'distance to coastlines', making it unclear which of these features was most important in influencing bat composition.
the second factor influencing bat composition was climate, specifically annual rainfall (correlated with maximum temperature and latitude), mean temperature and minimum temperature (correlated with temperature range; refer table ). the influence of annual rainfall was expected given the relationship between rainfall and species composition exhibited by the entire mammal fauna of north-western australia. a similar pattern was shown by the vegetation (bowman et al. ) and birds (whitehead et al. ) of north-western australia. in contrast to the significant relationship identified between bat assemblages and mean climatic variables, there was no significant relationship between ambient temperature (measured at pm each sampling night) at each site and species composition. at the time of year that we sampled, temperature was unlikely to limit the number of bat species that were active. however, during the dry season, low inland temperatures may reduce insect activity, restrict bat activity to the earlier, warmer times of the night and/or induce some species to enter torpor. therefore, restricting our sampling to one period of the year may have affected our results, but sampling at different times of the year would have required a much greater sampling effort. between years, there was no observable difference in general weather patterns during each sampling period; therefore, inter-year variations were unlikely to have affected our results. there were significant associations between bat species assemblages and fire frequency (correlated with time since fire). the effects of fire on landscapes in northern australia can depend on the number of times an area is burnt and on the time since last fire (assessed here), fire intensity, seasonal timing of fires and spatial extent of burning (dyer et al. ; andersen et al. ). the link between fire and bat species composition is likely to be an indirect one.
it is also possible that characteristics of the landscape such as fuel loads, geography and habitat type are actually the primary influence for species assemblages and fire frequency is a secondary consequence of these landscape characteristics. therefore, our results should be viewed with caution and further investigation into the effects of fire on bat species assemblages is required before conclusions can be drawn.

minimum adequate models and explanatory power (per cent of deviance captured) are shown. probability levels *p < . , **p < . .

we assessed several variables involving insect availability at each site. none of the variables showed any significant relationship with microbat assemblages. this suggests that, in the top end at least, available food resources do not influence the composition of bat communities. this conclusion was consistent with previous research on insect-eating bats that indicated most species capture prey opportunistically (fenton ). specific research on tasmanian bats also concluded that bat assemblages were generally opportunistic foragers (o'neill & taylor ). four aspects of our sampling strategy may have influenced our analysis. first, we did not sample non-volant insects and other arthropods that are eaten by some bat species in the top end (e.g. spiders, c.r. pavey, unpubl. data ). second, high flying insects that are preyed on by bats such as taphozous spp. were probably not attracted to our light trap. third, bats may only show a response to insects at certain times of the year. it is likely that at the time of sampling (late dry season), insects were abundant and food resources did not affect the activity of bats. fourth, insect sampling was limited to one night per site, which may not have been sufficient to provide an adequate representation of overall insect availability.
therefore, we suggest that a combination of insect sampling methods should be used in future assessments of prey availability and bat assemblages, particularly when the diversity of bats is high. these methods should aim to sample volant and non-volant invertebrates. did our study adequately sample a cross section of the major environmental gradients in the top end? compared with much of australia, the environment of the top end is relatively uniform. landscape relief is low, woodlands dominate most of the landscape, temperature varies little throughout the year and the climatic gradients are gradual. therefore, environmental variation is relatively small, and fewer sampling sites should be required compared with areas with greater topographic, climatic and vegetative variation. however, there may have been two significant deficiencies in our sampling. first, the highly seasonal rainfall in the monsoon tropics results in starkly contrasting 'wet' and 'dry' seasons. from our study, we were unable to say how bat composition may vary seasonally and there are no data available to assess seasonal patterns. second, the chosen study area was huge ( km ) and most of the north-east of the top end (arnhem land) was unsampled. therefore, clearer patterns of bat assemblages may have emerged if we had sampled more comprehensively, both spatially and temporally. most sites ( ) were sampled in pairs and the minimum distance between any two sites was km (mean km). bats can travel long distances during the night. foraging distances for a selection of species range between km and km (herr & klomp ; law & anderson ; lumsden et al. ) , therefore some of our results were potentially autocorrelated due to the same bats being sampled at both sites within a pair. therefore, we assessed the similarity in site species composition using the bray-curtis index. 
this index was calculated by dividing the number of shared species between pairs of sites by the total number of species of both sites. the resulting value was plotted against the distance between each pair of sites (fig. ). the scatter of points was highly variable; however, the slope of the regression line was shallow. this pattern indicated that the relative change in species composition as a result of geographical separation was small.

fig. . similarity in bat species composition between pairs of sites (using the bray curtis similarity index, refer text) plotted against the distance between each pair. graph also displays the fitted linear regression line.

one of the environments largely neglected during sampling was monsoon rainforest, although 'riverine' sites did sample components of monsoon rainforest environments. we considered this had little effect on our results because monsoon rainforests occupy just . % of the landscape (based on mapping by fox et al. ) and usually occur in patches less than ha (russell-smith ). in addition, menkhorst and woinarski ( ) found no bat species that were tightly associated with monsoon rainforests in the top end and these forests support a depauperate mammal fauna in general.

some of the environmental variables that we found to be significantly correlated with bat assemblages in the top end differed from those related to bat community variation in other areas of australia. waterbodies have been found to support high species diversity and some species are strictly associated with them (law et al. , south-west slopes of nsw; young & ford , central western queensland). in the top end, glm analysis suggested bat species richness increases with decreasing distance to perennial rivers. however, species richness was not exceptionally high at our 'riparian' sites. further, group , which was on average closest to rivers, did not have the highest species diversity. also, we found no significant difference in species assemblages between 'riverine' and 'woodland' habitats and there was no relationship with distance to available surface water. given that sampling was carried out during the driest time of the year (late dry season, september-november), the likelihood of detecting significant associations with waterbodies was maximized.

a relationship between vegetation structural complexity and microbat diversity has been established in studies in western australia (mckenzie & muir ) and nsw (law et al. ). by contrast, we found significant correlations of bat species diversity with canopy cover but no associations with structural complexity. compared with the vegetation in the areas sampled by mckenzie and muir ( ) and law et al. ( ), the vegetation of the top end is usually shorter and contains fewer understorey layers (d. lewis pers. comm. ). this limits the degree of vegetation structural complexity in the top end, which likely accounts for the lack of correlation between structural complexity and bat communities.

although we identified significant differences between the species assemblages within classification groups and habitat types, the assemblages were not clearly defined. most species occurred in more than one group and some were present in all groups. in addition, there were no associations between insect variables and bat assemblages and relatively few associations with environmental variables. this pattern is not restricted to microbats. birds, reptiles and non-volant mammals also exhibit 'loose' patterns of species composition (woinarski & fisher ; woinarski et al. ) and limited associations with particular environments and environmental gradients (woinarski et al. ) in the top end. woinarski et al. ( ) suggested this trend was a consequence of the homogeneity of eucalypt woodlands and forests that dominate the top end landscape. this relatively uniform environment militates against highly specialized and habitat-specific faunas.
however, there were exceptions. specifically, some microbat species had a clear association with rugged rocky areas, particularly escarpments and adjacent areas. these areas provided a complex mix of habitats that contained foraging and roosting sites suitable for both cave and tree roosting species. this pattern extended to other vertebrate species as well. rocky escarpment regions in the top end support high species diversity as well as a number of endemic or habitat restricted species (woinarski & gambold ).

vegetation corridors beside rivers and surrounding areas (but not the waterbodies themselves) appeared to be important environments as they supported high bat species richness. bats are regularly characterized by the foraging strategy they employ within their immediate environment (mckenzie & rolfe ; neuweiler ; schnitzler & kalko ). rivers are often associated with environments with tall dense vegetation. these areas do not appear to be of conservation significance because we did not observe high species richness at our 'riparian' sample sites. however, riverine environments usually have a distinct outer 'edge' and vegetation surrounding these areas is usually shorter and relatively open. we propose that these areas had greater species richness as they provide a diversity of environments for bats that employ different foraging strategies. we recommend that further research be conducted to examine the relationship between rivers and bats in the top end.

our study did not take into account longer-term bat population dynamics. bats in the top end are poorly surveyed and, with few exceptions, surveys have been unstructured and unsystematic. therefore, attempting to identify and compare historical trends in bat populations is very difficult. given the top end environment is (currently) relatively unmodified, it could be assumed that mammal populations will remain stable and secure over the short to medium term. unfortunately, this is not the case.
woinarski et al. ( ) described a case of decline in terrestrial small mammals in a conservation reserve in the top end that could not be confidently attributed to any clear environmental factor(s). further cases have also emerged (e.g. pardon et al. ; watson & woinarski ) . therefore, we recommend establishing longterm monitoring programs to track changes in bat populations so that changes may be quickly identified, assessed and appropriately managed. this is a highly challenging task that can only be achieved through a considerable commitment of time and resources. fire in tropical savannas: the kapalga experiment patn. pattern analysis package. csiro division of wildlife & ecology preliminary biogeographic analysis of the northern territory vascular flora australian bats statistical design and analysis for a 'biological effects' study glim for ecologists bats and gaps: microchiropteran community structure in a queensland rain forest the foraging behaviour and ecology of animal-eating bats the ecological basis of life history variation in marsupials the vegetation of the australian tropical savannas. environmental protection agency aspects of the ecology of insectivorous forestdwelling bats (microchiroptera) in the western slopes of the australian alps preliminary investigation of roosting habitat preferences of the large forest bat vespadelus darlingtoni (chiroptera, vespertilionidae) anuclim version . . 
centre for resource and environmental studies rarity in the tropics: latitudinal gradients in distribution and abundance in australian mammals determinants of loss of mammal species during the late quaternary 'megafauna' extinctions: life history and ecology, but not body size an analysis of epiphytic lichen communities in tasmanian cool temperate rainforest detection of bats by mist-nets and ultrasonic sensors activity and stratification of microchiropteran bat communities in thinned, unthinned and old lowland regrowth forest, east gippsland roost preferences and foraging ranges of the eastern forest bat vespadelus pumilus under two disturbance histories in northern new south wales a bat survey in state forests on the south-west slopes region of new south wales with suggestions of improvements for future surveys bat communities in a fragmented forest landscape on the south-west slopes of new south whales bats of a semi-arid environment in south-eastern australia: biogeography, ecology and conservation location of roots of the lesser long-eared bat nyctophilus geoffroyi and gould's wattled bat chalinolobus gouldii in fragmented landscape in south-eastern australia roost selection by gould's long-eared bat, nyctophilus gouldi, tomes (chiroptera: vespertilionidae) in logged forest on the south coast of australian soil and land survey field handbook bats of the southern carnarvon basin structure of bat guilds in the kimberley mangroves nor-for. mapping the forest cover of the nt. document field methodology and fpc indexing. 
department of lands planning and environment distribution of mammals in monsoon rainforests of the northern territory key to the bat calls of the top end of the northern territory a comparison of three survey methods for collecting bat echolocation calls and species accumulation rates from nightly anabat recordings surveying bat communities: a comparison between mist nets and the anabat ii bat detector system foraging ecology and audition in echolocating bats a comparison of acoustic versus capture techniques for the inventory of bats feeding ecology of tasmanian bat assemblages determinants of survival for the northern brown bandicoot under a landscape-scale fire experiment classification, species richness, and environmental relations of monsoon rain forest in northern australia how echolocating bats search and find food the mammals of australia. reed books composition of the bat (chiroptera: vespertilionidae) communities in tasmanian forests vertebrate monitoring and re-sampling in kakadu national park biogeographic patterns, environmental correlates and conservation of avifauna in the northern territory hollows in jarrah (eucalyptus marginata) and marri (corymbia calophilla) trees. i. hollow sizes, tree attributes and ages gradient analysis of a tropical herpetofauna: distribution patterns of terrestrial reptiles and amphibians in stage iii of kakadu national park wildlife of lancewood, acacia shirleyi thickets and woodlands in northern australia. . 
variation in vertebrate species composition across the environmental range occupied by lancewood vegetation in the northern territory gradient analysis of the distribution of mammals in stage iii of kakadu national park, with a review of the distribution patterns of mammals across north-western australia distribution patterns of vertebrates in relation to an extensive rainfall gradient and variation in soil texture in the tropical savannas of the northern territory bird distribution in riparian vegetation in the extensive natural landscape of australia's tropical savanna: a broad-scale survey and analysis of a distributional data base changes in mammal populations in relatively intact landscapes of kakadu national park bat fauna of a semi-arid environment in central western queensland our thanks go to bryan baker, chaz delacoeur, chris grant, david holland, marieke lettink, tony mitchell, steve owen, terry reardon, nathalia velez and felicity watt for their help with field work. greg connors, andrew edwards and peter brocklehurst assisted with various aspects of the analysis. john sheppard tracked down many of the references. thanks to irene rainey and peta jones who did a wonderful job with data entry. many parks and wildlife rangers and landholders assisted in planning surveys and locating suitable sampling sites. john woinarski, chris johnson, brad law, michael bull and an anonymous referee provided valuable comments on the manuscript. d.m. also thanks john woinarski for his support throughout this project. the tropical savannas crc provided funding for this project. animal sampling procedures were approved by the n.t. animal ethics committee (ref: a ). key: cord- - q k k authors: gu, xiaoqiong; tay, qi xiang martin; te, shu harn; saeidi, nazanin; goh, shin giek; kushmaro, ariel; thompson, janelle r.; gin, karina yew-hoong title: geospatial distribution of viromes in tropical freshwater ecosystems date: - - journal: water res doi: . /j.watres. . . 
sha: doc_id: cord_uid: q k k this study seeks to understand the general distribution of virome abundance and diversity in tropical freshwater ecosystems in singapore and the geospatial distribution of the virome under different landuse patterns. correlations between diversity, environmental parameters and land use patterns were analyzed and significant correlations were highlighted. overall, the majority ( . %) of the annotated virome belonged to bacteriophages. the percentage of caudovirales was higher in reservoirs whereas the percentages of dicistroviridae, microviridae and circoviridae were higher in tributaries. reservoirs showed a higher shannon-index virome diversity compared to upstream tributaries. land use (urbanized, agriculture and parkland areas) influenced the characteristics of the virome distribution pattern. dicistroviridae and microviridae were enriched in urbanized tributaries while mimiviridae, phycodnaviridae, siphoviridae and podoviridae were enriched in parkland reservoirs. several sequences closely related to the emerging zoonotic virus, cyclovirus, and the human-related virus (human picobirnavirus), were also detected. in addition, the relative abundance of pmmov (pepper mild mottle virus) sequences was significantly correlated with rt-qpcr measurements ( . < r < . , p < . ). this study shows that spatial factors (e.g., reservoirs/tributaries, land use) are the main drivers of the viral community structure in tropical freshwater ecosystems. viruses are the most abundant living entities on earth and all living entities are associated with at least one virus which can control microbial communities (ackermann, ) . viral metagenomics have been reported widely in water cycles, especially marine waters, to evaluate the role of viruses in microbial diversity and biogeochemical cycling (angly et al., ; breitbart et al., ; cassman et al., ; culley et al., ) . 
identifying the factors that shape the viral community is important for both understanding and manipulating ecosystems (dinsdale et al., ). factors shaping the viral community in freshwater ecosystems can include temporal factors, geospatial factors, natural disturbances (e.g., typhoons) and human activities (djikeng et al., ; emerson et al., ; fancello et al., ; ge et al., ; hwang et al., ; lópez-bueno et al., ; skvortsov et al., ; tseng et al., ). among these factors, land use is a major influence on waterborne viromes. land use changes are primary drivers of the viral community and of a range of associated infectious waterborne disease outbreaks (patz et al., ). in the water cycle, agriculture brings excess nutrients and agricultural chemicals to surface waters, causing oxygen depletion and increasing algal blooms (foley et al., ). urbanization degrades water quality through surface runoff, and human pathogenic viruses have been detected more frequently in watersheds with dominant urban and agricultural land cover (corsi et al., ; lenaker et al., ). furthermore, human expansion into wildlife habitats and the construction of zoos and animal parks provide opportunities for human-wildlife interactions, thus increasing the risk of transmission of zoonotic viruses to human populations (patz et al., ). these different anthropogenic activities harbor diverse and distinct viral hosts, including bacteria, plants, wild animals and humans. it is hypothesized that the environment surrounding these reservoirs may favor distinct viral predators and thereby change the characteristics of the viral community.
however, up till now, studies of land use impacts on the virome community in freshwater ecosystems are still limited as they mainly rely on traditional methodology (culture-based method or qpcr/rt-qpcr), which focuses on limited human virus targets without considering the whole picture of the viral community in the water environment (corsi et al., ; lenaker et al., ) . to date, there is no systematic report focusing on the geospatial distribution and diversity of viromes in natural surface waters and how they may be impacted by human activities and the effect of different land use. in addition, emerging viral pathogens of zoonotic origin could also be discovered through viral metagenomics, for which information is very limited. singapore is a highly urbanized island located in the tropics with an area of . km , and a population of . million in (department of statistics singapore, ). water is a scarce resource in this country and a total of reservoirs are used to collect rainwater and surface waters for potable water supplies. increasingly, selected reservoirs are being used as focal points for sporting events (kayaking and dragon boating) and recreational activities so that the public can enjoy and appreciate water resources. as such, good water quality is needed to protect recreational users of these water bodies. previous studies have detected waterborne viral pathogens (e.g., norovirus gi/gii, adenovirus, rotavirus and astrovirus) in singapore water bodies using qpcr/rt-qpcr (aw and gin, ; aw et al., ; rezaeinejad et al., ) . thus, continued surveillance of these viral targets as well as a broader range of viral targets expanded to the virome community coupled with land use information, could provide important data and new dimensions for managing the safety of water resources. for these reasons, understanding the geospatial distribution of viromes is needed. 
in this study, seven tropical reservoirs with diverse upstream land use functions (i.e., urbanized, agricultural and parkland areas) were examined. the viral community structure and virome populations specific to each of these environments were systematically investigated together with the characteristics of the watershed. by conducting a complete virome analysis of these freshwater ecosystems (especially viral pathogens and fecal viral indicators), a comprehensive picture of the links between the virome community structure and specific land use activities could be elucidated and then used for risk assessment of associated waterborne diseases. this information would be important for environmental management at a macroscopic level to protect public health. thus, the objectives of this study were to: (i) investigate the overall virome distribution and diversity in diverse freshwater ecosystems (reservoirs/tributaries) in a tropical environment, (ii) compare the virome community based on the different land use patterns, and (iii) assess the extent of human-related pathogenic viruses in surface waters, especially emerging zoonotic and human-related viruses, which may have been undetected before. a total of seven reservoirs and three catchments were sampled in singapore during january (northeast monsoon) and april (inter-monsoon period), (table s ). in total, sampling points were surveyed, comprising locations in the reservoirs and locations in the tributaries. only reservoirs , and had corresponding tributary sampling points. the study sites were divided into categories based on their geospatial characteristics: urbanized, agricultural and parkland areas (table ). apart from storing water and providing flood control, some of the reservoirs (i.e., r - , e ) also catered for recreational activities such as kayaking, dragon boating and water skiing. all the upstream tributary sites were designated as non-recreational areas.
physical-chemical parameters, including temperature, ph, turbidity, conductivity, salinity, total dissolved solids (tds) and dissolved oxygen (do) were measured on site using a hanna meter probe (hi multiparameter meter; hanna instruments). -hour rainfall data were obtained from the singapore historical daily records ( http://www.weather.gov.sg/climate-historical-daily/ ). . . . primary concentration, secondary concentration and viral nucleic acids extraction -l water samples were collected from each sampling location in three -l carboys. the raw-water sample was immediately transported to the lab and concentrated through a hollow fiber ultrafiltration unit with blocking and elution buffer (hemoflow fresenius hf s, germany) to a final volume of ml. the hollow fiber ultrafilter was purged with nanopure water for min and pre-treated with ml of blocking solution ( . g of napp in l of nanopure water) for min. after that, the water sample was recirculated until a final volume of approximately e ml was reached. an elution step was carried out by recirculating around ml of elution buffer ( . g of napp, ml of tween , and ml of antiform in l of nanopure water) for min. both retentate and eluent were combined to a final volume of ml during primary concentration. ml of primary concentrate was further processed to enrich viral particles in a secondary concentration step through polyethylene glycol (peg) precipitation ( % peg (w/v) and . m nacl) after ph adjustment (ph ¼ . ) (jaykus et al., ) . after incubation of the mixture at c for h followed by , g for min, the pellet was dissolved in ml of phosphate buffer saline (pbs, ph . ) with an equal volume of chloroform. after centrifugation at g for min, the supernatant was filtered through a . mm sterile syringe and further concentrated to a final volume of ml using a kda ultracentrifugal filter device (merck millipore, ireland). 
after primary and secondary concentration, ml of viral nucleic acids (dna and rna) was extracted using the qiaamp viral rna mini kit (qiagen, hilden, germany) and then stored at À c (saeidi et al., ) . in order to quantify the human pathogens in reservoirs and their tributaries using qpcr/rt-qpcr, viral targets were performed to include: ( ) four genotypes of male-specific coliphages (frna gi-frna giv); ( ) ten human viral pathogens and ( ) one plant viral pathogen/microbial indicator (i.e., pmmov) ( table s ). the majority of targets belonged to group iv ssrna (þ) except for adenovirus (group i dsdna) and rotavirus (group iii dsrna). the details of qpcr/rt-qpcr primers and probes are listed in table s . the extracted viral nucleic acids were reverse transcribed using improm-ii reverse transcription system for detecting rna viruses following manufacturer's instructions (promega, usa). rt products were stored at À c for later analysis. qpcr/rt-qpcr was carried out in a steponeplus real-time pcr system (applied biosystems, usa) using faststart universal probe master (rox) (roche, germany) following miqe guidelines (bustin et al., ). the extracted nucleic acids were reverse transcribed and amplified to obtain a sufficient quantity of dna and cdna (wang et al., ) . to ensure there was no microbial contamination, the negative control was run in % agarose gel and ascertained that no dna gel band was present in the negative control lane. after purification of viral dna and cdna, samples were sent to scelse (singapore centre on environmental life sciences engineering). in total, libraries and one negative control library using illumina truseq nano dna library kit were constructed. the sequencing libraries with a corresponding insert size and adapters were prepared accordingly as previously described (ng et al., ) . the illumina's phix control library was used as standard. 
the libraries were then pooled and sequenced in one lane with equimolar concentrations on an illumina hiseq sequencer in rapid mode at a final concentration of pm and a read-length of bp pairedend (v sequencing reagents). the overview of the virome datasets is detailed in table s . the effect of the negative control is ignored, as explained in the supplementary information. meanwhile, s rrna gene amplicon sequencing was applied to the same batch of samples at sites e to characterize the bacteria community. a total volume of ml water samples from primary concentration was centrifuged at , g for min. after removing the supernatant, the pellet was transferred to a power-soil ® bead tube (mo bio laboratories, inc., carlsbad, ca) for dna extraction with a final volume of ml. genomic dna (gdna) was amplified for the marker gene targeting the s rrna variable regions v to v using the universal pyrotaq primer set ( wf and r) (rinke et al., ) . samples were sequenced in the university of illinois, research resources center -dna services facility and sequence preprocessing as described previously (te et al., ) . the biological data (subsampled otus) were square-root transformed to reduce the dominance of major otus before putting into primerv for pcoa and correlation analyses. the sequencing data were trimmed to remove adaptors, low quality reads, "primer b" sequences which were used for random amplification and phix reads following the bbtools instruction (version . ). reads were then de novo assembled into contigs by clc genomics (version . . ) through cross-assembly strategy. the assembly setting in clc genomics is: minimum contig length of bp and similarity fraction . . contigs were then uploaded into the metavir and virsorter pipeline. briefly, metavir pipeline will first extract open reading frames (orfs) from contigs through the metageneannotator. 
all predicted orfs were then compared to the national centre for biotechnology information (ncbi) refseqvirus protein database ( . . ) using a reference-dependent taxonomic analysis, with the threshold of bit score and an e-value cut-off of − . after getting the best blast hit affiliation of each predicted gene, annotation of each contig was made based on the lowest common ancestor (roux et al., ). in addition, the metavir pipeline computes the genes affiliated to the pfam database for function analysis. virsorter identified category viral signal (pretty sure) and category viral signal (quite sure) from the assembled sequences, and the viral signal included bacterial and archaeal viruses (roux et al., ). in particular, human-related viruses were extracted from the metavir results and searched against the ncbi non-redundant nucleotide database (downloaded in june ) using blastn to determine nucleotide similarity and query coverage (e-value < e- ). in order to quantify the relative abundance of each contig within different libraries, reads were remapped to contigs using novoalign (hercus, ). this absolute matrix was used for quantifying the relative abundance of taxonomy assignment. since contigs differ in length and libraries differ in sequencing depth, rpkm (reads per kilobase of "transcript" per million mapped reads) was used to standardize the relative abundance of every contig between libraries after remapping trimmed reads to each contig across different libraries (mortazavi et al., ; reyes et al., ). rpkm was calculated as follows:

$$\mathrm{rpkm} = \frac{\text{trimmed reads mapped to each contig per library}}{\text{total trimmed reads mapped to this library (million)} \times \text{this contig length (kbp)}}$$

this matrix was equivalent to an otu table for pcoa analysis.
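the rpkm normalization described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the authors' pipeline: the function names `rpkm` and `rpkm_matrix`, the nested-dictionary layout of the read counts, and the choice to take the library total as the sum of reads mapped to all contigs are all hypothetical.

```python
def rpkm(mapped_reads, total_mapped_reads, contig_length_bp):
    """RPKM of one contig in one library.

    mapped_reads       -- trimmed reads mapped to this contig in this library
    total_mapped_reads -- total trimmed reads mapped in this library
    contig_length_bp   -- contig length in base pairs
    """
    if total_mapped_reads <= 0 or contig_length_bp <= 0:
        raise ValueError("library total and contig length must be positive")
    millions_of_reads = total_mapped_reads / 1e6   # reads -> millions of reads
    length_kbp = contig_length_bp / 1e3            # bp -> kbp
    return mapped_reads / (millions_of_reads * length_kbp)


def rpkm_matrix(counts, contig_lengths_bp):
    """Convert {library: {contig: reads}} into {library: {contig: RPKM}}.

    Assumption: the library total is the sum of reads mapped to all contigs.
    """
    table = {}
    for library, contig_counts in counts.items():
        total = sum(contig_counts.values())
        table[library] = {
            contig: rpkm(reads, total, contig_lengths_bp[contig])
            for contig, reads in contig_counts.items()
        }
    return table


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(rpkm(500, 2_000_000, 2500))
```

dividing by both library depth and contig length is what makes values comparable across libraries and across contigs, which is why the resulting table can stand in for an otu table in ordination.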
the non-normalized matrix of reads mapping to each contig per sample (absolute reads matrix) was rarefied to , reads per sample (the smallest sample size in the absolute reads matrix of taxonomy-affiliated viruses was , ) to calculate α-diversity. "observed species" and "shannon diversity index" were calculated for each sample using macqiime (version . . ) with the iteration set at . multivariate analysis of viral community composition under different treatments (e.g., land use, reservoir/tributaries, monsoon/inter-monsoon season) was conducted using anosim in primerv and permanova+. the similarity of each pair of samples in terms of biotic characteristics was calculated using bray-curtis similarity. rpkm of the total contigs and of the virsorter category and phages were log plus one transformed before input into primerv for pcoa and correlation analyses. all the reference genomes of qpcr/rt-qpcr viral targets were downloaded from ncbi and accession numbers are detailed in table s . mapping was performed using the clc genomics workbench . . based on the default setting (mismatch gap = , with the linear gap cost, length fraction: . ), with the similarity fraction set per target at three levels: medium ( %), medium-high ( %) and high ( %, % or %) (table s ). these three thresholds could represent the nucleotide variation within groups of these viral targets (de graaf et al., ; doceul et al., ). the proportion of mapping reads equals the number of mapping reads in each library divided by the total reads in the library. all geospatial analyses were performed using arcgis v. (esri, aylesbury, uk) and the percentages of different land uses in the reservoirs and their tributaries are detailed in table . land-use shape files were obtained from the singapore land authority and pub (singapore's national water agency).
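the diversity and similarity measures named above (rarefaction, observed species, shannon index, log plus one transformation and bray-curtis similarity) can be illustrated with a short standard-library python sketch. it is not the macqiime or primer implementation. the helper names, the fixed random seed, the base-2 logarithm for the shannon index, the natural logarithm for the transformation and the 0 to 1 similarity scale are assumptions made for illustration.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: reads} profile to `depth` reads without replacement."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    subsample = {}
    for taxon in rng.sample(pool, depth):
        subsample[taxon] = subsample.get(taxon, 0) + 1
    return subsample


def observed_species(counts):
    """Richness: number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p * log(p)); base 2 is an assumption here."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


def log1p_transform(profile):
    """log(x + 1) transform of an abundance profile (natural log assumed)."""
    return {key: math.log1p(value) for key, value in profile.items()}


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity on a 0-1 scale: 1 - sum|a-b| / sum(a+b)."""
    keys = set(a) | set(b)
    numerator = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    denominator = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


if __name__ == "__main__":
    # Hypothetical read counts per viral family, for illustration only.
    sample = {"myoviridae": 60, "siphoviridae": 30, "microviridae": 10}
    sub = rarefy(sample, 50)
    print(observed_species(sub), round(shannon(sub), 3))
```

rarefying every sample to the same depth before computing richness and shannon diversity keeps deeper libraries from appearing more diverse simply because more reads were drawn from them.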
four different layers of map were used for analysis: (i) catchment land-use shape files, (ii) river shape files, (iii) sub-catchment shape files and (iv) drain line maps. wgs_ _web_mercator_auxiliary_sphere was used as the projection coordinate system while gcs_wgs_ /svy singapore was used as the geographic coordinate system. population data was downloaded from the singapore statistics website (http://www.singstat.gov.sg/statistics/browse-by-theme/ geographic-distribution, basic demographic characteristic ) and the population density was calculated based on the total population within the planning area divided by the planning area which covered the drainage area for that sampling point. the shape file of master plan of planning boundary no sea was used as the reference map for the planning area classification (https://data. gov.sg/dataset/master-plan- -planning-area-boundary-no-sea, department of statistics singapore). all the statistical tests including one-way anova and spearman rank correlations were performed using ibm spss (ibm, portsmouth, uk). environmental parameters were log plus one transformed before correlating with a-diversity index as the raw data were right skewed. graphs were plotted using microsoft excel ( ) and originpro (originlab, northampton, ma). virome datasets were deposited in ncbi sequence read archive (sra) under accession no. srr - . s rrna sequences were deposited in ncbi sra under accession no. srr - . environmental parameters of the study sites (i.e., reservoirs and tributaries points) are illustrated in table s . the temperature remained relatively stable for all the sampling points, ranging from to . c with a mean value . c. ph ranged from . to . with a mean value . . other environmental parameters varied more, including conductivity ( . e . ms/cm, mean: . ms/cm), tds ( . e ppm, mean: ppm), do ( . e . ppm, mean: . ppm) and turbidity ( . e . fnu, mean: . fnu). the -hour rainfall and -day rainfall ranged from to . mm (mean: . 
mm) and e . mm (mean: . mm), respectively. the -hour rainfall and -day rainfall had higher mean values (i.e., . mm and . mm, respectively) during the northeast monsoon (jan) and lower mean values (i.e., . mm and . mm, respectively) during the inter-monsoon (apr). table summarizes the land use percentage and the population density for the reservoir catchments. to better evaluate land use impacts on the viral community, the sampling sites were classified into three categories: urbanized areas, agriculture areas and parkland areas. sites , and included approximately e % residential and urban land use, and as such can be regarded as urbanized areas. site had less than % of residential and urban areas, with about e % green and e % agricultural areas. site was the only sampling location with a reasonable percentage of agriculture area; therefore, site was considered as agricultural. site and site had similar land use percentage with more than % of the land covered with green areas; thus, it was considered as parkland. site included % of residential and urban categories. however, unlike sites e , there was no drain directly connected to the reservoir at this site and hence, the chance for possible viral contamination was relatively small. therefore, site was classified as parkland areas as well as site and site . the land use category clustering of the seven different sites illustrated the reasonability of our land use category classification (fig. s ). the population density was obtained from the singapore department of statistics. among the sample locations, sites , and had the highest population densities (> people/km ) while sites , and had the lowest (< people/km ). a total of . hundred million high-quality hiseq sequencing reads of samples were obtained from different reservoirs and their tributaries in singapore during the monsoon and inter-monsoon periods (table s ). 
these were assembled across different libraries and , contigs were generated with an average length of bp (n ¼ bp). on average, . % ± . % reads mapped back to the assembled contigs per library (min: . %, max . %). of the , contigs, , contigs were annotated using metavir pipeline. based on the absolute reads matrix (the number of reads mapped to each family in each sample) and the metavir annotation result, on average, . % was annotated and a total of families were identified (fig. ) . the majority ( . %) of these reads belonged to the caudovirales (myoviridae, siphoviridae and podoviridae). this was followed by reads for phycodnaviridae and mimiviridae (whose hosts belong to algae and amoeba) which were retrieved at percentages of . % and . %, respectively. apart from these major viral families, other families such as dicistrovidae, microviridae, circoviridae, iridoviridae and poxviridae were also present in our study. . % were unclassified viral sequences and more than % were unassigned (na) sequences. function analysis of the , viral contigs identified , genes and , ( . %) of the genes encoded hypothetical proteins or conserved hypothetical proteins. the most retrieved pfam annotations showed that phage terminase enzyme, phage portal and phage integrase were responsible for the phage function (table s ) . fig. a shows the taxonomic percentage of virome in reservoirs and their tributaries (sites , and ). fluctuation could be observed within tributaries at the same sampling site. in contrast, virome composition in reservoirs was similar across several sampling points in singapore and shows a more stable pattern than the tributaries (fig. b) . on average, caudovirales accounts for . % among all the annotated viruses in all the tributaries (sites , and ) with a proportion of . % and . % in jan and apr, respectively. for the reservoirs, caudovirales comprises an average of . % among all the annotated viruses with a proportion of . % and . 
% in jan and apr, respectively. note that the percentage of caudovirales was higher in all the reservoirs than in all the tributaries. in contrast, the percentage of dicistroviridae was higher in tributaries ( . %) than in reservoirs ( . %) in jan (tables s and s ). the percentage of microviridae and circoviridae in tributaries was approximately . % (reservoirs: . %) and . % ( . %), respectively. the shift in composition was observed as the percentage of caudovirales decreased while the percentage of dicistroviridae, microviridae and circoviridae subsequently increased, as locations changed from reservoirs to tributaries. viral communities in different reservoirs in terms of land use impact were compared and the majority of viral communities were largely conserved and stable at the family level with myoviridae, siphoviridae and podoviridae as the main family level, and small differences observed in dicistroviridae and other families (fig. b) . both the richness and diversity (observed species and shannon diversity) was calculated on the taxonomically affiliated viruses from the subsampled absolute reads matrix. the species richness of viral samples from reservoirs with tributaries comprising mostly agriculture and urbanized areas had a similar amount of species while those from parkland areas showed lower viral richness, which was reasonable as parkland areas would generally receive less viral contamination. it was also noted that the viral species richness from tributaries was also lower than their corresponding reservoirs. the average shannon diversity of reservoirs and tributaries was computed as . and . respectively and a one-way anova test showed that the shannon-diversity of tributaries was significantly smaller than those from reservoirs (p < . ). this indicated that tributaries were more likely to be dominated by selected species compared to the reservoirs (fig. 
s ) and this may be related to the viral host's presence (e.g., bacteria, archaea, algae, insects, plants, animals). future studies could explore this further. a spearman rank correlation was carried out between richness and diversity (observed species and shannon index calculated by macqiime), environmental parameters and the land use pattern ( table ). the correlation coefficient showed that the richness index of observed species was significantly positively correlated with rainfall and agriculture while the shannon index was negatively correlated with ph. when looking at the b-diversity of the microbial community (< . mm) structure, the distribution of rpkm (< . mm microbial fig. a and b ). this pattern was consistent with the bacterial community geospatial distribution ( fig. e and f) . mimiviridae, phycodnaviridae, podoviridae and siphoviridae were enriched in parkland areas (spearman r ¼ . e . to pco ) whereas dicistroviridae and microviridae were enriched in urbanized areas (spearman r ¼ À . to pco ) (fig. a) . acidobacteria, actinobacteria and verrucomicrobia were enriched in parkland areas (spearman r ¼ À . to . to pco ), cyanobacteria were enriched in agricultural areas (r ¼ À . to pco ) whereas proteobacteria were enriched in urbanized and agricultural areas (spearman r ¼ À . to pco ) (fig. e ), comparable to a recent study conducted in singapore showing that proteobacteria were enriched in horticultural and residential samples (nshimyimana et al., ) . the microbial community (< . mm) and virsorter category and phage community in the seven reservoirs both showed distinctly different distributions within reservoirs/tributaries and within different sites ( fig. c and d) . different sampling points of reservoirs at sites and did not show any difference. for sites , and , the microbial community shared a similar contig profile even though they did not share the same land use pattern (i.e., sites and : urbanized area; site : agriculture area). 
noticeably, urbanized area site showed a slightly different pattern when compared with the other three reservoirs, perhaps due to potential different viral host communities (site is near the construction sites). the microbial community in the tributaries for sites , and were much more variable spatially and temporally, compared to the reservoir microbial community. in our study, contigs were affiliated to human picobirnaviruses and contigs were affiliated to cycloviruses. they had a higher relative abundance in the urbanized and agricultural reservoirs and all the tributaries (fig. ) . possible viral contamination could be from urban drains including those from housing estates, sewer leakage and construction sites. although previous studies have detected enteric viruses in urban catchments in singapore (aw and gin, ; aw et al., ; liang et al., ; rezaeinejad et al., ) , our metagenomics sequencing was unable to find contigs assigned to the enteric viruses commonly detected in singapore surface waters (e.g., norovirus gi, norovirus gii, adenovirus, rotavirus). however, several fecal indicators including pmmov, frnagi (ms ) and frnagiii (qbeta), were detected in our samples. interestingly, in our study, the pmmov-affiliated contigs had a higher nucleotide similarity ( . %e . %) with the average query coverage of . % (table ), suggesting that pmmov is a highly conserved virus with a lower evolution rate in the surface water environment. this result is similar with a recent study, where pmmov detected in % of surface water samples shared e % nucleotide identity with pmmov reference genome (rosiles-gonz alez et al., ). . . correlation of metagenomics data with molecular assays for selected viral targets qpcr/rt-qpcr was conducted to detect and quantify the concentration of the common enteric and zoonotic viruses in the surveyed communities. 
viral targets ( viral pathogens, f+ male-specific coliphages and plant virus) were measured and the detection frequency was between and . % across all the samples (table s ). the viral pathogens detected in this study using qpcr/rt-qpcr included enteric viruses (adenovirus, astrovirus, rotavirus, norovirus gi, norovirus gii, enterovirus and aichi virus) and other zoonotic viruses (e.g. hepatitis e virus, saporo virus). reads mapping to human-infective viruses were rare and were not observed to assemble into contigs. high-quality reads from different libraries were aligned to the viral pathogen database (vipr/ird) and mapped to viral reference genomes in ncbi (table s ). few reads ( – reads per library) were assigned to the human-infective viruses norovirus and saporo virus using both the vip pipeline and the standalone pipeline (results not shown) (li et al., ) , and – reads per library were mapped to human viral pathogen reference genomes at the – % nucleotide similarity level. no significant correlations were observed between qpcr measurements and read mapping for these viruses. these results suggest that metagenomics is less sensitive at detecting low-abundance viral pathogens against a background of a large proportion of bacteriophages. a total of targets were found in both the metagenomics (contigs) and qpcr data. of these, pmmov showed a good spearman rank correlation (p < . ) ( table ). spearman rank correlation was carried out between the pmmov molecular results and the corresponding metagenomics data at the contig level and the read level (table ). at the contig level, all pmmov-affiliated contigs showed a significant correlation between molecular results and metagenomics data ( . < r < . , p < . ). at the read level, out of the libraries, libraries ( . %) had reads mapping to the pmmov reference genome (nc_ . ) with % identity to the reference genome. based on all the samples, the spearman rank correlation coefficient was . (p < . ) ( table ) .
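the comparison between qpcr concentrations and metagenomic abundance described above can be illustrated with a short sketch. it assumes the standard rpkm definition (reads per kilobase of contig per million mapped reads) and computes spearman's coefficient as the pearson correlation of tie-averaged ranks. all function names and the example numbers are hypothetical and are not taken from the study; significance testing is omitted.

```python
import math


def rpkm(mapped_reads, contig_length_bp, total_reads):
    """Reads per kilobase of contig per million reads in the library."""
    return mapped_reads * 1e9 / (contig_length_bp * total_reads)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-sample values: qPCR concentration and contig RPKM.
qpcr = [120.0, 35.0, 900.0, 410.0, 60.0]
abundance = [rpkm(m, 6000, 2_000_000) for m in (40, 9, 310, 150, 30)]
print(spearman_rho(qpcr, abundance))
```

because the coefficient depends only on ranks, it is unaffected by the different units of the two measurements (gene copies per volume versus rpkm).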
this study is the first to correlate land use impact with freshwater viromes in an urban tropical environment using illumina hiseq sequencing. it is noteworthy that metagenomic analyses of dna viruses and rna viruses have often been considered separately and that studies of dna viruses are far more numerous than rna virome analyses. this disparity is largely a result of technical challenges presented by rna metagenomics, as rna is fragile and needs to be reverse transcribed into dna. thus, up till now, the majority of virome metagenomics papers have studied either dna or rna (but not both). moreover, the amplification protocol reported in these earlier studies may have influenced the final outcome (lópez-bueno et al., ; rodriguez-brito et al., ) . for dna viruses, the genomiphi kit (ge healthcare) has been used for amplification through dna polymerase phi but it was discovered that it has a bias towards single-stranded circular dna viruses. for rna virome analysis, random priming mediated sequence independent single primer amplification (rp-sispa) was used in lake needwood, maryland (djikeng et al., ) . wang et al. ( ) developed a protocol capable of amplifying dna and rna viruses simultaneously. this was subsequently tested in reclaimed water, plasma samples, wastewater and ballast water (bibby and peccia, ; kim et al., ; rosario et al., ; wylie et al., ) . rna viruses account for more than % of the viral pathogens due to their adaptive abilities (mutations, recombinations or reassortments) to infect novel hosts easily (temmam et al., ) . in our study, . % of viruses are dna viruses whereas . % are rna viruses among annotated viral families. among all the viral pathogens detected by metagenomics, contigs affiliated to human picobirnaviruses are rna viruses whereas contigs affiliated to cyclovirus are dna viruses.
the combined application of both dna and rna metagenomics enabled the identification of a much larger number of viral sequences, especially for human-related viruses. this was the approach taken in our study here. in this geospatial metagenomics study, we identified virome abundance and diversity in different reservoirs and tributaries. due to insufficient viral genome sequencing information in the ncbi, . % of the virome genome had no significant hits and bacteriophages were the most abundant organisms in the matrix, which is in agreement with previous studies of surface water viromes (mohiuddin and schellhorn, ; skvortsov et al., ; tseng et al., ) . phages, considered to be the most abundant and diverse biological entities on earth, play an important role in shaping biological and geochemical processes at the global scale (díaz-muñoz and koskella, ) .
table caption: spearman rank correlation between qpcr (gc/ ml) and metagenomics data (rpkm) on pmmov (n = ). footnote a: ms belongs to the group of frna gi (animal-specific indicator) and qbeta belongs to the group of frna giii (human-specific indicator).
they may also be relevant to human health in spite of the fact that they do not infect humans. this is because some phages can convey new properties, such as coding for toxin production, to the host bacteria, thus converting harmless bacteria into pathogens (grabow, ) . for example, the shiga toxin-converting phages have been known to change the pathogenicity of e. coli o :h (muniesa and jofre, ) . tributaries were found to have a higher proportion of microviridae and dicistroviridae. the hosts of microviridae belong to the enterobacteriaceae and obligate parasitic bacteria (roux et al., ) . most of the enterobacteriaceae strains are pathogenic, for example, virulent strains of e. coli and klebsiella pneumoniae. enterobacteriaceae occurred more in tributaries, which may contribute to the distribution and abundance of microviridae in tributaries.
the hosts of dicistroviridae are soil-inhabiting invertebrates (e.g., aphids and ants, which are common in tropical singapore). thus, the fact that tributaries harbor the viral hosts indicates that the viruses may be released from catchments and washed into the tributaries periodically. the smaller spatial variation of larger reservoirs compared with tributaries was not surprising, as reservoirs tend to be more resilient to urban and storm water runoff than their tributaries. in previous studies, freshwater microbial communities have been found to be resilient to natural disturbances (tseng et al., ) .
activities on α-diversity and community structure
when correlating α-diversity with environmental parameters and land use factors, it was found that, among all the environmental parameters, ph had a significant correlation (r = − . , p < . ). ph is a key factor in determining virus infectivity, and low ph (ph < ) has been found to significantly reduce phage survivability ( – %) (jurczak-kurek et al., ) . even though the ph range in this study is relatively small ( . – . ), it is likely that these differences could affect viral α-diversity in our study. a moderate correlation with rainfall (r = . , p < . ) was observed, which could be due to heavy rainfall flushing terrestrial bacteria, viruses and nutrients into the reservoir (tseng et al., ) . precipitation was found to be the major factor affecting the microbial community in a subtropical reservoir in taiwan (tseng et al., ) . singapore is characterized by two main monsoon seasons: the northeast monsoon (december–march) and the southwest monsoon (june–september). -hour rainfall at the sampling times during the monsoon and inter-monsoon seasons showed a significant difference between january ( . mm) and april ( . mm) (one-way anova, f = . , p-value = . ), which could have contributed to the characterization of the virome composition.
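the one-way anova used above to compare rainfall between the two sampling months reduces to a ratio of between-group to within-group variance. the sketch below computes only the f statistic and its degrees of freedom; the function name and the rainfall values are hypothetical, and obtaining a p-value from the f distribution is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > number of groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical rainfall readings (mm) for two sampling months.
january = [12.0, 18.5, 9.0, 22.0]
april = [2.0, 0.5, 4.0, 1.5]
print(one_way_anova_f([january, april]))
```

with only two groups, this f statistic equals the square of the pooled-variance two-sample t statistic, so the test is equivalent to a t-test in that case.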
based on land use category, observed species were also found to be significantly correlated with agriculture land use (r ¼ . , p < . ). this could be expected as intensive agriculture has been reported to bring nutrients and agricultural chemicals to the water cycle (foley et al., ) , which in turn, disturbs the microbial community indirectly (tseng et al., ) . agricultural intensification has been associated with pathogen emergence transmission between wildlife and domestic animal populations and human populations, which in turn, could increase the species of zoonotic viruses (pulliam et al., ) . jones et al. ( ) concluded that agricultural practices are one of the socioeconomic drivers in the spatial distribution of emerging infectious diseases. however, the non-significant correlation between diversity and other land use types does not necessarily suggest weak connections between the viral community and land use factor, since a-diversity indices (shannon diversity and observed species) are just one aspect of characterizing the viral community and does not necessarily represent a complete view. b-diversity analysis indicated that mimiviridae, phycodnaviridae, siphoviridae and podoviridae were enriched in reservoirs of parkland areas while dicistroviridae and microviridae were enriched in tributaries of urban areas (fig. a) . siphoviridae, podoviridae and microviridae belong to bacteriophage, which may be associated with parkland and urban enriched bacteria (i.e., actinobacteria, verrucomicrobia, chloroflexi, acidobacteria and proteobacteria) in our study (fig. e) . overall, viral diversity in the surveyed reservoirs ( < h < ) was higher than that reported in a subtropical reservoir in taiwan ( . < h < . ) (tseng et al., ) . these differences could be due to the geographical (latitudinal gradient) differences in microbial diversity, as low-latitude tropical ecosystems tend to lead to higher biological diversity (chown and convey, ; fuhrman et al., ) . kim et al. 
( ) reported that viruses had higher richness near the equator and lower richness at higher latitude, similar to human pathogen species (guernier et al., ) . however, these comparisons between the diversity in freshwater virome studies need further confirmation due to different indices used (chao , simpson index, shannon index, and observed species) and diverse calculation methods applied (e.g., phaccs, qiime, catchall). by using phaccs based on the contig spectra generated by circonspect, tseng et al. ( ) derived a shannon diversity (h) range from . to and from . k to . k viral genotypes in one subtropical freshwater reservoir, which was lower than the ocean's virome shannon diversity in british columbia, the gulf of mexico and the sargasso sea (h of . , . and . , respectively) (angly et al., ) . the difference in methods used in published papers makes it difficult to draw comparable conclusions. a standard pipeline in deriving the viral diversity in ecology is required in order to make comparisons of viral diversity across diverse aquatic ecosystems from different studies. the graphs of pcoa analysis in both reservoirs and tributaries indicated that the land use pattern around the surveyed areas had an important impact on characterizing the virome community. even though the viral communities at the family level were conserved across different reservoirs, the pcoa plots suggested a dynamic shift in differences between contig levels of the viral community in terms of land use. the geospatial distribution patterns in the surface water environment could have resulted from both direct and indirect factors. on the one hand, the different land use patterns could also introduce foreign viral contamination into water bodies directly through urban/agriculture runoff, precipitation, leaking sewers, etc., while the viral community itself can be indirectly changed and characterized by the relationship to their hosts in specific environments. 
the runoff from the surrounding areas can also bring specific bacteria, or other vectors and nutrients into the reservoirs and the tributaries (tseng et al., ) . even though previous studies have shown the correlation between land use and water-borne viral pathogens (corsi et al., ; lenaker et al., ) , ours is the first study correlating land use cover with the whole viral community, thus overcoming the limitations of investigating specific viruses and providing comprehensive information on the community structure with the relationship of land use cover. viral pathogens could be introduced into reservoirs and tributaries through urban and storm water runoff. although enteric viruses (e.g., adenovirus, norovirus, rotavirus, enterovirus, etc.) could not be detected with viral metagenomics in our study, past studies have shown that they are prevalent in tributaries using qpcr and thus, need to be carefully monitored for quantifying risk assessment and providing guidelines for water recreational activities (aw and gin, ; aw et al., ) . vergara et al. ( ) quantified illness risks of norovirus in an urban catchment in singapore by using quantitative microbial risk assessment. the finding reported mean probability of illness associated with norovirus were . and . in the scenario of primary contact recreation of adults and children, which is below the usepa guideline value of . (usepa, ) . in addition, other potential emerging viral pathogens (e.g., hepatitis e virus, coronavirus, cyclovirus, bird flu virus), which may originate from wildlife and indigenous animals in tropical forests or animal parks can also spread to neighboring tributaries, potentially causing disease to humans. according to jones et al. ( ) , the emerging infectious diseases are dominated by zoonoses ( . %) and the majority of these are from wildlife ( . %), with an increasing trend and more hotspots concentrated in lower-latitude developing countries. 
in this study, we observed sequences related to the human-infective virus, human picobirnavirus and the emerging zoonotic virus, cyclovirus. human picobirnavirus, a bi-segmented double-stranded rna virus, has been detected in both healthy and unhealthy human beings. it has also been found to be prevalent ( % detection frequency) in raw sewage samples collected in the united states (symonds et al., ) . the pathogenicity of human picobirnaviruses has not been established and it has been suggested as an opportunistic pathogen which might cause diarrhea (giordano et al., ; grohmann et al., ) . as metagenomics can only provide relative abundance, further studies are needed using qpcr in order to determine absolute concentrations and to better evaluate health risks. viruses belonging to the circoviridae (approximately . % of the virome) may be involved in disease in vertebrate animals and plants. a large proportion was found to belong to swan circoviruses ( %) and circoviridae ldmd- ( %). two contigs found in the present study were assigned to the suggested cyclovirus-vn which originated from human samples in vietnam (garigliany et al., ) . indeed, cyclovirus vn was initially reported to be restricted to central and southern vietnam but was subsequently detected in both farm animals and human clinical samples from africa, indicating their geographic transmission capacity (garigliany et al., ; van doorn et al., ) . in singapore, human cyclovirus vs (cycv-vs ) was previously found in singapore harbor water by using a metagenomics approach (kim et al., ) . the contig shared a . % nucleotide similarity with human cyclovirus vn (kf ) with a query coverage of . %. the discovery of cyclovirus vn in freshwater tributaries in the densely populated area of site in our study indicates a possible transmission of the emerging human cyclovirus in the singapore urban water cycle.
further risk assessment of host populations should be conducted for these water environments (van doorn et al., ) . samples from sites , and had lower occurrence rate (both rpkm and absolute reads matrix) and concentration of human picobirnavirus and cyclovirus affiliated viruses, consistent with lower human population density (< people/km ) except for reservoir . the exception of reservoir (explained in results . ) suggested that drainage points could also be significant drivers in shaping the viral communities as well as land use cover and population density. higher population density could result in high occurrence of human-related viruses (lenaker et al., ) . here, the population density to some extent could reflect the hotspots of human-related viruses. limitation of using population density to quantify human activities and human-related viral hosts in our study exists as the population density referred to is the resident population. however, the mobile population such as those associated with modern transportation, tourism, business travel and immigration could also contribute to dissemination of these highimpact pathogens (arguin et al., ). in addition, public holidays could introduce variation in population numbers to commercial and recreational areas based on lifestyle choices. thereafter, more detailed data will be needed in order to track and investigate the human-related virus transmission patterns through virus-host interaction. in our sequencing data, the absence of the majority of enteric viruses in surface waters is reasonable. a potentially relatively low abundance of human-related viruses in our surface water system and insufficient sequencing coverage could have resulted in rare sequences not being assembled into contigs for further downstream analysis. in contrast with our freshwater ecosystem, a previous study of sewage sludge samples revealed a large number of human viral pathogens, unveiling types of human viruses (bibby and peccia, ) . 
this result is expected as the sludge matrix harbors large amounts of human-related viruses and the concentrations of viral pathogens are much higher. in our study, a good spearman rank correlation ( . < r < . , p < . ) for the indicator virus pmmov was discovered as the qpcr/rt-qpcr concentrations of this target are relatively higher (pmmov geomean: . gc/l). this suggests that viral metagenomics, to some extent, is a conservative estimate of the true viral abundance, based on validation of rt-qpcr data with rpkm in relative abundance of contigs, especially for contigs with a higher qpcr concentration. similar results were also obtained in a previous study where a significant correlation (p < . ) between qpcr and rpkm across viral taxa in clinical samples was obtained (graf et al., ) . overall, viral metagenomics has its advantage in the simultaneous discovery of the entire set of targets in the community in spite of the intrinsic limitation of downstream bioinformatics (i.e., assembly efficiency and database bias). it can pave the way in finding emerging zoonotic viruses and alternative plausible fecal indicators in predicting viral pathogens in the future. after zooming into the specific and desired targets using high throughput sequencing, gene-specific pcr or qpcr (e.g., cyclovirus-vn) could be further investigated to confirm the presence of these potential viral pathogens and zoonotic viruses to reveal the epidemiology or transmission patterns of these viral pathogens (kim et al., ) . this study has shed light on the diversity of the viral communities in tropical reservoirs and their tributaries with different land use. correlations between the diversity index, physical-chemical parameters and land use patterns showed that environmental parameters (i.e., ph and precipitation) and spatial factors (e.g., reservoirs/tributaries, land use) are the main drivers of the viral community structure. 
although enteric viruses were not detected by viral metagenomics, human-related viruses, including emerging zoonotic viruses, were detected in our samples indicating the importance of continued monitoring of these environments where specific hosts could be harbored. in addition, the links between qpcr/rt-qpcr and metagenomics were shown using both contigs and reads, confirming that our metagenomics quantification path is reliable and scientific. even though metagenomics sequencing technology cannot replace qpcr/rt-qpcr due to its relatively low sensitivity in detecting gene-specific viral pathogens, its wide coverage of viral targets could add valuable information, such as detecting new, emerging zoonotic viruses and finding alternative fecal indicators or markers of contamination. bacteriophage observations and evolution the marine viromes of four oceanic regions the gaas metagenomic tool and its estimations of viral and microbial average genome size in four major biomes globally mobile populations and the spread of emerging pathogens prevalence and genetic diversity of waterborne pathogenic viruses in surface waters of tropical urban catchments prevalence and genotypes of human noroviruses in tropical urban surface waters and clinical samples in singapore identification of viral pathogen diversity in sewage sludge by metagenome analysis genomic analysis of uncultured marine viral communities the miqe guidelines: minimum information for publication of quantitative real-time pcr experiments oxygen minimum zones harbour novel viral communities with low diversity spatial and temporal variability across life's hierarchies in the terrestrial antarctic human and bovine viruses in the milwaukee river watershed: hydrologically relevant representation and relations with environmental variables metagenomic analysis of coastal rna virus communities human norovirus transmission and evolution in a changing world bacteria-phage interactions in natural environments functional 
metagenomic profiling of nine biomes metagenomic analysis of rna viruses in a fresh water lake zoonotic hepatitis e virus: classification, animal reservoirs and transmission routes dynamic viral populations in hypersaline systems as revealed by metagenomic assembly viruses in the desert: a metagenomic survey of viral communities in four perennial ponds of the mauritanian sahara global consequences of land use a latitudinal diversity gradient in planktonic marine bacteria cyclovirus cycv-vn species distribution is not limited to vietnam and extends to viral metagenomics analysis of planktonic viruses in east lake diarrhea and enteric emerging viruses in hiv-infected patients bacteriophages: update on application as models for viruses in water unbiased detection of respiratory viruses by use of rna sequencing-based metagenomics: a systematic comparison to a commercial pcr panel enteric viruses and diarrhea in hiv-infected patients ecology drives the worldwide distribution of human diseases novoalign. 
novocraft technologies seasonal dynamics and metagenomic characterization of marine viruses in goseong bay a virion concentration method for detection of human enteric viruses in oysters by pcr and oligoprobe hybridization global trends in emerging infectious diseases biodiversity of bacteriophages: morphological and biological properties of a large group of phages isolated from urban sewage transporting ocean viromes: invasion of the aquatic biosphere metagenomic investigation of viral communities in ballast water hydrologic, land cover, and seasonal patterns of waterborne pathogens in great lakes tributaries vip: an integrated pipeline for metagenomics of virus identification and discovery alternative fecal indicators and their empirical relationships with enteric viruses, salmonella enterica, and pseudomonas aeruginosa in surface waters of a tropical urban catchment high diversity of the viral community from an antarctic lake spatial and temporal dynamics of virus occurrence in two freshwater lakes captured through metagenomic analysis mapping and quantifying mammalian transcriptomes by rna-seq occurrence of phages infecting escherichia coli o : h carrying the stx gene in sewage from different countries characterization of metagenomes in urban aquatic compartments reveals high prevalence of clinically relevant antibiotic resistance genes in wastewaters variation of bacterial communities with water quality in an urban tropical catchment unhealthy landscapes: policy recommendations on land use change and infectious disease emergence disease emergence from global climate and land use change agricultural intensification, priming for persistence and the emergence of nipah virus: a lethal bat-borne zoonosis gut dna viromes of malawian twins discordant for severe acute malnutrition surveillance of enteric viruses and coliphages in a tropical urban catchment obtaining genomes from uncultivated environmental microorganisms using facs-based single-cell genomics viral and 
microbial community dynamics in four aquatic environments occurrence of pepper mild mottle virus (pmmov) in groundwater from a karst aquifer system in the yucatan peninsula virsorter: mining viral signal from microbial genomic data evolution and diversity of the microviridae viral family through a collection of new complete genomes assembled from virome reads metavir : new tools for viral metagenome comparison and assembled virome analysis evaluating the efficacy of commercial kits for viral dna/rna extraction metagenomic characterisation of the viral community of lough neagh, the largest freshwater lake in ireland eukaryotic viruses in wastewater samples from the united states relationship of microbiota and cyanobacterial secondary metabolites in planktothricoides-dominated bloom viral metagenomics on animals as a tool for the detection of zoonoses prior to human infection? microbial and viral metagenomes of a subtropical freshwater reservoir subject to climatic disturbances identification of a new cyclovirus in cerebrospinal fluid of patients with acute central nervous system infections risk assessment of noroviruses and human adenoviruses in recreational surface waters microarray-based detection and genotyping of viral pathogens sequence analysis of the human virome in febrile and afebrile children this research grant is supported by the singapore national research foundation under its environment and water research programme and administered by pub, singapore's national water agency (ref: -iris- [idd / / ]). we would like to thank national university of singapore and center for environmental sensing and modeling (censam) for supporting this research. supplementary data related to this article can be found at https://doi.org/ . /j.watres. . . . key: cord- -xgwbl em authors: henderson, clark m.; anderson, christine b.; howard, michael t. title: antisense-induced ribosomal frameshifting date: - - journal: nucleic acids res doi: . 
/nar/gkl sha: doc_id: cord_uid: xgwbl em programmed ribosomal frameshifting provides a mechanism to decode information located in two overlapping reading frames by diverting a proportion of translating ribosomes into a second open reading frame (orf). the result is the production of two proteins: the product of standard translation from orf and an orf –orf fusion protein. such programmed frameshifting is commonly utilized as a gene expression mechanism in viruses that infect eukaryotic cells and in a subset of cellular genes. rna secondary structures, consisting of pseudoknots or stem–loops, located downstream of the shift site often act as cis-stimulators of frameshifting. here, we demonstrate for the first time that antisense oligonucleotides can functionally mimic these rna structures to induce + ribosomal frameshifting when annealed downstream of the frameshift site, ucc uga. antisense-induced shifting of the ribosome into the + reading frame is highly efficient in both rabbit reticulocyte lysate translation reactions and in cultured mammalian cells. the efficiency of antisense-induced frameshifting at this site is responsive to the sequence context ′ of the shift site and to polyamine levels. the standard triplet readout of the genetic code can be reprogrammed by signals in the mrna to induce ribosomal frameshifting [reviewed in ( ) ( ) ( ) ]. generally, the resulting trans-frame protein product is functional and may in some cases be expressed in equal amounts to the product of standard translation. this elaboration of the genetic code ( , ) demonstrates versatility in decoding. requirements for eukaryotic ribosomal frameshifting include a shift-prone sequence at the decoding site and often a downstream secondary structure in mrna. the majority of − programmed frameshift sites consist of a heptanucleotide sequence x xxy yyz [where x can be a, g, c or u; y can be a or u; and z can be any nucleotide ( ) ].
in this configuration, the p- and a-site trnas can re-pair with at least out of nt when shifted nt towards the end of the mrna. similarly, for + frameshift sites, the identity of the codons in the p- and a-sites of the ribosome is critical for efficient frameshifting. one factor affecting + frameshift efficiency is the initial stability of the p-site trna-mrna interaction in the frame ( ) . high-efficiency frameshifting occurs when the p-site trna does not form standard codon-anticodon interactions ( ) . in some studies, a correlation between + frameshift efficiency and the final stability of the p-site trna-mrna interaction in the + frame has been shown ( , ) . however, in other systems there appears to be little correlation ( ) . in addition, competition between decoding of the frame and + frame codons in the a-site may affect frameshifting efficiency ( ) . slow-to-decode frame codons such as stop codons or those decoded by low abundance trnas favor frameshifting, as do + frame codons with high levels of corresponding cognate trnas ( ) ( ) ( ) ( ) ( ) . high levels of frameshifting are often achieved by the stimulatory action of a cis-acting element located downstream of the shift site. a wide variety of structures, most commonly h-type pseudoknots ( ) , have been identified which stimulate − frameshifting in eukaryotes [for reviews see ( , ) ]. mutagenic and structural data for several of the frameshift stimulators have demonstrated that each pseudoknot has key structural features required for frameshift stimulation ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . however, a unifying structural feature essential for frameshifting has not yet been identified. this observation combined with recent reports that simple antisense oligonucleotides can functionally mimic cis-acting stimulators of − frameshifting ( , ) demonstrates that many different structures can stimulate frameshifting.
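the heptanucleotide pattern x xxy yyz described above can be searched for with a simple sliding window. the sketch below follows the constraints exactly as stated in the text (x is any nucleotide, y is a or u, z is any nucleotide); the function name is hypothetical, and the scan ignores reading frame and any downstream stimulatory structure, so its hits are candidates only.

```python
def find_slippery_sites(sequence):
    """Return (index, heptamer) pairs matching X XXY YYZ in an RNA or DNA string.

    Index is the 0-based position of the first X. DNA input is read as RNA
    (T is treated as U). Windows with non-ACGU characters are skipped.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 6):
        hep = rna[i:i + 7]
        if any(base not in "ACGU" for base in hep):
            continue
        same_x = hep[0] == hep[1] == hep[2]
        same_y = hep[3] == hep[4] == hep[5]
        if same_x and same_y and hep[3] in "AU":
            hits.append((i, hep))
    return hits


# A short made-up sequence containing one matching heptamer.
print(find_slippery_sites("GGCUUUAAACGG"))
```

in a real scan, a hit would normally be kept only if the first x falls at the correct position relative to the zero reading frame, which requires knowing where the open reading frame starts.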
it should be noted, however, that not all structures of equal thermodynamic stability can stimulate frameshifting (see discussion). rna pseudoknots have also been shown to stimulate programmed +1 frameshifting in many eukaryotic antizyme genes ( , ). antizyme is a negative regulator of cellular polyamine levels through its ability to target ornithine decarboxylase (the rate-limiting enzyme in polyamine biosynthesis) for degradation ( ) ( ) ( ), inhibit polyamine import ( , ) and stimulate export ( ). antizyme expression is induced by high intracellular polyamine levels, and decreased with lowered levels. the polyamine sensor is a programmed +1 frameshift event that is required for antizyme synthesis. at low polyamine levels, termination at the end of open reading frame (orf ) is efficient, whereas at high levels of polyamines, a substantial proportion of ribosomes shift to the +1 reading frame and then resume standard decoding to synthesize the full-length and active antizyme protein. frameshifting at the mammalian antizyme mrna shift site, ucc uga, is stimulated by two cis-acting signals ( , ). one of these, the 5′ element, encompasses ~ bases upstream of the shift site and is important for the polyamine effect ( ) ( ) ( ). the other cis-acting element is a pseudoknot located 3′ of the shift site. the mammalian antizyme pseudoknot and a structurally distinct counterpart in a subset of invertebrate antizyme mrnas ( ) are the only pseudoknots known to act as stimulators for +1 frameshifting in eukaryotes.

*to whom correspondence should be addressed. tel: + ; fax: + ; email: mhoward@genetics.utah.edu
© the author(s). this is an open access article distributed under the terms of the creative commons attribution non-commercial license (http://creativecommons.org/licenses/ by-nc/ . /uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
although it is unknown if pseudoknots stimulate -1 frameshifting and +1 frameshifting by different mechanisms, one notable difference is found in the positioning of the downstream structure relative to the shift site. naturally occurring pseudoknot or stem-loop stimulators of -1 frameshifting typically begin ~ - nt downstream of the a-site codon of the shift site ( ), whereas +1 frameshift pseudoknots are located closer, with only a - nt separation from the a-site codon ( ). mutagenic studies have revealed that altering the size of the spacer affects frameshifting and, in general, reduces efficiency ( , , ( ) ( ) ( ) ). here we have tested the ability of antisense oligonucleotides, annealed downstream of the shift-prone site, ucc uga, to induce shifting of the ribosome to the +1 reading frame. the directionality of frameshifting (either into the +1 or -1 reading frame) is shown to be dependent upon the position of the duplex region relative to the shift site, and the efficiency of frameshifting is responsive to polyamine levels and enhanced by the inclusion of stimulatory sequences found upstream of the human antizyme +1 programmed frameshift site.

complementary oligonucleotides, to construct the sequences described in this paper, were synthesized at the university of utah dna/peptide core facility such that when annealed they would have appropriate ends to ligate into the sali/bamhi sites of the dual luciferase vector, p luc ( ). dual luciferase constructs were prepared and their sequences were verified as described previously ( ).
insert sequences, with the shift site in boldface, are given as follows:
p lucaz wt: tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgatgcccctcacccacccctgaagatcccaggtgggcgagggaatagtcagagggatcacaacggatc;
p lucaz sp: tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgaccctcacccacccctgaagatcccaggtgggcgagggaatagtcagagggatcacaacggatc;
p lucaz hp: tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgatg-
p lucaz pkdel: tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgatgcccctggatc;
p lucaz pkm : tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgatgcccctcacccaccgggatcacaaggatc;
p lucaz sl: tcgacggtctccctccactgctgtagtaacccgggtccggggcctcggtggtgctcctgatgcccctcacccacccggatc;
p lucaz fs: tcgacgtgctcctgatgcccctggatc;
p lucaz fsugg: tcgacgtgctcctggtgcccctggatc.

the dual luciferase constructs ( . mg) described above were added directly to tnt coupled reticulocyte lysate reactions (promega) with s-labeled methionine in a volume of ml. reactions were incubated at c for h. radiolabeled proteins were separated by sds-page and the gels were fixed with . % acetic acid and methanol for min. after drying under vacuum, the gels were visualized using a storm phosphorimager (molecular dynamics) and radioactive bands were quantified using imagequant software. percent frameshifting was calculated as the percentage of full-length (frameshift) product relative to the termination product and the full-length product combined. the value of each product was corrected for the number of methionine codons present in the coding sequence. the reported values are the averages and standard deviations obtained from at least three independent measurements. tables showing percent frameshifting and standard deviations can be found in supplementary data.

plasmid p lucaz pkdel was co-transfected into cv- cells with varying concentrations of az b -o-methyl antisense oligonucleotides under the following conditions. cv- cells ( .
× ) in ml of dmem + % fetal bovine serum were added to wells ( / area -well tissue culture treated plates) containing ng of dna, varying amounts of az b antisense oligonucleotides and . ml lipofectamine (invitrogen) in ml of optimem. cells were incubated at c ( % co ) for h. media were then removed from the cells and the transfected cells were lysed in . ml lysis buffer and luciferase activity determined by measuring light emission following injection of ml of luminescence reagent (promega). percent frameshifting was calculated by comparing firefly/renilla luciferase ratios of experimental constructs with those of control constructs: (firefly experimental rlus/renilla experimental rlus)/(firefly control rlus/renilla control rlus) × 100.

the ability of cis-acting rna structures or trans-acting 2′-o-methyl antisense oligonucleotides to induce ribosomal frameshifting was determined by in vitro transcription and translation of a dual luciferase reporter vector, p luc. p luc contains the renilla and firefly luciferase genes on either side of a multiple cloning site, and can be transcribed using the t promoter located upstream of the renilla luciferase gene ( ). sequences containing shift-prone sites were cloned between the two reporter genes such that the downstream firefly luciferase gene is in the +1 reading frame. the resulting constructs were then transcribed and translated in vitro with or without complementary cis-acting stimulators of frameshifting at the antizyme shift site. initially, three dual luciferase reporter vectors were generated containing the human antizyme frameshift cassette (p luc-az wt) with the 5′ and 3′ stimulators of frameshifting, with the pseudoknot deleted (p luc-az pkdel), or replaced with a stem-loop (p luc-az hp) (figure ). each construct was then subjected to coupled transcription and translation reactions in the presence of increasing amounts of spermidine, and the s-labeled products were separated by sds-page. table ).
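The two percent-frameshifting calculations described in the methods (gel band quantification corrected for methionine content, and the dual luciferase ratio of ratios) can be sketched as below. This is an illustrative reading of the text, not the authors' software: the function and parameter names are hypothetical, and band intensities or RLU values are assumed to be already background-subtracted.

```python
def percent_frameshift_gel(fs_band, term_band, fs_met_codons, term_met_codons):
    """Percent frameshifting from radiolabeled band intensities.

    Each band is divided by the number of methionine codons in its coding
    sequence, then the full-length (frameshift) product is expressed as a
    percentage of frameshift plus termination product.
    """
    fs = fs_band / fs_met_codons
    term = term_band / term_met_codons
    return 100.0 * fs / (fs + term)


def percent_frameshift_luciferase(firefly_exp, renilla_exp, firefly_ctrl, renilla_ctrl):
    """Percent frameshifting from dual luciferase readings (relative light units).

    (firefly/renilla of the experimental construct) divided by
    (firefly/renilla of the in-frame control construct), times 100.
    """
    return 100.0 * (firefly_exp / renilla_exp) / (firefly_ctrl / renilla_ctrl)


# Made-up numbers, for illustration only.
print(percent_frameshift_gel(30, 60, 3, 2))
print(percent_frameshift_luciferase(50, 100, 200, 100))
```

Normalizing to the renilla signal corrects for differences in transfection or translation efficiency between wells, which is why the ratio of ratios is used instead of raw firefly activity.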
maximal levels of frameshifting were found to occur when - mm of antisense oligonucleotide was added to the transcription/translation reactions (supplementary table ). in the presence of . mm exogenous spermidine, highly efficient shifting of ribosomes into the +1 reading frame (higher than that observed in the wild-type antizyme frameshift cassette) was observed with the addition of az a ( . %), az b ( . %) and az c ( . %) (supplementary table ). the most efficient frameshifting is observed with the antisense oligonucleotide az b, which anneals such that the spacing between the shift site and the beginning of the duplex region is the same as that observed between the shift site and the beginning of stem of the natural antizyme pseudoknot structure (i.e. each has a nt spacer). to verify that the antisense oligonucleotide was activating ribosomal frameshifting and not transcriptional slippage, rna was transcribed from p luc-az pkdel in the absence of oligonucleotide and added to reticulocyte lysate translations in the presence of increasing amounts of 2′-o-methyl az b oligonucleotide. frameshifting levels were increased to the same level as that observed in coupled transcription and translation reactions, demonstrating that the oligonucleotide acts to induce frameshifting during translation (supplementary figure). surprisingly, the addition of az a ( spacer) also induced high-level frameshifting into the -1 reading frame in a manner which was modestly inhibited by the addition of spermidine ( % in the absence and % in the presence of . mm exogenous spermidine) (figure a and supplementary table ). no -1 frameshift product was observed when the wild-type antizyme cassette was examined in the absence of antisense oligonucleotide addition (figure ; azwt).
as the az a antisense oligonucleotide was designed to anneal directly adjacent to the uga codon of the shift site, it was of interest to determine whether the wild-type antizyme pseudoknot could induce -1 frameshifting when located in the equivalent position. to address this, a new construct p luc-az - sp (figure a) was made by deleting the nt spacer between the pseudoknot and the shift site of p luc-az wt. in this case, the wild-type pseudoknot is directly adjacent to the shift site. the products of in vitro transcription and translation were separated by sds-page. no -1 frameshift product was observed and levels of the +1 frameshift product were significantly reduced to ~ % (figure d and supplementary table ).

az a, az b and az c were designed to complement rna sequences encoded by the originating vector. to determine if duplexes formed between the antisense oligonucleotide and adjacent antizyme sequences would result in more efficient frameshift stimulation, reporter vectors were designed to contain a portion of the antizyme stimulator. construct p luc-az pkm contains sequences from the half of the axis formed by the stacking of stem and stem of the pseudoknot (figure a). two complementary 2′-o-methyl antisense oligonucleotides were designed. first, pkm has perfect complementarity to the region starting nt and ending nt downstream of the uga shift site codon. second, pkm is the same except that a mispaired c and bulged a were located at positions and , respectively. these two alterations were included to more closely mimic the natural pseudoknot, which also contains a mispaired c and bulged a at equivalent positions along the extended stem formed by the stacking of pseudoknot stems and (figure ; compare p luc-az wt with the duplex formed between p luc-pkm and antisense oligonucleotide pkm ).
pkm and pkm induced and % frameshifting, respectively, when added to coupled transcription and translation reactions of p luc-az pkm in the presence spermidine ( figure a and b, and supplementary table ). neither pkm nor pkm induced frameshifting to the same levels seen with az b, suggesting that the sequence content of the duplex region can affect the efficiency of frameshift stimulation and that native antizyme sequences are not required. a second construct, p luc-az sl, was designed to contain only the half of stem of the antizyme pseudoknot downstream from the ucc uga shift site ( figure a ). -o-methyl antisense oligonucleotides were designed to anneal between and nt (sl ) or and nt (sl ) downstream from the uga codon of the shift site. frameshift efficiency induced by these two antisense oligonucleotides, and % respectively, was somewhat lower than that observed with pkm and pkm ( figure c and d and supplementary table ). in these cases frameshift efficiency was higher for the longer antisense oligonucleotide (sl ), suggesting that frameshift efficiency most probably correlates with stability of the duplex. as was seen with az a, az b and az c, frameshifting efficiency stimulated by antisense oligonucleotides pkm , pkm , sl and sl was also strongly correlated with the concentration of exogenously added spermidine (supplementary table ) . the importance of the antizyme sequence context to antisense oligonucleotide induced ribosome frameshifting was examined by testing the frameshift site, ucc uga, without the and stimulatory antizyme sequences. to this end, the antizyme stimulatory sequences were deleted from p luc-az pkdel to make p luc-az fs. each of the antisense oligonucleotides az a, az b or az c was added to coupled transcription and translation reactions with p luc-az fs in the presence or absence of spermidine. frameshift efficiency was measured at , and %, in the presence of spermidine and , . and . 
% in its absence for az a, az b and az c, respectively ( figure a and b) . to determine whether the stop codon of the shift site is essential for frameshifting, the uga codon of p luc-az fs was altered to ugg such that the shift site was ucc ugg (p luc-az -ugg). frameshift efficiency was significant, but reduced, compared to the shift site ucc uga, and shows little stimulation by the addition of spermidine; az a, az b and az c induced , and . % frameshifting in the presence of spermidine, and . , . and . % frameshifting in its absence, respectively ( figure c and d) . the ability of antisense oligonucleotides to induce frameshifting in cultured mammalian cells was examined by co-transfection of cv- cells with p lucaz pkdel and increasing amounts of -o-methyl antisense oligonucleotides az b as described in materials and methods. in the absence of antisense oligonucleotide frameshifting levels were determined to be . %, whereas a graded increase in frameshift levels was observed upon the addition of az b ( figure ). maximal frameshifting levels were % in the presence of mm az b in the transfection media. several models attempting to explain pseudoknot stimulation of programmed À frameshifting have been proposed [for reviews see ( , ) ]. most models invoke a pausing mechanism whereby the ribosome is paused over the shift site such that time is allowed for the trnas to reposition in the new reading frame. this explanation is clearly too simplistic as stem-loops and pseudoknots of similar thermodynamic stability that cause ribosome pausing are not necessarily effective frameshift stimulators ( ) ( ) ( ) . in addition, variations of the ibv pseudoknot have demonstrated a lack of correlation between the extent of pausing and the efficiency of frameshifting ( ) . 
a recent publication by brierley and co-workers ( ) presents structural data demonstrating that the ibv frameshift stimulating pseudoknot blocks the mrna entrance tunnel and leads to a structural deformation of the p-site trna. the resulting movement of the trna displaces the anticodon loop towards the end of the mrna. a model is presented in which this movement results in disruption of the codon-anticodon interactions, thus allowing for trna slippage relative to the mrna. similar trna movements were not observed with non-frameshift stimulating stem-loop structures. this model provides a feasible mechanistic explanation for the ability of some downstream structures to induce frameshifting. the ability of antisense oligonucleotides to induce high-level -1 frameshifting ( , ) demonstrates that elaborate tertiary structures are not required, and that a duplex formed by complementary antisense oligonucleotides (with a variety of chemistries, including rna, 2′-o-methyl and morpholino) is sufficient to induce high-level frameshifting. here we demonstrate for the first time that trans-acting antisense oligonucleotides may stimulate ribosome shifting to the +1 reading frame at surprisingly high levels, which are greater than those achieved by natural cis-acting mrna pseudoknot structures in programmed +1 frameshifting. structural studies indicating that the mrna begins to enter the ribosome - nt downstream from the a-site codon are of direct relevance to this study ( , ). our results indicate that maximal frameshifting is induced when the antisense-mrna duplex begins nt downstream of the uga of the shift site, in agreement with the distance found between the uga of the shift site and the beginning of stem of the pseudoknot stimulator found in antizyme genes. given this distance, the implication is that the stimulatory secondary structure would be encountered by the ribosome when the ucc codon enters the a-site of the ribosome.
perhaps, as suggested by the structural studies of the ibv -1 frameshift-inducing pseudoknot, the codon-anticodon interactions between the ucc codon and ser-trna ser are disrupted during translocation to the p-site. given the importance of the uga codon during frameshifting at the ucc uga shift site, subsequent events following translocation of the ucc codon to the p-site and uga to the a-site must influence frameshifting efficiency. this latter event most probably involves competition between termination and +1 frame decoding when the uga codon is in the a-site. various discussions have been presented for the importance of a-site and p-site events during ribosomal frameshifting ( , ) and, clearly, further investigations of this topic are warranted. the observation presented here that the antisense oligonucleotide az a, which anneals directly adjacent to the uga stop codon, can induce ribosome frameshifts to either the +1 or -1 reading frame is surprising. in light of the above discussion of spacing for naturally occurring cis-acting frameshift stimulators, it is possible that frameshifting may occur at codons upstream of the known ucc uga shift site. however, visual examination of upstream codons does not reveal an obvious -1 or +1 frameshift site. the ability of spermidine to stimulate antisense oligonucleotide-induced ribosome frameshifting to the +1 reading frame at the ucc uga shift site in the absence of the natural 3′ stimulator demonstrates that this cis-acting element is not required for polyamine responsiveness. similarly, spermidine stimulation was observed in the absence of the 5′ element but virtually eliminated by altering the uga codon of the shift site to ugg. these observations are in agreement with previous studies examining the importance of cis-acting elements for polyamine-induced frameshifting during expression of antizyme genes ( ) ( ) ( ).
finally, the ability to direct ribosomes to the + reading frame in living cells ( figure ) suggests a potential therapeutic application for antisense oligonucleotides. directed frameshifting to the + reading frame near a disease causing À frameshift mutation would cause some ribosomes to resume decoding in the wild-type orf, thus restoring partial production of full-length protein from mutant alleles. the importance of the stop codon for efficient frameshifting suggests that the stop codon following the frameshift mutation presents a promising target for antisense induced phenotypic suppression, and that modulation of intracellular polyamine levels, although not essential, may increase the effectiveness of this approach. further experiments are required to determine the therapeutic potential of this approach in vivo including the generality and efficiency of frameshift induction at non-programmed frameshift sites. programmed translational frameshifting recoding: translational bifurcations in gene expression reprogrammed genetic decoding in cellular gene expression recoding: reprogrammed genetic decoding recoding: dynamic reprogramming of translation mutational analysis of the 'slippery-sequence' component of a coronavirus ribosomal frameshifting signal p-site trna as a crucial initiator of ribosomal frameshifting near-cognate peptidyl-trnas promote + programmed translational frameshifting in yeast analysis of effects of trna:message stability on frameshift frequency at the escherichia coli rf programmed frameshift site ribosomal frameshifting from À to + nucleotides special peptidyl-trna molecules can promote translational frameshifting without slippage ribosomal frameshifting in the yeast retrotransposon ty: trnas induce slippage on a nucleotide minimal site a novel programed frameshift expresses the pol gene of retrotransposon ty of yeast: frameshifting without trna slippage normal trnas promote ribosomal frameshifting mechanism of ribosome frameshifting during 
translation of the genetic code pulling the ribosome out of frame by + at a programmed frameshift site by cognate binding of aminoacyl-trna a new principle of rna folding based on pseudoknotting cold spring harbor symposia on quantitative biology structure, stability and function of rna pseudoknots involved in stimulating ribosomal frameshifting the structure of an rna pseudoknot that causes efficient frameshifting in mouse mammary tumor virus comparative studies of frameshifting and nonframeshifting rna pseudoknots: a mutational and nmr investigation of pseudoknots derived from the bacteriophage t gene mrna and the retroviral gag-pro frameshift site rna, involved in ribosomal frameshifting metal ions and flexibility in a viral rna pseudoknot at atomic resolution minor groove rna triplex in the crystal structure of a ribosomal frameshifting viral pseudoknot specific mutations in a viral rna pseudoknot drastically change ribosomal frameshifting efficiency evidence for an rna pseudoknot loop-helix interaction essential for efficient À ribosomal frameshifting the role of rna pseudoknot stem length in the promotion of efficient À ribosomal frameshifting crystal structure of a luteoviral rna pseudoknot and model for a minimal ribosomal frameshifting motif efficient stimulation of site-specific ribosome frameshifting by antisense oligonucleotides novel application of srna: stimulation of ribosomal frameshifting identification of a new antizyme mrna + frameshifting stimulatory pseudoknot in a subset of diverse invertebrates and its apparent absence in intermediate species reading two bases twice: mammalian antizyme frameshifting in yeast degradation of ornithine decarboxylase: exposure of the c-terminal target by a polyamine-inducible inhibitory protein ornithine decarboxylase is degraded by the s proteasome without ubiquitination determinants of proteasome recognition of ornithine decarboxylase, a ubiquitin-independent substrate feedback repression of polyamine transport 
is mediated by antizyme in mammalian tissue-culture cells antizyme protects against abnormal accumulation and toxicity of polyamines in ornithine decarboxylase-overproducing cells properties of a polyamine transporter regulated by antizyme cell culture analysis of the regulatory frameshift event required for the expression of mammalian antizymes autoregulatory frameshifting in decoding mammalian ornithine decarboxylase antizyme polyamine sensing during antizyme mrna programmed frameshifting characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot the sequences of and distance between two cis-acting signals determine the efficiency of ribosomal frameshifting in human immunodeficiency virus type and human t-cell leukemia virus type ii in vivo identification and analysis of the pseudoknot-containing gag-pro ribosomal frameshift signal of simian retrovirus- a dual-luciferase reporter system for studying recoding signals sequence specificity of aminoglycoside-induced stop codon readthrough: potential implications for treatment of duchenne muscular dystrophy ribosomal pausing at a frameshifter rna pseudoknot is sensitive to reading phase but shows little correlation with frameshift efficiency ribosomal movement impeded at a pseudoknot required for frameshifting ribosomal pausing during translation of an rna pseudoknot a mechanical explanation of rna pseudoknot function in programmed ribosomal frameshifting the path of messenger rna through the ribosome programmed + translational frameshifting in the yeast saccharomyces cerevisiae results from disruption of translational error correction the authors would like to thank drs pasha baranov, john atkins and lorin petros for critical reading of the manuscript. this project was funded by an mda development grant and nih r ns to m.t.h. funding to pay the open access publication charges for this article was provided by nih r ns .conflict of interest statement. none declared. 
supplementary data are available at nar online.

key: cord- - mzd authors: vanderford, thomas h.; demma, linda j.; feinberg, mark b.; staprans, silvija i.; logsdon, john m. title: adaptation of a diverse simian immunodeficiency virus population to a new host is revealed through a systematic approach to identify amino acid sites under selection date: - - journal: mol biol evol doi: . /molbev/msl sha: doc_id: cord_uid: mzd

simian immunodeficiency viruses (siv) have had considerable success at crossing species barriers; both human immunodeficiency virus (hiv)- and hiv- have been transmitted on multiple occasions from siv-infected natural host species. however, the precise evolutionary and ecological mechanisms characterizing a successful cross-species transmission event remain to be elucidated. here, in addition to expanding and clarifying our previous description of the adaptation of a diverse, naturally occurring sivsm inoculum to a new rhesus macaque host, we present an analytical framework for understanding the selective forces driving viral adaptation to a new host. a preliminary analysis of large-scale changes in virus population structure revealed that viruses replicating in the macaques were subject to increasing levels of selection through day postinfection (p.i.), whereas contemporaneous viruses in the mangabeys remained similar to the source inoculum. three different site-by-site methods were employed to identify the amino acid sites responsible for this macaque-specific selection. of amino acid sites analyzed, codons in v , a –amino acid shift in an n-linked glycosylation site, and variation at sites in the highly charged region were consistently evolving under either directional or diversifying selection at days and p.i. this strong macaque-specific selection on the v loop underscores the importance of this region in the adaptation of sivsm to rhesus macaques.
due to the extreme viral diversity already extant in the naturally occurring viral inoculum, we employed a broad range of phylogenetic and numerical tools in order to distinguish the signatures of past episodes of selection in viral sequences from more recent selection pressures. transmission of pathogens from animal reservoirs to humans is responsible for many of the world's most deadly and costly epidemics (hirsch et al. ; gao et al. ; holmes ; leroy et al. ; parrish and kawaoka ) . in some cases the mechanisms by which animal pathogens cross the species barrier are relatively easy to identify (qu et al. ; wang et al. ) ; however, zoonotic transmissions are most often due to very complex interactions between ecological, evolutionary, biochemical, and sociological factors (hahn et al. ; palese ). thus, not only is it difficult to identify the causes of past zoonotic epidemics, but these complex interactions may also hinder our ability to predict future epidemics. the development of a framework for identification of critical dynamical and biochemical factors influencing the emergence of infectious diseases is vital to our understanding of epidemics to come. upon transmission to a new host species, viruses must usually adapt to a new genetic and immunologic environment in order to replicate and spread to other individuals within the species (webby et al. ) . the high mutation and replicative rates of rna viruses such as human immunodeficiency virus (hiv) and influenza facilitate the occurrence and fixation of such beneficial mutations (moya et al. ) . viral adaptations to new hosts primarily manifest as amino acid substitutions, which can allow more efficient virus cell entry in the new host (ito et al. ; qu et al. ) , block interactions with detrimental host proteins (mangeat et al. ; stremlau et al. ) , and promote escape from both the new and the old host's immune responses (smith et al. ; wei et al. ) . 
should potentially adaptive mutations occur naturally before transmission (demma et al. ) or be maintained in an intermediate host (ito et al. ), the zoonotic virus will have a head start adapting to its new host environment. thus, high viral genetic diversity in the reservoir host may be a dominant criterion discriminating a successful zoonosis from a dead-end exposure (demma et al. ) . to understand the evolutionary processes facilitating viral zoonosis, we have been studying the adaptation of the envelope glycoprotein (env) of a diverse simian immunodeficiency virus (siv) isolate from its natural sooty mangabey (sm) host after experimental inoculation of a nonnatural rhesus macaque (rm) host (demma et al. ; silvestri et al. ) . by comparing changes in viral population structure after infection of both sms and rms with the identical diverse viral inoculum, we can identify the specific genetic targets of zoonotic adaptation and gain insight into what selective forces might be driving this zoonotic adaptation. essentially, natural selection after a cross-species transmission would act like a sieve, favoring those amino acid polymorphisms critical for direct hostvirus interactions in the new host, while ignoring neutral sequence variation and specific adaptations to immune responses of the previous host. therefore, our experiment in cross-species adaptation will not only allow us to understand how a viral envelope adapts to a new host, but will also facilitate identification of particular residues in env, which are critical for its functional robustness. most analyses of siv adaptation to rms have involved inoculation of animals with clonal viruses, which had either been passaged in rms previously or were first isolated from rms accidentally infected with sivsm in the s. 
in these studies, analyses of the ratio of nonsynonymous to synonymous substitutions have revealed particular genes or amino acid sites that are under positive diversifying selection pressure due to immune responses against these sites. these studies of adaptation have been crucial to our understanding of how immunodeficiency viruses adapt to host immune responses. however, because of the preadapted condition of these virus isolates and the lack of genetic diversity upon which zoonotic selection can act, they do not sufficiently address the question of how diverse naturally occurring virus populations adapt to a new host species. furthermore, if the spread of siv into humans was mediated by direct contact with blood from african nonhuman primates (hahn et al. ), the first infected human would have been exposed to a large, and likely very diverse, bolus of virus. by inoculating rms and sms with a large bolus (~ virus copies) of an already highly diverse virus population from a naturally infected sm, we have provided the necessary sequence diversity for selection to act upon and may have more closely approximated the circumstances of the first nonhuman-to-human transmission of a primate lentivirus. however, this diversity is the result of the virus having adapted to an animal whose immune responses have left their mark on the virus population's genetic structure (demma et al. ). therefore, simply calculating the ratio of nonsynonymous to synonymous substitutions at each codon in virus sequences sampled shortly after infection may not reliably distinguish the evolutionary signatures of immune responses in the donor animal from the selective forces mediating adaptation to the recipient new host species.
thus, it is important to minimize the influence of this preexisting selective signature in zoonotic viral sequence analyses by characterizing fluctuations in the overall viral population structure as well as amino acid frequency changes at specific polymorphic sites. therefore, we have employed a combination of intuitive population genetic analyses and more complex models of codon substitution in order to extend our previous analyses (demma et al. ) of the evolution of a natural and diverse sivsm env variable loop and (v v ) after transmission to a nonnatural rm host. this previous study described strong rm-specific restriction in the n-linked glycosylation (n-glyc) motif density and length of the v loop at days and postinfection (p.i.) followed by the emergence after day p.i. of a diverse viral population with an allelic distribution more similar to that of the source inoculum (si). here, employing previously unused numerical analyses and a more comprehensive phylogenetic analysis of the same viral sequence data, we detect strong rm-specific selection in the v loop at days and p.i., which underscores this region's importance in adaptation to the rms. most prominently, changes in the position and frequency of an n-glyc motif in the v loop likely represent an adaptation either to a divergent cd or chemokine coreceptor or to an as-of-yet undetermined target cell population. three sms (fco, fgu, and fln) and three rms (rht, rzw, and rql) were inoculated intravenously with the same diverse sivsm population derived from a naturally infected sm (si). their course of viremia, immunological profiles, and viral genetic characteristics have been previously described (demma et al. ; silvestri et al. ) . briefly, the virus replicated well in all sms but only of the rms (rht and rzw), with peak viremia ranging from . to . viral copies per milliliter plasma and chronic phase setpoints ranging from to copies per milliliter. 
due to its low to undetectable sivsm viremia, rql is not included in any subsequent analyses. viral sequences were sampled from plasma viral rna by reverse transcriptase-polymerase chain reaction at intervals throughout the first year of infection, and multiple clonal isolates of the virus envelope v v region were amplified and sequenced. input viral rna copy number was not normalized prior to reverse transcription. however, viral load was not significantly different between animals at each time point, limiting potential copy number bias for within-time point comparisons. furthermore, dilutions of a si cdna pool were subjected to the same pcr conditions, cloned, and then sequenced to rule out any potential input copy number bias. finally, multiple pcr reactions were run on several samples to ensure the consistency of sequence results from each reaction. all sequences can be found in genbank with accession numbers ay -ay . to elucidate large-scale differences in the population structure of sivsm infecting the sms and the rms, we applied to our sequence data set the analytical method developed in williamson ( ) . a majority consensus sequence was constructed from the si sequences. alignments of sequences from each time point within each animal were compared with this consensus in sites (hey and wakeley ) to determine the number of nonsynonymous and synonymous polymorphisms occurring at each site and at each time point. polymorphisms were then classified as either common (. %) or rare (, %). under neutrality, the ratio of common nonsynonymous to common synonymous polymorphisms should equal the same ratio of rare polymorphisms. 
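as a minimal sketch, the neutral expectation stated above can be turned into a count of excess common nonsynonymous polymorphisms. the function name and argument names below are illustrative and do not come from the original analysis; the calculation simply subtracts the number of common nonsynonymous polymorphisms expected under neutrality, c_s * (r_n / r_s), from the observed number, which is our reading of the method of williamson ( ).

```python
def adaptive_events(c_n, c_s, r_n, r_s):
    """Excess of common nonsynonymous polymorphisms over the neutral expectation.

    c_n, c_s: counts of common nonsynonymous / synonymous polymorphisms
    r_n, r_s: counts of rare nonsynonymous / synonymous polymorphisms
    Under neutrality c_n / c_s is expected to equal r_n / r_s, so the
    expected common nonsynonymous count is c_s * r_n / r_s.
    """
    if r_s == 0:
        raise ValueError("r_s must be nonzero to form the rare-class ratio")
    expected_c_n = c_s * r_n / r_s
    # Positive values indicate an excess; negative values indicate a deficit.
    return c_n - expected_c_n
```

the result can be negative, for example when a single variant dominates the population and few common nonsynonymous polymorphisms remain.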
positive diversifying selection on amino acid sequences would manifest itself as an excess of common polymorphisms, thus

$$a = c_n - c_s \frac{r_n}{r_s}$$

where $a$ represents the excess number of common nonsynonymous polymorphisms (termed ''adaptive events''), $c_n$ and $c_s$ are the calculated numbers of common nonsynonymous and synonymous polymorphisms, respectively, and $r_n$ and $r_s$ are the numbers of rare nonsynonymous and synonymous polymorphisms, respectively. deviations of the rate of increase of adaptive events between day and p.i. from were determined separately in the rms and the sms using spearman's rank correlation.

mrbayes (huelsenbeck and ronquist ) was used to construct phylogenetic trees (fig. a-e) of all isolates within each individual and the si from our gap-stripped nucleotide alignments. nucleotide positions within each codon were assumed to evolve at independent rates according to a gamma distribution whose rate parameter was estimated from the data. for each tree, independent runs with monte carlo markov chains were performed on an apple g -node cluster. each chain was run for , , generations sampling every , generations. tracer (http://evolve.zoo.ox.ac.uk/) was used to evaluate chain convergence. the burn-in length was determined independently for each tree, and the run with the longest burn-in (fco: , generations; fgu: , ; fln: , ; rht: , ; rzw: , ) was used when building the consensus tree from both runs. the average standard deviation of the split frequencies (post-burn-in) for each tree was . (fco), . (fgu), . (fln), . (rht), and . (rzw). bayesian posterior probabilities for each node were estimated as the proportion of trees sampled after burn-in containing each of the observed bipartitions. neighbor-joining trees built using the distance method in paup . b (swofford ) were consistent with our bayesian results, although the consensus bayesian topologies were significantly more likely (kishino-hasegawa test, p < . ).
to ensure internal consistency between the trees of individual animals, each tree was stripped of all variants except for the si variants, using treetool . . (maciukenas ) , and the resulting trees were compared with each other as well as with a maximum likelihood tree built only from the si (data not shown). the major si clades are resolved well in all trees, although the topology of variants within these clades varies slightly. despite difficulty in resolving si variants , , , and due to possible recombinant origins, these trees are largely in agreement. amino acid substitutions were mapped onto each individual animal bayesian tree using the parsimony method in macclade (maddison wp and maddison dr ) . the ancestral sequence was assumed to be the node basal to the branch containing si variants , , , , , and . substitutions occurring on clades containing only si variants were not counted in any analysis. after obtaining bulk sitewise counts of the number of amino acid substitutions, each site was scored for the possibility of selection. a site was considered under selection in an animal if ) it contained multiple (. ) amino acid substitutions on terminal branches, representing parallel evolution, or ) it contained one or more synapomorphic amino acid substitutions (sheridan et al. ) . the time points of variants present in clades with amino acid substitutions considered to be adaptive were recorded. due to the inability of likelihood models of codon substitution to explicitly account for differences in the time of sampling between taxa, we calculated rates of nonsynonymous (d n ) and synonymous (d s ) substitution separately at each time point within each animal. modeltest (posada and crandall ) was run on each alignment to determine the best substitution model for building the trees. the most common substitution model was used for all trees to minimize the effects of using different nucleotide substitution models on the outcome of subsequent analyses. 
maximum likelihood trees of each time point within each animal were then built in paup . b (swofford ) using the hky+g model of nucleotide substitution and estimating base frequencies, the transition/transversion ratio, and the shape of the gamma distribution separately for each alignment. d n and d s were then estimated from the nucleotide alignments under a fixed effects likelihood model of codon substitution in hyphy (kosakovsky ) and scaled to the maximum likelihood trees. unlike many other implementations of codon substitution models where d s is estimated and fixed across the entire sequence, this particular program allows for estimation of both d n and d s at each individual codon. the numbers of nonsynonymous and synonymous substitutions were calculated at each site given the topology of each tree, and likelihood ratio tests were performed to determine whether d n was significantly greater than d s . a p-value cutoff of p < . was chosen based on the findings of kosakovsky . sites under species-specific selection were defined as those with consistent selective patterns over time and across animals within that given species.

because previous analyses (demma et al. ) of this robust data set focused primarily on viral genetic characteristics at the peak of acute infection and at late times p.i., we sought to understand the pattern of sivsm adaptation to new host targets during the postacute phase, before the development of a strong neutralizing antibody (nab) response. to determine whether viral sequence adaptations beyond those already described were occurring in our experimentally sivsm-infected monkeys, we analyzed our data set using a method (williamson ) designed to track changes in the frequencies of nonsynonymous and synonymous polymorphisms in virus populations over time by comparing alignments of each p.i. time point with a consensus of the si (fig. ).
the pattern of sivsm adaptation differs greatly between the viruses replicating in the species over the first days of infection ( fig. a ). the number of adaptive events occurring in sivsm replicating in sms remains relatively stable, fluctuating moderately around through day p.i. this is consistent with the notion that virus populations having replicated in sms for centuries need not adapt to other sms prior to the development of humoral immune responses. in contrast, sivsm exhibits a very distinct but consistent adaptive pattern in both viremic rms. the low, negative values at day p.i. reflect the early outgrowth of a single variant from an si-containing multiple distinct variants (or alleles) of this region of env. following this restriction, a succession of amino acid substitutions in the replicating viral populations occurs between days and . this increase in the number of adaptive events in the rm virus populations is significant ( fig. b ; spearman's rank correlation, p , . ) and, due to its occurrence before the development of strong nab responses, is most likely the result of virus adaptation to divergent host cell receptors or target cell subsets. the subsequent decrease in adaptive events in rms between days and reflects the outgrowth of viral variants more representative of the si and presumably better able to escape emerging humoral immune responses. that virus populations in the sms do not deviate significantly from the allelic distribution seen in the si until after day further supports the notion that humoral immune pressures first develop between day and in these infected animals and select for particular variants resulting in the observed increase in the number of adaptive events at day . 
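the trend in adaptive events described above was assessed with spearman's rank correlation. a minimal, dependency-free sketch of the coefficient follows. the function names are hypothetical, tied values receive the mean of their ranks, and no p-value is computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)
```

in this setting x would hold the sampling days and y the adaptive event counts for one species; a rho near 1 indicates a steady increase over time.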
sivsm populations adapting to rms are more significantly diverged from the si than viruses replicating in sms

to further elucidate temporal changes in the allelic structure of host-specific virus populations and to compare differences in viral divergence between host species, we built bayesian phylogenetic trees of every sivsm isolate within individual animals and the si (fig. ). the general placement of time points within each full bayesian tree is highly consistent with our previous analyses (demma et al. ). as before, almost all of the day and day rm isolates cluster with si variants , , , , , and (clade outlined in red on each tree), whereas the contemporaneous sm isolates are well distributed across the major clades. the relatively wider distribution of day rm clones reflects the later reemergence of variants more closely related to the si and is probably indicative of the onset of effective nab responses. strikingly, though, the majority of day variants in both rms fall into either clade (fig. , rht) or clades (fig. , rzw). this apparent selection for distinct variants at day may be at least partially responsible for the increase in rm-specific adaptive events noted earlier. to characterize the overall divergence of the viral populations replicating in the newly infected animals from each other, patristic distance matrices were calculated from the bayesian trees using patristicv . (fourment and gibbs ), and the average distance of each si variant from its closest non-si variant was determined within each species. viruses establishing infection in the rms are significantly more diverged from the si than those replicating in the sms ( . vs. . ; student's t-test, p < . ). additionally, the proportion of si variants whose nearest neighbor is another si variant is significantly greater in the rms than in sms ( . vs. . ; normal approximation of the binomial, p < . ).
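the two divergence summaries described above can be sketched from a patristic distance matrix as follows. this is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, the matrix is assumed to be a symmetric nested dictionary keyed by variant name, and ties between equally close neighbors are broken by dictionary order.

```python
def si_divergence_summary(dist, si_names):
    """Summarize how far SI variants sit from the sampled (non-SI) variants.

    dist: nested dict with dist[a][b] the patristic distance between a and b
    si_names: set of variant names that belong to the source inoculum (SI)
    Returns (mean distance from each SI variant to its closest non-SI variant,
             proportion of SI variants whose nearest neighbor is another SI variant).
    """
    names = list(dist)
    si = [n for n in names if n in si_names]
    non_si = [n for n in names if n not in si_names]
    if not si or not non_si:
        raise ValueError("need at least one SI and one non-SI variant")

    closest = [min(dist[s][t] for t in non_si) for s in si]
    mean_closest = sum(closest) / len(closest)

    si_nearest = 0
    for s in si:
        others = [t for t in names if t != s]
        nearest = min(others, key=lambda t: dist[s][t])
        if nearest in si_names:
            si_nearest += 1
    return mean_closest, si_nearest / len(si)
```

computing both values per animal and then comparing species would mirror the comparison reported in the text; the statistical tests themselves are not reproduced here.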
taken together, these data suggest that in addition to overall differences in the allelic structure of the sivsm populations between the host species, the viruses replicating in rms are significantly more diverged from the si than viruses replicating in the newly infected sms. to identify specific amino acid sites that may be responsible for the disparate evolutionary patterns of viruses replicating in the monkey species, we applied site-by-site analyses to detect particular codons under selection. due to the difficulties inherent in identifying newly selected mutations on the genetic background of a diverse virus population that already bears the mark of strong immune selective pressures (demma et al. ), we applied 1) a phylogeny-based method to identify specific amino acid changes and the virus subpopulations in which they occur, 2) an analysis of the site-by-site amino acid frequencies to understand the selective changes occurring at each site, and 3) a model of codon substitution to both elucidate the genetic signature of selection in the donor and evaluate the robustness of such models in the context of such a complex selective background. the phylogenetic method involved mapping amino acid substitutions at each individual site onto each animal's full-infection phylogenetic tree through day p.i. the average number of amino acid changes per animal, cumulative across the sequence, is much greater in rms ( ) than in sms ( ), consistent with our previous findings (demma et al. ) of an overall greater number of nonsynonymous substitutions in the rms (supplementary figure , supplementary material online). we then classified sites within animals as being under selection based on criteria laid out in sheridan et al. ( ). despite some variation among sites and species in the temporal pattern of evolution, the sites under selection are largely the same between rms and sms and are primarily located in v .
interestingly, the sites ( , , and ) found to be under selection exclusively in the rms were also under selection between days and p.i., the same interval over which the virus populations only in the rms underwent multiple adaptive events ( fig. ) . thus, it is likely that amino acid substitutions or changes in the relative abundance of amino acid polymorphisms at each of these sites mediate sivsm adaptation to this new host. the second method by which we identified amino acid sites under selection involved following changes in the frequency of the consensus amino acid at each site over time ( fig. and supplementary fig. , supplementary material online). this method allows the detection of fluctuations in the relative abundance of existing and de novo amino acid polymorphisms at individual codons. sites exhibiting large fluctuations (. % shifts) in amino acid composition were identified as being under directional selection and thus most likely represent specific adaptations to the rms. overall, a much larger number of amino acid positions were identified as being under rm-specific selection pressures than in the analysis of phylogenetic substitutions (fig. ) . the sites not identified in previously discussed analyses (sites , , , , and ) are focused at the c terminus of the v loop and are selected for primarily at days and p.i. that these amino acid positions were highly polymorphic in the si (see fig. ) and are under selection contemporaneously with the rm-specific restriction in viral diversity at the peak of acute infection suggests that they are selectively neutral and that changes in the relative abundance of these amino acid residues reflect selection at closely linked sites (i.e., n-glyc site at position / ). the consensus amino acid frequencies at sites , , , , , and (supplementary fig. a fig. ). selection at sites and represent the rm-specific loss of n-glyc motif at day and p.i. previously identified (demma et al. ) . 
although this n-glyc site remains absent or at low levels until day , the viral populations in the rms continue to adapt to the new host environment through changes in consensus amino acid frequencies at specific loci: the highly charged region of v (sites and ) and another n-glyc site (sites and ). interestingly, the polymorphism at site (ser to asn) causes a -amino acid, c-terminal shift in an n-glyc site (see fig. ). taken together, these data demonstrate a continued strong selection pressure during the postacute phase of infection for specific variants, presumably better adapted to the divergent cellular and genetic environment of the rms. ). this phenomenon not only underscores the importance of viral diversity within donor animals for the potential success of cross-species virus transmission but may also obscure the interpretation of more traditional site-by-site likelihood analyses of codon substitution. the large number of viruses ( ) inoculated into our experimentally infected animals ensures that the signatures of selection on the virus population in the si animal will be transmitted as well. thus measurements of the numbers of nonsynonymous and synonymous substitutions (per nonsynonymous and synonymous site, respectively) early in the course of infection will mainly reflect selection biases predating the inoculation of our experimental animals. these sites should be easily identified as those either unaccompanied by changes in relative amino acid abundance or those which are under positive diversifying selection in both species. it is therefore necessary to compare the ability of codon substitution models to identify rm-specific sites under selection with our other sitewise analyses. interestingly, the distribution of sites under selection ( fig. ) is similar to our previous analyses here and elsewhere (demma et al. ) . 
specifically, the v loop is under strong positive selection in both rms and sms, although fewer sites were evolving under positive selection in rm-specific virus populations than in the sms. however, the sm-specific positively selected sites (sites , , and ) were not well supported by our other analyses. in contrast, only site ( ) out of the (sites , , , , and ) identified to be evolving under positive selection in v was also detected in sms. two of these sites (site and ) were not identified in any other analysis, and therefore, differences between the estimated numbers of nonsynonymous and synonymous substitutions likely predated this infection experiment. finally, the codon identified in the previous site-by-site analyses, which is responsible for an rm-specific increase in the frequency of the c-terminally shifted n-glyc motif, site , was not identified as positively selected in this analysis.

fig. .-sivsm env v v is highly diverse in the plasma of the naturally infected sm used for experimental inoculation of sms and rms. all si sequences were aligned and summarized using weblogo (http://weblogo.berkeley.edu). the relative height of each amino acid letter designation at any given site represents its frequency, whereas the overall height of the column indicates the amount of information contained at that site. unnumbered amino acid sites were excluded from all analyses due to the presence of gaps. putative n-linked glycosylation motifs (nxs/t) are represented in gray. underlined n-linked glycosylation motifs were not present in the majority of sequences sampled from the rms at day p.i. sites labeled with a filled star were found to be under selection in both the rms and the sms. sites labeled with an open star were found to be under rm-specific selection pressures.
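the putative n-linked glycosylation motifs (nxs/t) referred to above can be located with a simple pattern scan. the sketch below assumes ungapped, one-letter amino acid sequences and adds the conventional restriction that the middle residue is not proline, which the text does not state; the function name is hypothetical.

```python
import re

# N, then any amino acid letter except P, then S or T.
# The lookahead lets overlapping motifs (for example "NNTS") both be reported.
SEQUON = re.compile(r"(?=N[A-OQ-Z][ST])")


def find_sequons(protein):
    """Return 0-based start positions of N-X-S/T motifs (X != P) in a sequence."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]
```

comparing the positions returned for two variants shows whether a motif has been lost, gained, or shifted along the sequence, and dividing the number of motifs by the loop length gives a simple motif density.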
here we describe in detail the selective processes accompanying the cross-species transmission of a very diverse sivsm virus inoculum derived from an endemically infected natural host (sm) to a nonnatural host (rm). by applying several evolutionary analyses to serially sampled virus populations for the first days of infection, we have been able to identify the amino acid positions responsible for the continued adaptation of a diverse sivsm inoculum to the new rm cellular and genetic environment. the polymorphisms mediating this adaptation were already present in the si and primarily alter the density and position of n-glyc sites on the virus envelope. in addition, we also demonstrate the utility of a multifaceted approach to studying viral sequence evolution, which combines complex phylogenetic methods and simple intuitive analyses to identify and discriminate between sites that were selected for in the sm reservoir host from those that have only recently come under selection in the new rm host. the env v v sequence variants we describe here are not completely novel. alignment of our sivsm env clones from day p.i. with the same region of several common siv clones from the los alamos hiv sequence database (http://www.hiv.lanl.gov) reveals that the same shift in the n-glyc site at position observed in our rms is also found in most of these rm-adapted isolates ( fig. ). because all of the sequences obtained from the database were the result of one or several accidental transmissions from sms to rms of a single sivsm subtype ( ) only recently described (apetrei et al. , we cannot rule out the alternative hypotheses that this site is not polymorphic in the sivsm subtype- envelope or that this polymorphism was fixed in rms due to an extreme bottleneck upon transmission. 
regardless, it is probably not entirely coincidental that these subtype- viruses were the first to be described infecting rms because sms infected with other lineages of sivsm were used in the experiments now thought to have facilitated the first siv transmission to rms but did not engender persistent infection of rms . of all the loci identified here as sites under rmspecific selection, the c-terminal shift in an n-glyc site caused by a polymorphism at site is the most intriguing. although this shift is transient (supplementary fig. h , supplementary material online), it does occur early in infection during the brief window (through day ) in which nab responses, the primary immunological selection pressure on env , are thought to be low or absent (rybarczyk et al. ) . thus, preferential expansion in the rms of viruses encoding this shifted n-glyc site as well as amino acid residues at sites and strongly suggests that these mutations are specific adaptations to the divergent rm genetic and cellular environment, as opposed to adaptations to new host immune responses. the partial reversion of this n-glyc site to its n-terminal position beyond day p.i. is due to the reemergence of the presumably more immunologically evasive variants containing both of the n-glyc sites in v at amino acid positions and (see fig. ). the lack of linkage between the c-terminally shifted n-glyc site polymorphism in v and the v loop variants containing both n-glyc sites among all the viruses sampled in our study suggests that this adaptation in v may actually be disadvantageous in the presence of a highly glycosylated v . even though effective nab responses are likely low or nonexistent prior to day p.i., it is possible that immune cell populations may be at least indirectly responsible for these changes in v v allelic structure. it has recently become apparent that acute hiv infection of humans (brenchley et al. ; mehandru et al. ) and siv infection of rms (li et al. ; mattapallil et al. 
) and now sms (silvestri g, personal communication) is associated with a massive depletion of a specific memory cd t-cell subset from mucosal tissues caused by a combination of virus- and cytotoxic t lymphocyte-induced cytopathicity (mattapallil et al. ; regoes et al. ). preferential depletion of this t-cell subset during the early stages of lentiviral infection identifies these cells as a primary target cell reservoir in both pathogenic and nonpathogenic hosts. ultimately, this profound change in the number and type of target cells could have important consequences for sivsm viral population structures in the early stages of infection.

more likely, however, is the possibility that the changes we observe in sivsm viral populations are adaptations to the divergent immune cellular environment of the rms. it has recently been reported that a number of well-studied natural nonpathogenic siv hosts, including sms, exhibit far lower levels of ccr on these same mucosal memory cd t cells than what is typically seen in nonnatural pathogenic hosts, like humans and rms. the authors hypothesize that this could be a convergent evolutionary mechanism for ameliorating the pathogenic effects of lentiviral infection of many african nonhuman primates although not explicitly preventing host-to-host spread of the virus. however, the fact that the level of viremia (broussard et al. ; goldstein et al. ) and the rate of infected target cell turnover (ho et al. ; mohri et al. ) (silvestri g, personal communication) are similar in both pathogenic and nonpathogenic hosts suggests that sivs replicating in their natural hosts may be better able to use multiple chemokine coreceptors to facilitate entry into a broader array of short-lived target cells than their counterparts infecting pathogenic hosts. whether this dearth of ccr expression in natural hosts results in lower levels of siv infection of memory cd t cells or is simply a hallmark of the lower level of immune activation seen in the natural hosts, this phenomenon underscores the dramatic difference between sms and rms in their immune cell phenotype and thus their target cell landscape. it will be important to investigate the ability of the specific adaptations identified in this and previous studies (demma et al. ) to allow the sivsm envelope to utilize the divergent rm cd receptor as well as the various rm chemokine coreceptors that can mediate virus entry into target cells.

is significantly greater than , whereas the slight decrease in adaptive events among sms is indistinguishable from (spearman's rank correlation, p < . ).

fig. .-summary of the amino acid sites found to be under selection for all site-by-site analyses. analyses are partitioned by analysis then by species and then by time point. gray boxes indicate selection at that site and time point. ''+'' or ''−'' symbols indicate the presence of positive or purifying selection, respectively, as determined by the maximum likelihood analysis of dn and ds (p < . , likelihood ratio test). sites with gray forward hatching indicate the sites in v found to be under selection in a majority of the analyses. sites with gray backward hatching are the sites in v found to be under positive selection by only the likelihood method.

fig. .-sivsm populations adapting to rms are distinct from those in sms. bayesian phylogenetic trees of all variants sampled from each individual animal and the si. fco, fgu, and fln are the sms, and rht and rzw are the rms. time points sampled from individually inoculated animals are represented by colored squares. the si is represented by green triangles, and each variant is numbered. the clade containing the si variants from which most of the day rm variants are descended is outlined in red. the node on which trees were rooted to determine the ancestral sequence for inferring substitutions is indicated with a light green circle.
supplementary figures and are available at molecular biology and evolution online (http://www.mbe.oxfordjournals.org/).

sivsm v sequences at day in rms are more similar to v sequences from several major macaque-adapted siv clones than to the contemporaneous sm v sequences. the sites under significant rm-specific selection are indicated by stars, and n-glyc sites are backed in gray ) and aligned by hand to maximize amino acid and codon usage similarity

molecular epidemiology of simian immunodeficiency virus sivsm in u.s. primate centers unravels the origin of sivmac and sivstm
kuru experiments triggered the emergence of pathogenic sivmac
cd t cell depletion during all stages of hiv disease occurs predominantly in the gastrointestinal tract
simian immunodeficiency virus replicates to high levels in naturally infected african green monkeys without inducing immunologic or neurologic disease
siv quasispecies adaptation to a simian new host
evolution of the uniquely adaptable lentiviral envelope in a natural reservoir host
patristic: a program for calculating patristic distances and graphically comparing the components of genetic change
neutralizing antibody responses drive the evolution of human immunodeficiency virus type envelope during recent hiv infection
origin of hiv- in the chimpanzee pan troglodytes troglodytes
plateau levels of viremia correlate with the degree of cd -t-cell loss in simian immunodeficiency virus sivagm-infected pigtailed macaques: variable pathogenicity of natural sivagm isolates
aids as a zoonosis: scientific and public health implications
a coalescent estimator of the population recombination rate
an african primate lentivirus (sivsm) closely related to hiv-
rapid turnover of plasma virions and cd lymphocytes in hiv- infection
adaptation of sars coronavirus to humans
mrbayes: bayesian inference of phylogenetic trees
molecular basis for the generation in pigs of influenza a viruses with pandemic potential
hyphy: hypothesis testing using phylogenies
not so different after all: a comparison of methods for detecting amino acid sites under selection
fruit bats as reservoirs of ebola virus
peak siv replication in resting memory cd t cells depletes gut lamina propria cd t cells
interactive analysis of phylogeny and character evolution using the computer program macclade
broad antiretroviral defence by human apobec g through lethal editing of nascent reverse transcripts
massive infection and loss of memory cd t cells in multiple tissues during acute siv infection
primary hiv- infection is associated with preferential depletion of cd t lymphocytes from effector sites in the gastrointestinal tract
rapid turnover of t lymphocytes in siv-infected rhesus macaques
the population genetics and evolutionary epidemiology of rna viruses
influenza: old and new threats
paucity of cd ccr t-cells is a typical feature of natural siv hosts
the origins of new pandemic viruses: the acquisition of new host ranges by canine parvovirus and influenza a viruses
modeltest: testing the model of dna substitution
identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy
roles of target cells and virus-specific cellular immunity in primary simian immunodeficiency virus infection
correlation between env v /v region diversification and neutralizing antibodies during primary infection by simian immunodeficiency virus sm in rhesus macaques
high-resolution phylogenetic analysis of hepatitis c virus adaptation and its relationship to disease progression
divergent host responses during primary sivsmm infection of natural mangabey and non-natural rhesus macaque hosts
mapping the antigenic and genetic evolution of influenza virus
the cytoplasmic body component trim alpha restricts hiv- infection in old world monkeys
paup*. phylogenetic analysis using parsimony (*and other methods)
sars-cov infection in a restaurant from palm civet
molecular constraints to interspecies transmission of viral pathogens
antibody neutralization and escape by hiv-
adaptation in the env gene of hiv- and evolutionary theories of disease progression

the authors would like to thank scott williamson and oliver pybus for helpful discussions concerning the methods used in this paper. this work was supported by the national institutes of health (r ai ) and the yerkes national primate center (p rr ).

key: cord- -ehaiqye authors: peterson, ryan r. title: over the caribbean top: community well-being and over-tourism in small island tourism economies date: - - journal: int doi: . /s - - - sha: doc_id: cord_uid: ehaiqye

the caribbean is one of the most tourism-intense regions of the world with rising levels of over-tourism, especially in dependent small island tourism economies (site). more critically, mounting socio-ecological pressures are compounded by increasing climate change and enduring social vulnerabilities, thereby challenging traditional policies and paradigms of growth and sustainability. drawing on previous studies of inclusive development and community well-being, this research paper frames and extends the phenomenon of over-tourism from a political economic perspective. based on a historical account of small island tourism development, an in-depth case study of aruba is presented. recognized internationally as the ‘one happy island’ and one of the most tourism-dependent small island economies, the findings yield a contextualized understanding of the complex and dynamic nature of over-tourism, and identify the main antecedents and effects of over-tourism. the study discusses the evolving economic disconnectedness, environmental decay, social inequality, and institutional failures.
the findings describe the role of institutional capture and policy drift, which stem primarily from political as well as market forces, and have resulted in a gradual marginalization of community well-being and agency. the paper proposes an extended conceptualization of over-tourism in small island tourism economies by explicitly recognizing that the crux of the over-tourism conundrum in site is political by nature and institutional by nurture. recommendations are provided for transitioning towards community-driven development by building capabilities and pathways for innovation, internalization, and institutionalization in order to strengthen the resilience of small island tourism development. what started as a casual affair almost a century ago has today turned into one of the most vibrant and fastest growing industries. up until the covid- pandemic, the caribbean was one of the most tourism-intense regions of the world with international tourism contributing, on average, to % of exports, % of gdp (gross domestic product), and % of labor (wttc ). likewise, accounting for at least % of capital investments, international tourism is one of the most resource-intense industries, including financial, human, and natural resources in the caribbean (mcelroy and pearce ) . over the past years, caribbean tourism arrivals have grown tenfold, from less than million visitors during the early 's to well over million tourists today, and are expected to continue to grow in the next decade (wttc ). yet paradoxically, despite this significant and continued tourism growth, there are increasing signs that economic growth has largely stagnated across the caribbean, especially in the smaller and more tourism-dependent island economies (acevedo et al. ; ruprah et al. ; peterson ; leigh et al. ) . initial evidence suggests that the surge in international tourism has not contributed significantly to the lackluster economic growth since the early 's (chamon et al. ) .
this precarious reality is consistent with previous studies reporting stagnant growth and diminishing productivity in caribbean tourism economies (ruprah ; peterson ) , which could be indicative of a maturing of tourism destinations along the tourism life cycle (butler ) , or more poignantly, a self-destructive tourism fetish with growth and expansion beyond the destination's socio-ecological capacities (higgins-desbiolles ). the confluence of enduring tourism growth with diminishing economic development in small island tourism economies (sites) raises serious questions about the role and contribution of tourism for inclusive development and community well-being in the caribbean (unsdg ), especially considering the complex of economic, health, and environmental shocks in addition to the longstanding social vulnerabilities and institutional weaknesses (imf ; ruprah et al. ; peterson ) . whereas tourism specialization is traditionally associated with economic production and growth (cannonier and galloway ; de vita & kyaw ; marsiglio ) , the adverse externalities thereof are also well established (daye et al. ; duval ; gossling ; hall and williams ; mcelroy ; peterson ; wilkinson ) . furthermore, the relationship between tourism specialization and economic growth is moderated by absorptive capacities (acemoglu & robinson ; baldacchino ; brautigam and woolcock ; peterson et al. ), which describe the optimum level of tourism specialization that can be assimilated and absorbed by an economy before reaching an inflection point after which tourism specialization experiences diminishing returns and negative externalities (de vita & kyaw ; marsiglio ) . in terms of the tourism surge in the caribbean, mcelroy and pearce ( ) contend that part of the problem in the caribbean is that much of the tourism growth since the early 's has been too fast, unregulated, and fragmented.
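the idea of an optimum level of tourism specialization, noted above in the discussion of absorptive capacity, can be illustrated with a minimal numerical sketch. it assumes, purely for illustration, that growth is a quadratic function of the tourism share of the economy; this functional form, the coefficient values, and the function names are hypothetical and are not taken from the cited studies. in this sketch, the point the text calls an inflection point is the maximum of the curve, beyond which additional specialization lowers growth.

```python
# Illustrative only: a hypothetical quadratic link between tourism
# specialization (share of the economy, 0..1) and economic growth.
# Neither the functional form nor the coefficients come from the source.


def growth(share, a=1.0, b=4.0, c=5.0):
    """Assumed growth rate: a + b*share - c*share**2."""
    return a + b * share - c * share ** 2


def marginal_effect(share, b=4.0, c=5.0):
    """Change in growth per unit of extra specialization: b - 2*c*share."""
    return b - 2 * c * share


def optimum_share(b=4.0, c=5.0):
    """Share at which the marginal effect is zero: b / (2*c)."""
    if c <= 0:
        raise ValueError("c must be positive for a growth maximum to exist")
    return b / (2 * c)


if __name__ == "__main__":
    s_star = optimum_share()
    # Below the optimum the marginal effect is positive, above it negative.
    for s in (0.2, s_star, 0.6):
        print(f"share={s:.2f} growth={growth(s):.2f} "
              f"marginal={marginal_effect(s):+.2f}")
```

under these assumed coefficients, specialization adds to growth below the optimum share and subtracts from it above, which is the pattern of diminishing returns that the absorptive capacity argument describes.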
according to farrell and runyan ( ) , this rapid and unbalanced growth of tourism produces an inherent propensity for environmental overrun and sociocultural disruption, which in due course affect economic sustainability and societal well-being. as the intensity and concentration of tourism growth increases, the capacity of delicate socio-ecological island systems to absorb these changes can be drastically exceeded and may produce undesirable resource degradation (farrell and runyan ) , ironically destroying the very seeds of their previous success. when reviewing the history of tourism in the caribbean, it is blatantly apparent that growth rather than development remains the overriding focus. a perusal of caribbean tourism destination websites and the caribbean tourism organization (cto ) reveals that the dominant modus operandi is geared at increasing arrivals and industry revenues, attracting investments, creating jobs, and expanding cruise-lines and accommodations infrastructures; all in the name of economic growth and wealth hoarding. exemplary of this continued tourism rhetoric is the recent caribbean push towards inclusive tourism development by focusing on the market acceptance of new tourism services and how community tourism can support product differentiation with the ultimate benefit being the creation of a distinctive tourism brand that stimulates economic growth, productivity, and competitiveness (cto ). still couched within a narrow neoliberal tourism agenda, quality of life, shared value, environmental conservation, and community well-being are at best subjugated in national tourism policies that pay little more than lip service to the various principles of inclusiveness, integrity, equity, and sustainability (daye et al. ; duval ; joppe ; peterson ; scheyvens and biddulph ) . even if considered as an afterthought or part of the tourism destination's strategy, no formal system for measuring and monitoring non-economic impacts is implemented.
contrary to the espoused national agendas for a sustainable tourism destination, in situ political agendas are largely cloaked for sustaining a tourism industry and have largely eclipsed social rights and responsibilities, de facto subduing the role of government and civic organizations to govern for the general, and future, well-being of society (higgins-desbiolles ). more importantly, many small island communities across the caribbean continue to scream in silence for responsible and inclusive tourism (cole ; duval ; peterson et al. ) . reminiscent of polanyi ( ) and despite some of the nascent benefits of tourism, caribbean societies seem to have become largely subservient to international tourism markets, rather than tourism markets fostering societal well-being on caribbean terms (duval ; pattulo ; sheller ) . in fact, community and small business tourism in the caribbean oftentimes operate within a context where their viability is largely determined by global and national actors (daye et al. ) . the enduring fixation on constant increases in tourism volumes and economic value, and the persistent use and promotion of conventional tourism (economic) growth metrics, are testimony to an enduring obsession with physical expansion that has plagued much of the caribbean; one that is intimately connected to destination politics, power, and political cycles (acemoglu and robinson ; bishop ; cole ; daye et al. ; duval ; mcelroy and de albuquerque ; peterson et al. ) . in fact, the political economy of caribbean island tourism is oftentimes riddled by exclusion and extraction, rather than inclusion and regeneration, as witnessed by several tourism-instigated social disturbances and environmental destruction across the caribbean since the late 's (bishop ) . over the past two decades, several studies have cautioned against the tourism sprawl of social exclusion and ecological decay, and the dire implications thereof in the long run (daye et al.
; duval ; joppe ; mcelroy and de albuquerque ; richter ) . moreover, economic considerations and benefits of tourism growth and specialization tend to induce 'tourism myopia', a short-term growth orientation on tourism arrivals, receipts, and (tax) revenues, and trigger a gradual tourism overshoot of socio-ecological ceilings with significant costs in the medium to long term (dodds and butler ; joppe ; marsiglio ; raworth ) . according to richter ( ) , it is ironic that the pace of tourism can do more damage to the societies in which it flourishes through myopic expansion than by community-inclusive development. likewise, crandall ( ) concludes that while tourism is accepted as a significant boon to local economies, there is little realization on the part of tourism elites (those that benefit most from tourism without carrying the costs) that tourism leads to economic spillovers, social changes, and ecological challenges, especially when unplanned or uncontrolled. although certainly not a new experience, this mounting tourism spillover effect has recently been coined over-tourism (dodds and butler ; wttc ) . although over-tourism is usually defined in terms of the adverse impact of excessive tourism on (parts of) a destination that influences perceived quality of life of citizens and/or quality of visitor experiences in an undesired way (unwto ) , this narrow conceptualization disregards or downplays the political economic role of tourism, especially in the caribbean that is generally characterized by dense and tightly-knit social networks (benedict ; daye et al. ) . from a political economic perspective, daye et al. ( ) contend that the key to investigating and understanding caribbean tourism development, and especially the 'tourism over-run', is the historical context, local realities, and the 'invisible' interactions that influence tourism policies and decisions.
likewise, duval ( ) argues that understanding caribbean tourism requires a historical and contextual understanding of the less-observable social mechanisms and formative political, economic, social, and institutional processes that shape tourism policies and institutional behaviors in small islands. beyond the physical notions or symptoms of overcrowding or carrying capacity (unwto ; wttc ) , this study contends that, at its core, over-tourism is a manifestation of certain institutional powers and processes, and represents the social overpowerment or disempowerment of a destination community's agency (daye et al. ; duval ; hall and williams ; joppe ; mccool and lime ; moscardo ; richter ; williams and ponsford ) . within contemporary studies and conventional policies on (over-) tourism, the focus is mainly on rational policy formulation and narrowly defined socio-economic processes, rather than on the political context and institutional frame in which power is wielded and governed to preserve and control existing political economic structures and tourism institutions (acemoglu & robinson ; dodds and butler ; hall and williams ; joppe ) . in reflecting on previous studies in the caribbean, duval ( ) argues that too often research interprets results through the lens of apolitical and ahistorical perspectives. thus, the fundamental role of institutional framing and failures remains relatively under-scrutinized in studies on over-tourism (duval ; hall and williams ; joppe ) . furthermore, whereas previous studies on over-tourism relate mostly to metropolitan and city districts, over-tourism in small island communities remains largely void of empirical investigation. moreover, research on international tourism in the caribbean focuses almost exclusively on the independent (sovereign) small island tourism states, while dependent (non-sovereign) island tourism economies are generally less scrutinized and often excluded, largely due to their non-sovereign political status. 
ironically, these subnational island jurisdictions (snijs) are oftentimes relatively more tourism-intense and prone to over-tourism (baldacchino ; mcelroy and pearce ; peterson ; wttc ) . considering the aforementioned challenges in contemporary caribbean tourism and the lack of extensive empirical studies on over-tourism in sites, this paper studies the political economy of over-tourism in a non-sovereign site and presents a historical case study on the evolution and socio-ecological impacts of tourism growth in aruba. the aim of the in-depth case study is to explore the main antecedents, processes, and effects of over-tourism, and contribute to a more comprehensive and contextualized understanding of the complexity and dynamics of over-tourism within the context of a contemporary small island community in the caribbean. hence, this study seeks to address and explain how and why contemporary over-tourism emerges and endures in a caribbean site and therein question existing paradigms and maxims of tourism institutionalization. in stretching the conventional conceptualization of over-tourism and exploring the institutional roots of excessive and unregulated tourism growth in the caribbean, this study aims to explain the political-economic development of over-tourism from within a caribbean island perspective. internationally acclaimed as the 'one happy island' and one of the most tourism-dependent small island economies (mcelroy and pearce ; wttc ) , the case of aruba is examined to understand how over-tourism and the impacts thereof have evolved over the past years. aruba is a sub-national island jurisdiction within the kingdom of the netherlands and part of the dutch caribbean. aruba is geographically located in the southern caribbean sea on the periphery of the caribbean hurricane belt. with a registered population of an estimated , , aruba has a total surface area of km , and a coastline of km.
considering aruba's politically dependent status and the extensive tourism history dating back to the early 's, the case study draws on a mix of qualitative and quantitative methods to examine the political economic and socio-ecological dynamics of over-tourism from a holistic small island perspective. based on an extensive historical economic survey of various social, economic, and environmental policies and indicators, the focal drivers and community impacts of over-tourism are identified. the remainder of this paper is structured as follows. in "theoretical background" section, the theoretical background of this study is discussed by reviewing the conceptual origins of and previous studies on over-tourism, inclusive tourism, community-based tourism, and community well-being. the research design and methodology are described in "research design" section, followed by a presentation of the main findings in "results" section. the conclusions and recommendations are presented in "conclusion" section. conventionally, over-tourism describes the adverse impacts of uncontrolled tourism growth (an overshoot of tourism) that influences the (perceived) well-being of citizens and the degradation of natural habitats and ecologies, which result in diminishing visitor experiences and expenditures, and consequently, stagnating economic returns (unwto ). over-tourism portrays relentless, frequently unregulated, tourism growth that has moved beyond the level of acceptable change and absorptive capacity in a destination due to significant levels of tourism intensity (total visitors-to-population), tourism density (visitors per km ), and tourism dependency (tourism exports-to-gdp).
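the three ratios named above can be computed directly once visitor, population, area, and trade figures are available. the following is a minimal sketch, assuming annual totals and a single currency for exports and gdp; the class and function names are hypothetical, and the input values are made-up round numbers chosen for illustration, not data for aruba or any other destination.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    """Hypothetical container for the inputs of the three ratios."""
    name: str
    visitors: float         # total annual visitors
    population: float       # resident population
    area_km2: float         # surface area in square kilometres
    tourism_exports: float  # tourism export receipts (same currency as gdp)
    gdp: float              # gross domestic product


def _ratio(numerator, denominator, label):
    """Divide after checking that the denominator is usable."""
    if denominator <= 0:
        raise ValueError(f"{label} must be positive")
    return numerator / denominator


def tourism_intensity(d):
    """Total visitors per resident (visitors-to-population)."""
    return _ratio(d.visitors, d.population, "population")


def tourism_density(d):
    """Visitors per square kilometre."""
    return _ratio(d.visitors, d.area_km2, "area_km2")


def tourism_dependency(d):
    """Tourism exports as a share of gdp (0..1)."""
    return _ratio(d.tourism_exports, d.gdp, "gdp")


if __name__ == "__main__":
    # Made-up round numbers, for illustration only.
    island = Destination("example island", 1_000_000, 100_000, 200.0,
                         1.5e9, 3.0e9)
    print(tourism_intensity(island))   # visitors per resident
    print(tourism_density(island))     # visitors per km2
    print(tourism_dependency(island))  # share of gdp
```

keeping the three ratios as separate functions mirrors the way the text treats intensity, density, and dependency as distinct pressures that can rise independently of one another.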
the compounding effects result in significant pressures on infrastructure (i.e., congestion, transportation, and energy), resource consumption and pollution (i.e., leakage and waste), spatial and cultural alienation (i.e., real-estate and social identity), and visitors' experiences and residents' quality of life (crest ; wttc ) . however, contrary to the mainstream beliefs and accepted definitions that over-tourism is concerned with the volume of visitors and challenges of, e.g., crowding and congestion, the overshoot by tourism is also reflected in the values and behavioral norms of a destination's institutions, community, and visitors (crest ; cole ; joppe ) . for example, disrupting fragile coastal zones with motorized vehicles, disturbing sensitive turtle-nesting areas, or trashing bio-diverse marine parks are cases where the number of visitors may not overwhelm per se. likewise, a single immoral act by a visitor can outrage a community, especially when acts are prohibited or unlawful, yet are tolerated by authorities out of concern for a possible tourism backlash. in the long run, tourism behaviors may become institutionalized until changes are no longer accepted or acceptable by the local community (dodds and butler ; joppe ; williams and ponsford ) . more importantly, the institutional behaviors and ethics of destination governments and authorities are fundamental to understanding the complexity and dynamics of over-tourism, especially within the context of small island societies with strong political ties (benedict ; bishop ) . the crux of the over-tourism conundrum and its resolution are well beyond the boundaries of tourism as an industry; they are political by nature and institutional by nurture (baldacchino ; joppe ; peterson et al. ) .
thus, in terms of concept stretching (pearce and butler ) and beyond conventional definitions of over-tourism that emphasize tourism intensity and visitor density, destination values and community norms as well as political-institutional ideologies and behaviors are a defining, albeit oftentimes tacit, element of contemporary over-tourism (duval ; joppe ) . by deconstructing over-tourism from an emic island perspective through a political-economic institutional lens (bishop ; duval ) , this study addresses the limitations of traditional normative and reductionistic tourism-centric approaches (daye et al. ; peterson et al. ) , and conceptualizes over-tourism from a contextualized perspective in which both norms and networks of purpose, power, people, and place take center stage in the social construction of tourism development and institutional behaviors (cloutier et al. ; hall and williams ; joppe ; richter ) . although the adverse effects of excessive tourism are recognized as symptomatic of over-tourism (unwto ), fundamentally, over-tourism reflects the exclusion of a destination's community and agency to co-determine its tourism development (daye et al. ; giampiccoli and saayman ) ; it is essentially about the social overpowerment or disempowerment of a community's voice and choice with regard to the desire, direction, and development of tourism within a specific destination (hall and williams ; joppe ; mccool and lime ; moscardo ; richter ; scheyvens and biddulph ; williams and ponsford ) . this underscores both the absence and, consequently, the importance of inclusive tourism governance and community-driven tourism development for safeguarding community well-being (moscardo ; scheyvens and biddulph ) .
the genesis of over-tourism dates back to at least the 's and 's when initial concerns were raised about the potential adverse social and environmental impacts of uncontrolled tourism growth, and consequently, the long-run economic repercussions thereof (bosselman ; budowski ; butler ; cohen ; doxey ; dunkel ; farrell and runyan ; getz ; holder ; innskeep ; mathieson & wall ; pearce ; richter ; wilkinson ) . by the early 's, several empirical studies reported on the negative externalities of tourism in sites (bishop ; duval ; mcelroy ; mcelroy ; sheller ) . over the past decade, further evidence has been forthcoming on the role and rise of over-tourism, albeit mainly focused on metropolitan areas and cities (capocchi et al. ; crest ; dodds and butler ; wttc ) and world heritage sites (milano et al. ; musikanski et al. ). however, echoing some of the critique on over-tourism (joppe ) , much of the conceptual and intellectual work on caribbean island tourism has not progressed much beyond an embryonic stage of objectivistic reductionism (daye et al. ; duval ; pearce and butler ; peterson et al. ) . over-tourism extends previous theoretical frameworks and models of tourism lifecycles and complex adaptive tourism systems. the origins can be traced back to notions of the tourism destination lifecycle (butler ) and tourism carrying capacity (mathieson and wall ) , which have been widely discussed in the caribbean. furthermore, the concept of over-tourism underscores the nonlinear, interdependent, and dynamic nature of tourism systems (farrell and twinning-ward ) , which encompass several interacting social, political, economic, ecological, and digital subsystems, especially within the small(er) scale of island communities (peterson et al. ).
these complex adaptive tourism systems are 'nested' or embedded within social and political ecologies and often evolve in distinct ways with extensive cascades of uncertain, path dependent, and long-term effects (dodds and butler ; farrell and twinning-ward ) . as a concept, over-tourism is rooted in development economics and discussions on overdevelopment, overdependency, and overconsumption (kohr ; meier and stiglitz ) . from a post-development theoretical perspective (cowen and shenton ) , over-tourism refers to the social inequality and the environmental destruction due to excessive tourism consumption and tourism-related infrastructure expansion. over-tourism is conceptually embedded in the study of how economies grow and societies change over the course of history (meier and stiglitz ) , and is frequently viewed in negative terms as the mutually constitutive reverse of inclusive development and inclusive tourism (gupta and vegeling ; unsdg ; world bank ) . inclusive development focuses on productive employment as a means of increasing income as well as raising standards of living and community well-being (gupta and vegeling ; cloutier et al. ). the quality of opportunity and participation in growth, with a special focus on the working poor and the un(der)employed, are integral to inclusive development (ianchovichina and lundstrom ; ranieri and ramos ; rauniyar and kanbur ) . in development economics, it is not only the rate of real gdp per capita growth that matters, but more importantly, the pattern of labor force participation and income distribution in growth (meier ) . essentially, fostering societal and community well-being for cultivating resilient societies takes center stage in inclusive development. gupta and vegeling ( ) emphasize both the social and ecological aspects of inclusive development. 
whereas social elements address community well-being and participation in labor and consumption markets, ecological elements concentrate on the conservation of local ecosystems, the management of ecosystem services, and the regulation of environmental resources. inclusive development resonates strongly with the community well-being principles of purpose, place, and relation as discussed by cloutier et al. ( ) . these principles describe the nurturing of a shared sense of identity, inclusion, and (intergenerational) equity in societies. community well-being constitutes a combination of multiple factors and conditions that shape a community's quality of life, including social, environmental, economic, political, and cultural factors, which collectively and holistically describe a community's happiness and well-being (cloutier et al. ; musikanski et al. ). according to scheyvens and biddulph ( ) , the ultimate goal of community-based tourism development is to empower the destination's (host) community in four dimensions, i.e., economic, psychological, social, and political. vanderweele ( ) underscores multiple objective and subjective dimensions of community well-being, and discusses the importance of empowerment, trust, and governance for fostering community well-being. moscardo ( ) argues that community trust in tourism institutions is essential to strengthen community capacity and enable community-based tourism development. thereto, community awareness and education, active community involvement as well as community advocacy and the integration of a community's values and aspirations for tourism development are considered pivotal to safeguarding a destination's community well-being (giampiccoli and saayman ; moscardo ; scheyvens and biddulph ) . in the absence of community trust and institutional credibility, a country may soon find itself drifting along the waves and whims of elitist powers to the detriment of society and environment (acemoglu and robinson ) .
the need for inclusive development stems from the realization that relentless economic growth often gives rise to negative externalities, extractive resource depletions, and exploitative labor practices (raworth ) , which are clear and present features of over-tourism and readily acknowledged in caribbean sites (daye et al. ; duval ; island resource foundation ; mcelroy and pearce ; pattulo ; sheller ) . in terms of raworth ( ) , over-tourism is sparked when the ecological ceilings and the social foundations of an economy are depleted. according to scheyvens and biddulph ( ) , one of the most enduring critiques of tourism is its non-inclusive development. they contend that tourism oftentimes provides opportunities for the privileged, creating profits for international (non-local) resorts, and building exclusive enclaves for the rich, thereby excluding the indigenous community, marginalizing local cultures and lifestyles, and depleting scarce natural resources (scheyvens and biddulph ) . thus, in terms of safeguarding community well-being, inclusive tourism governance is not only concerned with the active involvement and purposeful engagement of the community and civic society but also with the participation in and distribution of tourism's benefits, i.e., the realization of shared value from tourism development (giampiccoli and saayman ; moscardo ; peterson et al. ; scheyvens and biddulph ) . historically, however, previous studies indicate that caribbean tourism developed in a context of a relatively weak state and marginalized community (pattulo ; sheller ; daye et al. ; duval ; peterson et al. ) . scant policy attention and political commitment have been paid to improving the involvement of the community in tourism development and the subsequent sharing of value. despite the many calls by international organizations for including the local community in tourism, many contemporary efforts are largely symbolic.
even if and when local stakeholders and civic organizations are invited to participate in tourism decision-making and policy development, their voices and choices are soon forgotten or neglected when plans are implemented. merely creating jobs for the community is not sufficient, especially when these jobs are being created for migrant workers. it is therefore essential that communities are empowered and engaged in tourism planning and at the fore of tourism development (duval ; giampiccoli and saayman ; moscardo ; peterson et al. ; scheyvens and biddulph ) . the general effects of over-tourism are frequently transmitted through direct as well as indirect channels. direct channels of over-tourism transmission describe diminishing or negative tourism contribution to gdp, declining average visitor expenditures, increasing import leakages, growing resource consumption, high tourism export concentration, tourism price inflation, and expansive government expenditures. over-tourism also transmits through indirect channels, which affect local communities and natural habitats. indirect channels of over-tourism transmission include stagnant labor participation rates, limited or declining income equality, uneven income distribution, increasing social costs, foreign-ownership concentration of tourism industry, spatial concentration of tourism industry, real-estate price inflation, environmental degradation, loss of natural habitats, and diminishing contribution of tourism ecological services (capocchi et al. ; daye et al. ; duval ; hampton and jeyacheya ; koens et al. ; wttc ) . however, unlike direct transmission channels of over-tourism, indirect effects often transpire and materialize over extended periods of time and may span several business and political cycles.
in reviewing the progressive development and potential challenges of tourism growth across sites, mcelroy and pearce ( ) discuss different interrelated causes of a tourism overrun, defined as high-density tourism with damaging levels of visitation due to tourism's sociocultural pressures and environmental footprint. the critical factors that spur over-tourism in the caribbean include (a) the substantial inflow of foreign private tourism investments; (b) the significant stock and rapid expansion of large-scale accommodation facilities; (c) the growth in air traffic and cruise calls; (d) the increase in labor immigration; and (e) the subsequent rise in unplanned coastal urbanization and real-estate infrastructures (mcelroy and pearce ) . previous studies indeed confirm that this system of an interlocked tourism supply chain, including the growth in tourism investments and airlift, and the subsequent expansion of accommodations and required labor, contributes to surging levels of tourism intensity and density in the caribbean, which gradually engenders a state of over-tourism in sites. likewise, cole ( ) indicates that an overshoot in caribbean tourism arises from several interdependent factors, including, e.g., (a) surpassing physical limits of beachfront or coastal areas for resort construction, (b) increasing labor migration due to limited local workforce, (c) growing visitors' sense of overcrowding, and (d) an escalation in residents feeling overwhelmed or displaced by visitors and/or immigrant workers. the latter describes intensifying sentiments of visitor annoyance and apathy by local communities (doxey ) . the unfolding of these events triggers a spiral of demise where surging small island coastal tourism causes increasing crowding, congestion, and contamination (mcelroy and de albuquerque ). frequently, this leads to irreversible ecological destruction, social decay, and aesthetic repulsion, and a further uncontrolled spiraling effect (dehoorne et al. ) .
the case in point is especially relevant for sites that rely on their natural and social ecologies for safeguarding economic development and well-being. whereas sustainable tourism requires the conservation of ecological integrity and environmental resources, its production is, paradoxically, largely dependent upon the consumption of nature-based tourism experiences (williams & ponsford ) . likewise, while much of caribbean tourism is staged by its cultural authenticity and natural hospitality, which are essentially rooted in a community sense and values of well-being, its production is labor-intensive with exhaustive demands on emotional labor (shani et al. ; sönmeza et al. ). this paradoxical ambiguity has epitomized much of the progress, pitfalls, and perils of caribbean tourism over the past century (duval ), and underscores the significance of reframing and extending contemporary over-tourism from an emic political-economic perspective. the confluence of policy and market failures intensifies the negative externalities due to several institutional conditions, including: (a) a regulatory deficiency in environmental conservation and enforcement, (b) limited economic diversification and innovation, (c) lopsided (private) benefits and (public) costs of tourism growth, (d) marginal social inclusion and non-civic participation in tourism policy and development, and (e) a strong and persistent bias towards short-term tourism promotion, expansion, and growth (bishop ; daye et al. ). furthermore, the systemic exclusion and disempowerment of the community in tourism decision-making and development is central to the problematique of over-tourism and one of the primary institutional root causes for the uncontrolled and excessive growth and expansion in tourism.
previous studies indicate that inclusive tourism and community involvement are predominantly inhibited by institutional legacies of centralized governance, market-based tourism institutions, coercive or symbolic community consultations, information asymmetries, and structural deficits in labor and environmental regulation, tourism governance, and community education (cole ; joppe ; giampiccoli and saayman ; moscardo ; peterson et al. ; scheyvens and biddulph ) . beyond the normative value systems of neoliberal tourism policies, williams & ponsford ( ) argue that public institutions and agents tend to circumvent regulations and regulatory enforcement largely due to the economic lock-in of the tourism industry. hall and williams ( ) describe this tourism lock-in as path dependency, which is conducive to institutional failures (e.g., close personal and political ties, and resource dependency), network failures (e.g., information asymmetry, dissonance, and ignorance of new developments), and capability failures (e.g., lack of shared awareness and institutional learning capabilities). in addition to rent-seeking institutional behaviors, tourism institutions are an outcome of political negotiations and choices, which are shaped by the political agency and exercise of preferences and power (hall and williams ) , thereby increasing the risks of institutional capture, which is, beyond tourism density and intensity, integral to deconstructing over-tourism from a contemporary island perspective. institutional capture occurs when an institution or a (minority) collective of agents acts to secure the commercial or political interests of a specific industry or constituency over and above the general interest of, e.g., the public or the community (acemoglu & robinson ) . this wide-spread phenomenon in the caribbean (bishop ; duval ; daye et al.
; mcelroy and de albuquerque ; pattulo ; sheller ) frequently leads to a net loss for society and prompts heightened income inequality and social exclusion over the long run (peterson et al. ; stiglitz ). moscardo ( ) indicates that social exclusion and the erosion of social capital stem largely from the lack of community involvement, capacity, and advocacy in tourism development, and that this is one of the main reasons for the growing critique of contemporary tourism, including its adverse impacts. moreover, free-market tourism policies that seek to grow the industry and increase economic returns in the short run rarely consider the economic leakages or the adverse downstream implications for society in the long term (dodds and butler ). in stretching the concept of over-tourism and underscoring the importance of institutional failures, dodds and butler ( ) conclude that a key political economic enabler of over-tourism is the mutually reinforcing attitude, or mindset, of both private and public sector stakeholders that favors short-term growth above all else, which subsequently nurtures institutional bonding and 'blindness' (hall and williams ). although over-tourism is complex, the fundamental lack of tourism governance and policy prudence by public and private stakeholders, in addition to the exclusion of civic society and local communities from tourism decisions and development, has fostered the unrestricted expansion and negative externalities of tourism growth. understanding the role of tourism institutions and institutional regimes in maintaining a certain political economic equilibrium (acemoglu & robinson ) is essential not only to understanding the evolution of over-tourism but, more importantly, to safeguarding community well-being and strengthening the resilience of sites. consequently, this study addresses the institutional problematique of caribbean over-tourism from a historical and contemporary small island perspective.
more specifically, the investigation answers the following questions within the delimited context of a caribbean site: what are the main political economic forces and institutional regimes that influence the propensity for over-tourism? how can the socio-ecological risks of over-tourism be mitigated, and community well-being be fostered for strengthening the resilience of small island tourism development? the aim of the in-depth case study is to explore the main antecedents, processes, and effects of over-tourism, and to contribute to a more comprehensive and contextualized understanding of the complexity and dynamics of over-tourism within the context of a contemporary small island community in the caribbean. considering aruba's politically dependent status and extensive tourism history, in addition to its significant tourism density, intensity, and dependency, the case of aruba is examined to understand how tourism and its impacts have evolved from a historical political economic perspective. aruba is a mature small island tourism economy with over years of experience with tourism. from a tourism destination perspective, it is the 'one happy island' in the caribbean (aruba tourism authority ). today, aruba is ranked amongst the top caribbean tourism destinations (wttc ) and is considered one of the most trade and tourism-dependent small island economies in the caribbean, which makes it highly vulnerable to external shocks (peterson ). descriptive analysis of stylized indicators shows that aruba is the most tourism-dependent economy in the caribbean ( % of gdp) and ranks second and third (after st. maarten and bermuda) on, respectively, resort density (+ hotel rooms per km²) and tourism density (+ , visitors per km²) (unwto ; world bank ; imf ). the case study draws on a mixed-methods design, combining qualitative and quantitative approaches, to examine the political economic and socio-ecological dynamics of over-tourism from a holistic small island perspective.
based on an extensive historical economic survey of various social, economic, and environmental policies and indicators, the focal drivers and impacts of over-tourism are identified. case study research involves a detailed empirical inquiry that investigates a contemporary phenomenon within its real-life context. the main purpose of a case study is to provide a contextual analysis of the historical conditions and contemporary processes involved in the phenomenon under study. the embedded case study, consisting of both quantitative and qualitative data, is an appropriate research methodology for exploring new multi-faceted concepts with limited empirical evidence (yin ). although conceptually rooted in previous studies, the political economic and socio-ecological dynamics of over-tourism remain under-theorized and under-scrutinized in empirical research, especially in non-sovereign caribbean sites. the case study combines qualitative and quantitative data analysis in a phased approach, in which the collection and analysis of quantitative economic data is followed by the collection and analysis of qualitative data in order to provide contextual richness (creswell and creswell ). the validity of research findings is thus strengthened by means of triangulation of theoretical conceptualizations, qualitative data, and quantitative data. however, case study findings are typically geared more toward theory building than theory testing (eisenhardt ). therefore, the results of this study should not be generalized beyond contemporary caribbean sites and should be interpreted from a theory development perspective. in the first qualitative phase of the study, the research focused on collecting and analyzing relevant tourism and non-tourism policy documents that were published between and .
both historical and national archives were consulted to retrieve a list of national policy documents and tourism plans pertaining to different areas of national development, including the economy, society, labor markets, education, the environment, and tourism (see references for a complete list of documents sourced and analyzed). subsequently, in the spirit of grounded theoretical logic (glaser ) and using semi-structured qualitative coding and memoing, the policy documents were analyzed for types of tourism policies, aims, institutional responsibilities, recommendations, and execution. for each document, key elements were coded with reference to, e.g., specific national context, tourism environment, policy decisions, challenges, and development options. cluster- and pattern-matching techniques were applied to identify and inductively design a hierarchy of themes and sub-themes from the qualitative data and compare these with the (theoretical) antecedents and effects of over-tourism (yin ). thus, by applying qualitative data analysis, a contextualized tourism framework was inductively constructed based on major themes and historical events across time and space (see table ). the quantitative phase consisted of a historical economic survey. data was collected by examining available and accessible statistical databases from national tourism and non-tourism authorities, including the central bureau of statistics, the department of labor, the department of nature and environment, the aruba tourism authority, the department of economic affairs, and the central bank of aruba. national databases and archives were surveyed to collect data on economic, tourism, labor, social, environmental, and other historical tourism indicators (see references for a complete list of documents sourced and analyzed).
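as a minimal sketch of two computations used in the quantitative phase of this study (min-max scaling of indicators and locating the vertex of a fitted quadratic), the python fragment below may help make the arithmetic concrete. the function names, the sample series, and the coefficients are hypothetical and purely illustrative; they are not the study's data or code, and treating the overshoot as the simple difference between the observed state and the vertex is an assumption made here for illustration.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] via y = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [(x - lo) / (hi - lo) for x in values]


def quadratic_vertex(b1, b2):
    """Turning point of y = b0 + b1*x + b2*x**2, located at x* = -b1 / (2*b2)."""
    if b2 == 0:
        raise ValueError("no vertex: the quadratic coefficient is zero")
    return -b1 / (2 * b2)


def overshoot(current_state, vertex):
    """Assumed definition: positive when the observed state lies beyond the vertex."""
    return current_state - vertex


# Hypothetical indicator series (illustrative numbers, not the study's data).
visitors = [200, 400, 600, 1000]
scaled = min_max_scale(visitors)

# Hypothetical coefficients of a fitted quadratic (inverted U, since b2 < 0).
vertex = quadratic_vertex(b1=1.2, b2=-1.0)
gap = overshoot(scaled[-1], vertex)
```

in this toy example the most recent (scaled) observation lies beyond the vertex, so the gap is positive; with real data the coefficients would come from the fitted quadratic regression rather than being set by hand.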
to identify the main antecedents and effects of over-tourism, the data was standardized across several indicators in order to normalize the data and facilitate comparative and inferential analyses. the data was standardized by using a min-max scaling method: $y = (x - x_{\min}) / (x_{\max} - x_{\min})$, where $x$ is the original value and $y$ is the normalized value in $[0, 1]$. subsequently, to explore the latent (theoretical) structure of the over-tourism construct, in addition to reducing the number of individual variables and the potential multicollinearity, a principal component panel regression (pcpr) analysis (a special form of partial least square regression) was applied, in which the over-tourism construct was regressed on the newly identified components. in examining the main antecedents of over-tourism, a production-like function was applied consisting of the identified regressors, i.e., the over-tourism antecedent factors. the production function form was estimated as a log-linear relationship using: $\ln(y) = a + \sum_i a_i \ln(ot_i) + \varepsilon$, with $y$ = over-tourism, $ot_i$ = antecedent factors, and $a$, $a_i$ = coefficients. to analyze the dynamic (non-linear) effect of over-tourism, quadratic regression analysis was conducted to assess the effect of over-tourism on several socioeconomic and socio-ecological indicators. based on the parameter estimates of the quadratic regression function, the over-tourism vertex (the threshold or tipping point) was calculated and subsequently compared to the state of over-tourism.

the birth of the aruban tourism industry dates to the 's, when the first commercial airline landed on aruba and a guest house was opened in oranjestad, the capital of aruba. the official opening of the aruba caribbean hotel as the first luxury resort of aruba ushered in a new epoch in the development of tourism in the early 's.
with the early dawning of the tourism industry, the government of aruba was keen on developing and institutionalizing a new economic pillar (besides the oil refining industry), and a national vision emerged to become 'the little miami of the caribbean', reminiscent of the vibrant economic scene in cuba during the 's. in , the aruba tourism commission (atc) was established and, subsequently, transformed into the aruba tourism bureau (atb) in close collaboration with the local tourism industry and international partners. at the time, aruba accommodated an estimated visitors and three ( ) hotels of approximately rooms. in february , the first cruise ship with visitors entered aruba's harbor. in , the aruba hotel association (aha) was founded and later renamed the aruba hotel and tourism association (ahata), thereby incorporating several tourism and hospitality sectors, and solidifying the foundation of the 'one happy island'. since the early days of tourism innovation, the government of aruba has pursued a free-market, export-oriented economic policy (undp ). ever since, the island's tourism policy has been geared toward attracting large and mid-scale resorts, and incentivizing international investments (e.g., tax holidays, government guarantees, special concessions, dedicated facilities) for constructing luxury, timeshare, and condominium resorts, expanding air and cruise port infrastructures, and fueling several tourism-related facilities and amenities, such as casinos and golf parks (cole ). this tourism growth policy orientation still holds today with a keen focus on fostering a high-end (quality) tourism destination by attracting affluent tourism, promoting airlift and cruise calls, increasing tourism receipts, upgrading product quality, and safeguarding the aruban visitor experience (ata ). during the 's, aruba witnessed a strong expansion of international tourism, which received an extra push with the closing of the oil refinery during the late s.
to reinvigorate the economy, the government agency atb was renamed the aruban tourism authority (ata) and tasked with opening new markets, increasing airlift, and strengthening promotion. during this period of bolstering tourism growth, a new tourism master plan was developed with the aim of building a 'tourism corridor' (in the north-west region and capital of aruba) and improving the quality of tourism in order to increase tourism income and revenues. given the rich environmental and social endowments of the island, tourism was an almost instinctive and natural course of action, which accelerated with continued investments, growth, and expansions during the 's and well into the 's. by the early 's, aruba was well on its way to becoming 'little miami' and there were clear signs of an overheating economy with significant economic growth and inflationary pressures. in the article iv mission, the imf reported: "earlier growth rates, based primarily on rapid expansions of the tourism sector, had given rise to a significant population increase, surging housing demand, and incipient inflationary pressures, and a government decision to limit new construction, given nearly full employment and the island's limited physical resources. economic growth in aruba will depend on increased capacity utilization, moves to further upgrade the quality of tourism, and the diversification of the economy." (imf , p. ) consequently, there were increasing calls to halt hotel construction and further expansions, and to diversify the economy with quality tourism and other high-tech services. in its country cooperation program, the undp concluded: "…its medium term development objective is to sustain economic growth by restoring the domestic balance of savings and investment and the external balance of trade, to maintain price stability, and to diversify the economy by creating an environment that fosters private sector investment in sectors other than tourism.
in collaboration with the world tourism organization, undp will conduct studies on how to improve the quality of tourism in aruba. this will include a study on the demands of high-income tourists and of aruba's comparative advantage as a destination point in the international financial and convention markets, which includes the definition of requirements for high-technology and other services." (undp , p. ). in terms of social development and environmental resource management, the undp ( ) also indicated that more policy efforts were required for strengthening income equality, labor force participation, and human resource development in order to have "the population participate more equitably in the benefits of the economic expansion of the past nine years" (undp ). with reference to the environment, the undp indicated: "the predominant role of tourism and of the oil industry in the economy of aruba make imperative the careful management of natural resources based on proper environmental planning. undp will assist the government in preparing an environmental master plan, based on an assessment carried out by the united nations environment programme in february . undp will assist the government in building up national capacity in physical planning and environmental management." (undp , p. ) although these plans never crystallized, several years later in , the national tourism council (ntc) was institutionalized by the government of aruba and tasked with the development of a new tourism master plan for the future of aruba. accordingly, the ntc ( ): "…embodied the commitment to the industry to involve both the public and the private sectors in working together to implement strategic changes in aruba's development, and to build and maintain the island's future as a prime caribbean destination". (ntc , p.
) the ntc involved different destination stakeholders and (public and private) institutions, and identified four key tourism challenges, including (a) the sustainable development for culture, heritage, and environment, (b) quality assurance, standards, and training needs, (c) safety and security for residents and tourists, and (d) the development of a collaborative public-private sector partnership. in recognition of the rapid tourism growth after and consistent with the previous conclusions by the undp ( ), the ntc ( ) indicated: "…the concerns expressed […] are of relatively longstanding in aruba. others have arisen in the aftermath of the threefold expansion of tourism in the late s. the latter was accompanied by high levels of new immigration to the island and the accelerated growth of population, leading to urban sprawl, and increasing erosion of the island's natural landscape, primarily adjacent to the island's tourism corridor and oranjestad. aruba already has one of the highest densities of tourism and population in the caribbean. although questions of sustainability are often relegated to 'over the horizon', the fact that the island is so small, and the pace of growth so fast, requires that long term constraints, those that are likely to greatly impact present residents and their children, are identified and incorporated into the framework". (ntc , p. ) devoid of implementation and realization, many of these ntc concerns and thoughts resonated strongly in the subsequent national plan for sustainable development 'nos aruba ' (in english: our aruba ) that was developed during the late 's by means of a national community-based participatory planning process. several thousand residents of different backgrounds and affiliations were actively involved by means of an appreciative inquiry process through which aspirations and pathways for sustainable development were collaboratively designed.
the national plan, however, was never adopted and was largely abandoned by after the government and parliamentary elections in . in , the ata was privatized as an independent entity ('sui generis') and tax reforms were introduced, including the re-allocation of room tax revenues (previously government tax revenues) to finance the ata's institutional and promotional activities in developing a full-fledged and self-sufficient tourism destination marketing and management organization. furthermore, the aruba tourism product fund was established to expand tourism infrastructures. in , the government of aruba presented a strategic plan for the development of tourism echoing many of the sentiments presented in previous master plans: "the objective of this new plan is to maintain and improve aruba's competitive position in the caribbean in the short, medium to long term. additionally, this strategic plan will assist aruba in generating sustainable growth in the tourism sector in a socially acceptable, environmentally sound and economically viable manner. the outcome of the plan will allow the government and the private sector to make clear policy choices and commit to a development strategy for tourism, while securing and enhancing the quality of life of its citizens. the plan is needed for several reasons. while it is known that tourism will continue to play a critical role in the economy of aruba, the question lies if the current model of tourism specialization that has been used in the past will work in the future. additionally, changing demographics of tourists have strong implications in understanding the visitor experience to aruba." (government of aruba , p. ) today, the aruban economy is almost exclusively dependent on tourism as its main economic activity and source of income, with more than % of gdp generated directly and indirectly from tourism.
annually, the mature and highly tourism-dependent economy generates an estimated us$ billion from more than two ( ) million stay-over and cruise visitors, with one of the highest tourism intensity and density ratios in the caribbean. the tourism industry employs well over one third of the workforce and is a significant source of tax revenues, surging after institutional privatization (in ) by an estimated % to well over an estimated us$ million per year (cba ). in reflecting on the fundamental changes in travel and tourism that have transpired over the past decade, and in consultation with community stakeholders, the ata in their aruba destination development plan ( ) concludes: "these transformations, coupled with the fact that aruba's tourism product has reached a state of maturity, means it was time to reflect on what it has achieved over the past decades and look ahead to where it wants to go in the future. a small island destination greatly dependent on tourism requires a forward-thinking destination development plan. this will:
• fortify its foundation to continue to thrive as a tourism destination for the years and generations to come.
• help balance external factors and conserve its scarce and fragile natural resources.
• safeguard the aruba way of life and happiness of its people."

"the aruba hotel & tourism association advocates for controlled growth and for the government to work urgently with the private sector on a plan for the increased need for staff (which should include a well thought-through migration plan), education, the effects on all infrastructure, and healthcare, etc. committees are being formed to address these issues in time. we also need to make sure we manage where growth takes place and the location of activities and attractions, to avoid saturation of high-traffic areas. it is and should be our highest priority to ensure that the visitor experience remains enjoyable and return-worthy" (dobson , p. ).
whereas 'the one happy island' brand may conjure up images of socioeconomic wellbeing, initial investigations indicate that tourism may have reached or surpassed its optimum growth. in the past, several policy notes and studies have questioned how far and fast tourism can and should expand in aruba (cole and razak ; government of aruba ; imf ; ntc ; peterson ; undp ). more recently, studies suggest that aruba is experiencing a 'tourism exhaustion' effect (imf ), in which tourism growth no longer delivers value added and yields diminishing economic returns. research shows that social and ecological disparities have increased in aruba and that the community is experiencing significant negative tourism impacts, including growing concerns over environmental pollution and destruction, the loss of quality of life and income equality, in addition to over-construction and crowding, which cumulatively have resulted in a growing animosity toward tourism and further tourism growth (peterson et al. ). in a similar vein, the government of aruba in their economic policy 'a strong and resilient economy - ' concludes: "we are at the crossroad of important decisions related to the carrying capacity of the island, to balance the need to protect the environment and to create new economic development. the expected increase in the room inventory will lead to more demand for low skilled labor and low salaries. the question remains if the internal labor market is able to absorb this extra demand for low skilled labor without the import of foreign labor. this influx of foreign labor will put extra pressure not only on the labor market, but also on the housing sector, educational system and health care. given the expected room expansion and its consequences on the livability on the island, the experience of the visitors could be negatively affected and putting at risk not only the whole tourist industry but also the economic development of aruba." (government of aruba , p. ).
over the past two decades, aruba's economic growth was mainly driven by tourism and ancillary industries, including restaurant services, real estate, and construction. tourism dominates both export and import services (+ %), and foreign-direct investments are mainly driven by tourism and real-estate investments originating from north america (+ %). total visitors have doubled in less than years, with tourism labor immigration and population density growing significantly by the turn of the century. the total number of visitors per capita (tourism intensity) is currently estimated at (up by . since ) with a tourism density of well over , visitors per km². whereas the growth in stay-over visitors dominated between and , total cruise visitors and the cruise intensity (cruise-to-stayover visitor ratio) have surged over the past two decades (see fig. ). from an international tourism demand perspective, the segmentation of tourism (origin) markets also witnessed a marked shift over time. whereas the north american visitor market has always dominated international tourism demand (+ %), analysis indicates that since , aruba experienced at least two distinct phases of relative expansion and contraction in the segmentation of international tourism markets (see fig. ). the results suggest that the segmentation of international tourism demand is cyclical in nature and follows global economic shifts and shocks (i.e., economic cycles, economic recessions, and economic crises). between and , the international tourism market index (itmi) rose from . to . (a higher degree of concentration), followed by a contraction from . to . (a lower degree of concentration) between and . over the past five ( ) years, the itmi rebounded from . to . , its highest level (of international tourism demand concentration) over the past two decades.
the international tourism demand cycle is largely explained by two complementary economic forces and cycles across the north american and the latin american tourism markets (see fig. ). whereas the first tourism market concentration phase was largely driven by the simultaneous expansion in the north american tourism market (+ . %) and the contraction in the latin american tourism market (− . %), the second tourism market diversification phase was caused by a significant expansion in the latin american market (+ %). during this second phase, growth in the north american tourism demand contracted (− . %) and the european markets grew ( . %). the recent rebound in the international tourism demand cycle was primarily driven by the collapse of the latin american market (− . %; venezuela in particular), and the strengthening of economic conditions in north america (+ . %; especially in the usa). analysis indicates that while tourism market concentration is positively associated with higher tourism intensity (β = . ; p < . ), the diversification of tourism markets engenders relatively less tourism intensity (β = − . ; p < . ). the findings suggest that not only does surging international tourism demand fuel over-tourism, but more importantly, that the (geographic) segmentation and (stayover vs. cruise) nature of international tourism demand are significant forces that generate over-tourism in select caribbean sites. an unrestricted principal component analysis with kaiser normalization and varimax rotation was conducted to identify the main constructs of over-tourism (see table ). the analysis yielded five ( ) components with satisfactory loadings (>. ), acceptable adequacy (kmo > . ; sphericity < . ), and reliability (cronbach α > . ) for an exploratory case study.
consistent with previous studies, the findings indicate that the state of over-tourism component incorporates tourism intensity, tourism density, and tourism dependency, reflecting the volume, concentration, and contribution of tourism, respectively. three independent constructs (antecedents of over-tourism) were identified, i.e., tourism supply chain, tourism architectural style, and tourism export specialization. whereas the tourism supply chain component describes the supply chain effect of the growth in airlift, accommodations, and labor, the tourism architectural style component describes the spatial concentration and design of cruise and accommodation infrastructures in a specific geographic area or coastal zone. the tourism export specialization component describes the outward (export-led) economic orientation and tourism specialization focused on, e.g., tourism investments and expansion, export earnings and revenues, and promotion. the tourism ecological stress component consists of coastal resort stress, coastal visitor stress, and coastal pollution, and is an indirect effect of over-tourism. it describes the stressors and pressures from land- and marine-based tourism activities in (concentrated) coastal areas, which are conducive to ecological decay and coastal erosion. regression analysis was conducted on the state of over-tourism and the previously identified components (see table ). a one-year over-tourism time-lag was included as a control variable. in addition, the degree of international tourism segmentation was used as a proxy indicator for the effect of international tourism demand. the results indicate that all four ( ) components are significantly related to the state and development of over-tourism in aruba (adjusted r² = . ; p < . ). a positive relationship is found for tourism supply chain (β = . ; p < . ), tourism architectural style (β = . ; p < . ), and tourism export specialization (β = . ; p < . ).
conversely, international tourism market segmentation is negatively associated with the state of over-tourism (β = −. ; p < . ), i.e., higher tourism market diversification is associated with less intense over-tourism. in general, the results are in line with previous studies and suggest that multiple supply and demand forces shape the propensity for over-tourism in aruba. more importantly, the results show there is an aggregate effect of interdependent domestic institutional factors that fuel the overrun of tourism. in terms of the economic contribution of tourism, the findings indicate that whereas nominal tourism service exports experienced significant growth over the past years (+ . % per year), average real tourism receipts growth diminished between and (− . % per year). since , the price index for the domestic tourism industry surged by an estimated % (cbs ). analysis indicates that increasing levels of over-tourism between and are negatively associated with real tourism receipts per visitor (β = − . ; adjusted r² = . ; p < . ). the findings show that over-tourism has a negative impact on real tourism receipts per visitor, with a tourism overshoot (beyond the over-tourism vertex) of + . visitors (see table ). although the coefficient is negative, the findings indicate that over-tourism has no significant bearing on real tourism receipts per capita (β = − . ; adjusted r² = . ; p > . ), which decreased by % between and . from an economic perspective, the findings suggest diminishing marginal returns from tourism specialization and growth after . similarly, after experiencing a significant tourism boost during the s, aruba's economy stagnated with a structural weakening of real growth (from . % to . %) over the past decade. the long run real economic growth is currently projected at . % (cba ). the overdependency on tourism exports is also demonstrated by the increasing output volatility from . % to . % between and .
the results suggest that over-tourism is a significant source of rising output volatility (β = . ; adjusted r² = . ; p < . ). conversely, available visitor satisfaction survey data (cbs a, b, c) reveal that over-tourism is negatively correlated with visitor satisfaction and perceived quality of tourism services. the findings suggest that visitors are increasingly dissatisfied with the destination's cleanliness (− . %), hospitality and friendliness (− . %), and local transportation (− . %), which may explain the decline in real tourism receipts growth and be indicative of the negative effects and tourism spillovers of environmental pollution, traffic congestion, and workforce exhaustion; on average, there are visitors for every workforce employee. since the s, aruba's population expanded largely due to several industrial waves of labor immigration related to, respectively, the oil refining industry and the tourism industry. it is estimated that at least % of the population is foreign-born, with tourism immigration remittances close to % of gdp (cba ). over the past five decades, the working age population surged with employment more than doubling and largely concentrated (+ %) in five ( ) sectors: tourism services, wholesale and retail, real estate and renting, construction, and public services. however, despite labor force expansions, labor productivity and labor participation rates have deteriorated significantly over the past decade, dropping by an estimated percentage points from % to % (cbs ). consistent with the decline in real tourism receipts per capita and real gdp per capita, labor productivity also regressed between and . regression analysis indicates that over-tourism has a negative impact on labor force participation and a tourism overshoot of + . beyond the over-tourism vertex (see table ). analysis shows that tourism labor wages have lagged average median wages by at least % for almost a decade.
income inequality as measured by the gini coefficient rose from . to . between and , indicating a relative deterioration of income equality. the findings show that over-tourism has a significant impact on income inequality (β = . ; adjusted r = . ; p < . ). real wages have remained stagnant across income distribution for over a decade, which corroborates the earlier finding on stagnant real gdp per capita growth. the level of vulnerable employment, measured by the relative poverty threshold of % of the median income, deteriorated between and , especially in the hotel, restaurant, and construction industries (cbs ). moreover, an estimated % of households are financially overleveraged and indebted (cba ). regression analysis indicates that over-tourism has a negative impact on income equality and a tourism overshoot of + . (see table ). in confronting the increasing levels and challenges of social disparities and inequality, the government of aruba launched a special 'social crisis' program in (government of aruba ), yet faced challenges in financing the program due to limited fiscal space; the debt-to-gdp ratio rose from % to an estimated % between and (cba ). from a tourism perspective, the tourism exports-to-debt ratio is projected at %, which signals significant fiscal vulnerability. thus, despite significant tourism investments and growth over the past decade, fiscal space and social wellbeing have steadily deteriorated. furthermore, analysis shows that tourism growth and intensity are positively associated with government expenditures between and in aruba. the results suggest that over % of the surge in government expenditures over the past two decades is (indirectly) related to over-tourism (β = . ; adjusted r = . , p < . ). these cost effects ensue, however, with a time lag of at least five ( ) years. 
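the gini coefficient used above to measure income inequality can be computed directly from a list of incomes. the following python sketch uses the standard rank-based formula on sorted incomes; the function name and the sample values are hypothetical, and the study's own figures come from official statistics rather than from this calculation.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality).

    Uses the rank formula on ascending incomes x_1..x_n:
        G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n
    Hypothetical helper for illustration only.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    if values[0] < 0:
        raise ValueError("incomes must be non-negative")

    # Ranks are 1-indexed on the ascending ordering.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up examples: equal incomes versus one household holding everything.
equal = gini([10, 10, 10, 10])
concentrated = gini([0, 0, 0, 40])
```

a rise in this value over time, as reported above, indicates that income has become more concentrated among fewer households.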
thus, while the (tax revenue) benefits of tourism growth are recorded within fiscal years, the (government) expenditures of over-tourism emerge over an expanded period of time, oftentimes accumulating over generations and spanning several government cycles. in the case of aruba, negative fiscal externalities of over-tourism emerged after the turn of the century and continue to 'slow burn' an already restrained fiscal space, limiting fiscal buffers against external economic shocks and internal social vulnerabilities. furthermore, the re-allocation of room tax revenues (from government tax income to tourism tax income) also weakened the fiscal budget after . recent evidence suggests that over-tourism may also have an indirect long-term impact on the cost of doing business and cost of living due to relatively high levels of employers' social contribution and (core) inflation, which gradually arise from the need to recover 'hidden' fiscal costs and minimize (structural) fiscal deficits (cba ). with reference to aruba's tourism infrastructure, the tourism industry is largely clustered along the northwest coastline with an estimated rooms per km²; a significant expansion (+ %) in less than two decades (see fig. ). although aruba is a small island, its other geographic districts remained relatively void of (large scale) tourism-related activities and infrastructures between and . the tourism industry zone was established during the late 's and represents an estimated % of the island's total geography. today, there are an estimated , accommodation units, including hotel and time share resorts in addition to condominiums, villas, and guest houses as well as (more geographically dispersed) rental apartments. 
whereas the expansion in hotel and time share rooms drove the surge in accommodations up until the early 's, over the past decade, accommodation growth was largely spurred by the construction of condominiums, guesthouses, and rental apartments (+ %), which increased the (pre-existing) infrastructural pressures and coastal resort density levels. regression analysis was conducted to assess the relationship between the state of over-tourism and tourism ecological stress. the findings show a significant curvilinear relationship between the state of over-tourism and tourism ecological stress (β = . , adjusted r = . , p < . ). this concave relationship indicates that as the intensity and density of tourism increases, the ecological pressures grow and, more importantly, accelerate after exceeding a critical threshold. over-tourism is associated with significant ecological pressures and a tourism overshoot of + . beyond the over-tourism vertex (see table ). these findings suggest that over-tourism is partially responsible for the structural decay and loss of ecological services, which is currently valued at an estimated % of gdp. although regional zoning and marine conservation plans were adopted in , aruba's natural habitats and marine environment have remained unprotected for well over years since the exploitation of the phosphate, gold, and oil refining industry during the 's, and the subsequent construction and expansion of large scale tourism infrastructures and urbanization since the late 's (cbs a, b, c). these enduring ecological pressures are also intertwined with changes in climate and nature. in terms of temperature, available evidence indicates that the sea water temperature has slowly risen over the past years. it is estimated that sea water temperatures have increased by at least + . °c since the 's (cbs a, b, c). 
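the curvilinear relationship and the 'over-tourism vertex' reported above can be illustrated with a second-degree polynomial fit, where the vertex is the turning point of the fitted curve and the overshoot is the distance of an observed value beyond it. the following python sketch assumes a plain quadratic specification, y = a·x² + b·x + c, which the text does not confirm; all function names and data are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")

    # Power sums of x and moment sums with y for the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]

    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]

    det = _det3(matrix)
    if det == 0:
        raise ValueError("x values do not identify a quadratic")

    # Cramer's rule: swap each column for the right-hand side in turn.
    coeffs = []
    for j in range(3):
        swapped = [[rhs[i] if k == j else matrix[i][k] for k in range(3)]
                   for i in range(3)]
        coeffs.append(_det3(swapped) / det)
    return tuple(coeffs)


def vertex(a, b):
    """x-coordinate of the turning point of a*x**2 + b*x + c."""
    if a == 0:
        raise ValueError("no vertex: the fitted curve is linear")
    return -b / (2 * a)


def overshoot(observed_x, vertex_x):
    """How far an observed intensity lies beyond the vertex."""
    return observed_x - vertex_x


# Made-up data following an exact quadratic, for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x * x - 8 * x + 3 for x in xs]
a, b, c = fit_quadratic(xs, ys)
turning_point = vertex(a, b)
```

with real data, a positive overshoot would indicate that tourism intensity has passed the turning point of the fitted curve, which is how the '+' overshoot values in the text can be read.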
available energy consumption records between and indicate that the effect of rising average temperatures is also reflected in the increase of average energy consumption per household (adjusted r = . ; p < . ) and the rise in relative household energy consumption expenses from . % (in ) to . % (in ). despite the increasing renewable energy production (+ %), household energy consumption has also risen, which suggests that persistent household energy consumption behaviors are at play in aruba. further analysis found no significant relationship between increasing temperatures and real gdp per capita (p > . ). available data suggests that the slow burn effect of rising temperatures is reflected in increasing incidences of coral bleaching along aruba's coastal reefs over the past decade. in combination with the structural deforestation and dredging of marine and coastal ecologies (i.e., coral reefs, mangroves, palm trees, etc.) since the late 's, the slow and consistent rise in seawater temperature is likely also responsible for the loss of marine life and marine biodiversity (cbs a, b, c). moreover, although no significant historical data is available, the growing stress on the local marine ecosystem is also due to the incessant acidification of marine waters (resulting from past oil spills and leakages, polluted water runoff, and the non-treated coastal disposal of waste) as well as the growing population density and surging coastal urbanization (cbs a, b, c). the emission of carbon dioxide is an additional component in the rise of temperatures, energy consumption, and environmental decay. historical archives show that between and , the carbon intensity rose sharply to well over million mt co2 in aruba. 
this significant level and surge stemmed largely from four factors, i.e., (a) an energy supply based on fossil fuels (hfo: heavy fuel oil), (b) the operation of the oil refinery, (c) the expansion in tourism infrastructures and services, and (d) the subsequent growth of the labor force, the population, and the residential urbanization. with the closure of the oil refinery in and the push for renewable energy adoption, co2 emissions dropped significantly (− %) by . in addition to carbon emissions, energy intensity is also a significant source of co2. energy intensity emanates largely from private and public service infrastructures, including residential and commercial buildings, and business and civil services. thus, beyond energy supply, energy consumption behaviors are also integral to carbon emissions. based on an analysis of two proxy indicators for energy-based co2 emissions, i.e., real private consumption and energy consumption, the results show that similar to the rise in carbon intensity, the energy intensity increased significantly (+ %) in aruba. the surge in energy consumption stems largely from the growth in tourism during the s and the subsequent population expansion and urbanization. this overrun of tourism growth explains at least three quarters of the surge in carbon emissions and energy intensity (β = . ; adjusted r = . ; p < . ) over the past two decades. however, unlike the reduction in carbon intensity in recent years ( ), energy intensity remained relatively stable. with reference to extreme weather events, analysis indicates that prior to the s hurricanes and major tropical storms would graze the island every to years. over the past five decades, the time interval between extreme weather events has shortened considerably to an estimated to years. 
extreme weather events over the past years coincide with major hurricanes and tropical storms in the caribbean, including, e.g., joan ( ), bret ( ), lenny ( ), ivan ( ), felix ( ), omar ( ), and matthew ( ). whereas aruba was not in the direct path of these hurricanes, the subsequent precipitation and storm surges caused significant flooding with average rainfalls of mm; almost treble the annual average of mm in aruba. conversely, the findings indicate the occurrence of several periods of drought after the turn of the century. in general, the results suggest an increase in climate volatility with relatively more extreme weather patterns (involving both extreme precipitation and extreme drought) emerging over the past two decades. national census records (cbs a, b, c) indicate that over the past two decades, incidents of residential and commercial flooding have indeed increased substantially (+ %). findings from spatial data analysis show that an estimated % of all households are located in coastal residential areas with density levels well over residents per km² as well as at least , tourists per km² (cbs a, b, c). these coastal areas are also prone to beach erosion due to their predominantly low elevation (< m) and limestone geological configuration, which is susceptible to ocean acidification, pollution, and carbon emissions. to summarize, based on the overall results of this study, the following conceptual model is proposed that captures the dynamics and flow of over-tourism in terms of antecedents and effects, and their interdependencies (see fig. ). as an initial step towards a more comprehensive theoretical framing of over-tourism (in caribbean sites), the conceptual model consists of nine different constructs depicting (a) three ( ) independent institutional drivers of over-tourism, (b) the mediating over-tourism construct, and (c) three ( ) dependent impact constructs. 
furthermore, two ( ) exogenous constructs capture the external shifts, shocks, and market demand that moderate the state of over-tourism. the extended theoretical framing of over-tourism proposes that there are several institutional forces that exert significant positive political-economic pressures on the propensity for over-tourism in a non-sovereign caribbean site, i.e., a neo-liberal outward-oriented tourism policy focused on export specialization, supply growth, and large-scale infrastructure expansion, which are largely based on private and political interests to the exclusion of societal values and community interests. the results corroborate previous studies and demonstrate the political and historical workings of several institutional failures and legacies involving, e.g., rent-seeking behaviors, market-driven interests, coercive community participation, and increasing structural deficits in regulation, regulatory enforcement, sound governance, participatory decision-making, and community empowerment. for over two decades, the systemic exclusion of the local community and the structural disregard for the destination's socio-ecological capabilities have led to significant policy drift, growing community

this study addressed the political economy of over-tourism in a non-sovereign site and discussed a historical case study on the evolution and socio-ecological impacts of tourism growth in aruba. the aim of the in-depth case study was to explore the main antecedents, processes, and effects of over-tourism, and contribute to a more comprehensive and contextualized understanding of the complexity and dynamics of over-tourism within the context of a contemporary small island community in the caribbean. 
while limited to a single case study, the findings provide an extended conceptualization and framing of over-tourism from a small island caribbean perspective, which emphasizes the political, institutional, and historical forces that shape tourism policies and development. in reviewing the overall findings of this study, the general results corroborate previous research on the adverse relationship between over-tourism, community wellbeing, and economic development. more specifically, the research demonstrates that an aggregate of political economic forces shapes and generates over-tourism over time and space. the results indicate that there are three main institutional factors that engender a state of over-tourism, which are primarily shaped by public and private sector growth-led strategies. these institutional factors are tourism export specialization, tourism supply chain, and tourism architectural style. furthermore, the findings indicate that tourism market demand and diversification restrain the propensity for over-tourism. thus, both tourism supply and tourism demand actively influence and shape the evolution of over-tourism. more importantly, in stretching the concept of over-tourism to reflect the political economic role of tourism in a caribbean site and the subsequent institutional overpowerment of the destination community (de facto community exclusion and disempowerment in tourism decision-making and development), the findings yield strong evidence that there is more to over-tourism than simply an over-shoot of tourism growth. essentially, over-tourism is a manifestation of coercive institutional powers and processes, and represents the social overpowerment or disempowerment of a destination community's agency. 
from a political-economic perspective, the results show how the institutional capture of tourism-centric values and voices, and a confluence of institutional, market, and policy failures have systematically and structurally shaped the growth and growing adversities confronting the local community in a non-sovereign caribbean site. in the specific case of aruba, the findings indicate that this 'one happy island' has faced substantial tourism policy drift since the development of the initial tourism master plan in the 's. despite numerous tourism policies and master plans, and the relatively consistent policy intent (spanning well over five decades) on, e.g., sustainable development, quality assurance and training, safety and security, quality visitor experiences, environmental quality, social equality, and economic viability, the realization of these master plans has, nevertheless, resulted in a system of tourism specialization, supply, and style that is disconnected from society and the local community. the resulting policy ambiguity, economic disconnectedness, social inequality, and ecological degradation have engendered significant socio-ecological vulnerabilities, which have resulted in increasing community disengagement and discontent with tourism. these findings support previous studies that indicate that social inequality, ecological degradation, and economic instability are likely to coalesce. social inequality erodes community well-being and institutional trust, and engenders ecological degradation due to both environmental resource competition and the weakening of the social foundations that underpin the requisite civic and community actions to enact and enforce environmental regulation. the resulting extractive institutions and the erosion of the socio-ecology nurture the concentration of resources and power, thereby generating economic instability, volatility, and more fragile community development to the detriment of community well-being. 
although restricted to a single in-depth case with limited generalizability, the extended conceptualization and caribbean framing of over-tourism provide several avenues for future research. an investigation of multiple case studies across the caribbean, especially in sovereign and non-sovereign sites, would provide further evidence and validation for the institutional forces that shape over-tourism, and the potential adverse impacts on community well-being. likewise, it would be relevant to scrutinize these propositions from a continental, metropolitan or city perspective, particularly as the rise of over-tourism is a global phenomenon and certainly not confined to the caribbean. moreover, contingent upon sufficient reliable data, the application of econometric analysis would provide further empirical scrutiny of the proposed conceptual model and hypotheses, especially in terms of reverse causality. future research could also consider examining a possible self-reinforcing effect (a positive feedback loop) of over-tourism on institutional forces that generate an over-tourism path dependency, i.e., rising levels of tourism intensity beget increased tourism specialization and tourism supply expansion, akin to a tourism 'lock-in' effect or 'entrapment' due to past tourism performance and success. this institutional capture stimulates tourism myopia and a short-term tourism growth orientation, which causes a gradual tourism overshoot of socio-ecological ceilings with significant costs in the medium to long term. because these costs and negative spillover effects remain relatively concealed for an extended period of time (beyond political economic cycles), they are often not considered or disregarded in tourism governance and policies, thus setting the scene for community disenfranchisement and future disasters. 
acknowledging the delimitations of this study, the findings hold several policy implications for strengthening the resilience of caribbean sites in the wake of over-tourism and community overpowerment. first and foremost, the findings testify to the contextual, historical, and institutional nature of over-tourism, and more importantly, to the need to address social, political, and ecological developments explicitly, extensively, and urgently in national policies and institutional arrangements, especially in non-sovereign and tourism-dependent small island economies in the caribbean. rather than simply devise buffers for absorbing shocks and bounce back to previous dependent paths of economic and social decay, fostering the resilience of caribbean sites requires building institutional capabilities to anticipate shocks, to adapt and learn, and to bounce forward towards new pathways of development. there is, however, no silver bullet solution to the challenges of over-tourism. one of the biggest risks to caribbean sites is underestimating the adverse effects of over-tourism and downplaying the risks of climate change, i.e., tourism myopia compounded by willful ignorance. resolving the challenge of over-tourism requires, first and foremost, an acknowledgement of the societal costs and moral risks associated with the uncontrolled and unregulated expansion of tourism, especially when considering the many policy studies and recommendations that have been produced and published over the past years, in addition to the increasing impact of climate change. while some caribbean sites may have been able to afford some degree of policy drift and institutional inertia in the past, today climate change and related anthropogenic challenges are demanding nothing less than a fundamental rethink and redirection. 
echoing the sentiments of the united nations more than a decade ago, if caribbean sites fail to adapt, they are likely to take direct and substantial hits with detrimental repercussions for lives and livelihoods. more specifically, there is an increasingly urgent need to transition away from traditional, uncontrolled, and exploitive tourism growth fetishes towards transformative, resilient, and inclusive development. this structural transformation reflects the redesign of constricted, short-term, market-based, and elitist policies, which is based on the resourcefulness and values of a destination's community and citizens. the structural transformation describes the transition towards a community-driven development by building capabilities for innovation, internalization, and institutionalization for strengthening the resilience of small island tourism development in the caribbean (see table ). pathways for innovation are primarily intended to mitigate the community and climate risks of tourism specialization, single-market supply dependencies, and expansive infrastructure and hotel construction footprints; in effect, the main drivers of over-tourism (see independent constructs in fig. ). policy recommendations include the diversification of the economy and tourism markets, in addition to developing new community-driven and culturally-authentic niche products, experiential services, and micro-businesses. with its rich history and culture in cuisine representing well over nationalities as well as an embryonic caribbean artistic scene, culture and creativity would provide a new opportunity for local entrepreneurs and the community in the specific case of aruba. likewise, wellness and health, as an extension of organic foods and pristine ecology, would provide a viable avenue to explore by the community and social entrepreneurs. 
this would also stimulate the strengthening of domestic agri-food chains as well as the adoption of circular business models to improve resource efficiencies (for, e.g., energy, water, waste, land). likewise, reskilling of the workforce and the creation of new professional education programs would engender a knowledge-based, community-driven generation of tourism entrepreneurs, professionals, and policymakers. whereas innovation pathways are necessary, they are, however, insufficient for mitigating the socio-ecological risks of over-tourism. hereto, the costs and adverse impacts of over-tourism would need to be incorporated and 'internalized' into tourism governance and development. to complement the conventional 'outward-oriented' tourism policies and (growth) metrics, public officials and tourism authorities need to focus on 'internal-oriented' community development and commensurate (socioecological) metrics (see dependent constructs in fig. ). the pathways for internalization describe the explicit recognition, interrelationships, and resourcefulness of a destination's community. rather than focus solely on traditional metrics of tourism growth, internalized tourism policies integrate metrics and indicators for social equality and environmental integrity, including the well-being of society and happiness of communities (musikanski et al. ; vanderweele ).

policy recommendations listed in the table:
establish and develop a national data system for the structural measurement and monitoring of environmental quality (including marine biodiversity) and social equality (including income inequality, community well-being) to inform tourism decision-making and development (see also national tourism council and national council for climate change).
set up community-based and citizen-driven tourism observatories to monitor visitor behaviors and community sentiments (especially in 'tourism hotspots').
promote and establish a 'whole of government' platform to fully and transparently engage civic society, community stakeholders, public and private sector.
(re-) allocate and earmark tourism tax revenues to national and community education programs.
develop and implement community advocacy programs to build and strengthen awareness of tourism opportunities and vulnerabilities as well as engage and empower civic society in national and regional tourism development.
stimulate regional and rural community tourism cooperatives with local representatives and stakeholders to involve and engage citizens in community-driven tourism development.
strengthen institutional capabilities in public-sector and tourism authorities for an inclusive and sustainable development of tourism at the regional and local level.
implement environmental regulation and regulatory enforcement of coastal zone conservation and fragile habitats regeneration (spatial zoning for residential and commercial development).
establish national council and policies for climate change resilience and adopt new legislation for infrastructure and building codes.
incorporate climate change readiness and resilience measures in fiscal, economic, and social policies, and community programs.
integrate environmental and energy measures into incentives and investment programs to reduce and recycle waste and increase resource efficiency (e.g. energy, water, land, infrastructures).
labor regulation and regulatory enforcement of (minimum) wages, income equality, and labor force participation (non-informal labor).
labor market flexibilization should be pursued to foster broader workforce participation and mobility, especially for young(er) professionals.
regulate accommodation supply, especially of large-scale tourism infrastructures in coastal-sensitive and climate change regional zones.
destination stewardship and establishment of a national tourism council with statutory mandates and inclusion of civic society, community representatives, and non-governmental organizations.
pathways for internalization entail the establishment and usage of a national (data) system for measuring and monitoring community well-being and environmental quality in a formal, transparent, and structured manner to enable comprehensive and evidence-based tourism policies and development. furthermore, regional, community-based, and citizen-driven tourism observatories should be developed and established to monitor visitor behaviors and community sentiments (especially in 'tourism hotspots'). this would foster community engagement and ownership as well as promote a 'whole of government' platform to fully and transparently engage civic society, community stakeholders, public and private sector. thereto, tourism tax revenues should be (re-) allocated and earmarked for national and community education programs. community advocacy programs and regional tourism cooperatives should also be considered to strengthen capacity and awareness of tourism opportunities and vulnerabilities. innovation and internalization are prone to fade and fumble if not institutionalized. mitigating the risks of over-tourism and safeguarding community wellbeing, thus, depend on the structural transformation of the localized networks of power and control, both de jure and de facto. notwithstanding the most comprehensive and forward-looking tourism vision, policies, and leadership, in the absence of strong institutions and governance, sustainable development of tourism will remain ephemeral and beyond the reach of caribbean sites. whereas de jure structural reforms are necessary to strengthen the resilience of caribbean sites, they are insufficient and are likely to falter when de facto political and institutional reforms are absent. this calls for an almost paradigmatic shift in values, institutions, and governance; a transition that is unlikely to happen in the absence of any significant crisis or 'creative destruction'. 
while the former describes continued institutional capture and path dependency, and consequently, the strengthening of existing institutional frames, relationships, and predictable outcomes (e.g., continued construction and expansion of hotel room capacity despite a global pandemic and economic recession), the latter depicts a process of creating new pathways, restructuring institutional frameworks, fostering new and diverse relations, internalizing socioecological values, and stimulating institutional learning (e.g., establishing and enforcing climate-resilient tourism infrastructure regulations, or starting an agri-tourism cooperative in collaboration with rural farmers and providing a digital platform for tracking and tracing supply and quality). pathways for institutionalization describe structural reforms and regulatory innovations that engender ethical, transparent, and inclusive governance within the rule of law and the voice of the community. this entails the rethinking and redesigning of institutional foundations and capabilities with strong codes of ethics, competence, integrity, and ingenuity. thereto, the regulation and regulatory enforcement (with sanctions) of environmental conservation, coastal zonification, and regeneration of biodiverse habitats (e.g., coral reefs, mangrove ecologies, wetlands of endemic species) are a prime directive. spatial zonification and governance are quintessential for regulatory execution and enforcement. likewise, establishing environmental oversight and a national council for climate resilience are highly recommended to firmly and legally position matters of nature and ecology on the highest national agenda. environmental and energy measures should also be integrated into incentives and investment programs to reduce and recycle waste and increase resource efficiencies (for, e.g., energy, water, land, infrastructure, and construction). 
from a labor and tourism workforce perspective, flexibilization, participation, equality, and mobility should be placed on the national agenda for development. the regulatory enforcement and oversight of minimum wage payments, income equality, workforce registration, and formal labor force participation are key directives for policy execution. labor market flexibilization should be pursued to foster broader workforce participation and mobility, especially for young(er) professionals and other vulnerable groups in society. lastly, the statutory establishment of destination stewardship and a national tourism council should be pursued to safeguard community well-being and strengthen the resilience of small island tourism development. hereto, commensurate institutional capabilities need to be developed with the structural involvement and inclusion of civic society, community representatives, and non-governmental organizations. to conclude, this study underscores the contextual and dynamic nature of over-tourism, and more importantly, the need to address social and ecological developments explicitly, extensively, and urgently in the policies, norms, and institutional arrangements, especially in tourism-dependent small island economies in the caribbean. rather than continue on the old path of extractive and exploitive growth, caribbean sites need to forge new pathways towards inclusive tourism development and lead from an emerging future of climate change and other geopolitical challenges. although certainly not an easy feat as witnessed by the enduring and persistent policy and market failures over the past decades, if these principles and fundamentals of sound political and public governance are not cultivated and advanced, no amount of master planning will suffice. 
without concerted, committed, and creative institutional actions and agency to nurture the well-being of small island societies, the existential challenges of over-tourism and climate change will endure and gradually intensify in caribbean sites. quintessential is the institutionalized involvement of the community and civic society in order to strengthen the resilience of small island tourism economies in the twenty-first century.

why nations fail: the origins of power, prosperity and poverty caribbean tourism in the global marketplace: trends, drivers, and challenges endogenous growth theory innovative development strategies from non-sovereign island jurisdictions? a global review of economic policies and governance practices small islands versus big cities: lessons in the political economy of small islands problems of smaller territories tourism as a small-state development strategy: pier pressure in the eastern caribbean? the political economy of small states: enduring vulnerability in the wake of the tourist. washington dc: the conservation foundation small states in a global economy tourism and environmental conservation: conflict, coexistence, or symbiosis? 
environmental conservation the concept of a tourism area cycle of evolution: implications for management of resources the economic growth and impact of tourism in small island developing states -evidence from the caribbean overtourism: a literature review to assess implications and future perspectives caribbean pushes for inclusive tourism development the case for responsible travel inconveniences in the home neighborhood in aruba housing and accommodation in recent decades in aruba the suburbanization of the aruban landscape income and expenditure survey annual statistical digest unleashing growth and strengthening resilience in the caribbean cultivating community wellbeing: guiding principles for research and practice impact of tourism on the physical environment beyond the resort life cycle: the micro-dynamics of destination tourism how far, and how fast? population, culture, and carrying capacity in aruba doctrines of development the social impact of tourism on developing regions and its measurement research design: qualitative, quantitative, and mixed methods approaches new perspectives in caribbean tourism tourism development and growth international tourism in the caribbean area: current status and future prospects, etudes caribeennes the future of aruba: how the tiny caribbean island is bracing for major new hotel development over-tourism: issues, realities and solutions. de gruyter studies in tourism a causation theory of visitor-resident irritants: methodology and research inferences tourism and the environment: a review of the literature and issues tourism in the caribbean. trends, development, prospects. 
routledge: london building theories from case study research ecology and tourism reconceptualizing tourism capacity to absorb tourism: concepts and implications or strategic planning community-based tourism development model and community participation basics of grounded theory analysis economic policy: a strong and resilient economy - destination resilience and smart tourism destinations sustainable development goals and inclusive development tourism and innovation tourism and inclusive growth in small island developing states sustainable tourism: sustaining tourism or something more? tourism management perspectives pattern and impact of tourism on the environment of the caribbean inclusive growth analytics: framework and application, policy research working paper building resilience in developing countries vulnerable to large natural disasters tourism and coastal resources degradation in the wider caribbean. a study for the united nations environment programme, caribbean environment programme the roles of policy, planning, and governance in preventing and managing over-tourism is over-tourism overused? understanding the impact of tourism in a city context the overdeveloped nations: the diseconomies of scale unleashing strong, sustainable, and inclusive growth in the caribbean the carbon footprint of global tourism on the carrying capacity and the optimal number of visitors in tourism destinations on the implications of tourism specialization and structural change in tourism destinations tourism; economic, physical and social impacts tourism carrying capacity: tempting fantasy or useful reality tourism development in small islands across the world problems for managing sustainable tourism in small islands island tourism and sustainable development: caribbean, pacific, and mediterranean experiences the advantages of political affiliation: dependent and independent small-island profiles frontiers of development economics. 
the future in perspective frontiers of development economics. the future in perspective over-tourism and tourismphobia: a journey through four decades of tourism development, planning and local concerns building community capacity for tourism development social capital, trust and tourism development tourism policy and planning implementation, contemporary geographies of leisure, tourism and mobility planet happiness: a proposition to address overtourism and guide responsible tourism, happiness, well-being and sustainability in world heritage sites and beyond aruban sustainable tourism framework. kingston: government of aruba tourism and environmental research: a review contemporary issues in tourism development seeking sustainability of tourism specialization in small island states institutional capabilities for island innovation fostering economic resilience: from roots to routs. central bank of aruba sustainability and resilience in caribbean tourism economies: a critical inquiry search of inclusive tourism in the caribbean: insights from aruba, worldwide hospitality and tourism themes the great transformation. the political and economic origin of our time inclusive growth: building up a concept. working paper, no. , international policy centre for inclusive growth inclusive development: two papers on conceptualization, application, and the adb perspective. asian development bank doughnut economics. seven ways to think like a st century economist the political dimensions tourism is there a caribbean sclerosis? stagnating economic growth in the caribbean. 
washington: inter-american development bank, country department caribbean inclusive tourism development emotional labor in the hospitality industry: the influence of contextual factors consuming the caribbean: from arawaks to zombies complexity of occupational health in the hospitality industry: dynamic simulation modeling to advance immigrant worker health executive board of the united nations development programme and of the united nations population fund united nations sustainable development goals (unsdg) overtourism? -understanding and managing urban tourism growth beyond perceptions, executive summary measures of community well-being: a template strategies for tourism in island microstates confronting tourism's environmental paradox: transitioning for sustainable tourism inclusive growth: a synthesis of findings from recent ieg evaluations economic impact research report case study research. design and methods publisher's note springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations conflict of interest the author declares no conflict of interest.ethics approval and consent to participate the research in this paper did not involve human subjects and thus no consent was required. key: cord- -rlpzejjt authors: coutard, b.; valle, c.; de lamballerie, x.; canard, b.; seidah, n.g.; decroly, e. title: the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade date: - - journal: antiviral res doi: . /j.antiviral. . sha: doc_id: cord_uid: rlpzejjt in , a new coronavirus ( -ncov) infecting humans has emerged in wuhan, china. its genome has been sequenced and the genomic information promptly released. despite a high similarity with the genome sequence of sars-cov and sars-like covs, we identified a peculiar furin-like cleavage site in the spike protein of the -ncov, lacking in the other sars-like covs. 
in this article, we discuss the possible functional consequences of this cleavage site in the viral cycle, pathogenicity and its potential implication in the development of antivirals.
human coronaviruses (cov) are enveloped positive-stranded rna viruses belonging to the order nidovirales, and are mostly responsible for upper respiratory and digestive tract infections. among them, sars-cov and mers-cov, which spread in and respectively, have been associated with severe human illnesses, such as severe pneumonia and bronchiolitis, and even meningitis in more vulnerable populations (de wit et al., ). in december , a new cov ( -ncov) was detected in the city of wuhan, and this emerging viral infection was associated with severe human respiratory disease with a ~ - % fatality rate. the virus was presumed to have initially been transmitted from an animal reservoir to humans, possibly via an amplifying host. however, human-to-human transmission has been reported, leading to a sustained epidemic spread with > , confirmed human infections, including > deaths, reported by the who in early february . the estimated effective reproductive number (r) value of ~ . ( %: . - . ) at the beginning of the outbreak raises the possibility of a pandemic. this prompted who to declare it a public health emergency of international concern. this is especially relevant because so far there are no specific antiviral treatments or vaccines available.
based on its genome sequence, -ncov belongs to lineage b of betacoronavirus (fig. a) , which also includes the sars-cov and bat cov zxc , the latter and cov zc being the closest to -ncov. -ncov shares~ % amino acid sequence identity in the spike (s)-protein sequence with sars-cov and % with cov zxc (chan et al., ) . in this article, we focus on a specific furin-like protease recognition pattern present in the vicinity of one of the maturation sites of the s protein ( fig. b ) that may have significant functional implications for virus entry. the proprotein convertases (pcs; genes pcsks) constitute a family of nine serine secretory proteases that regulate various biological processes in both healthy and disease states (seidah and prat, ) . by proteolysis, pcs are responsible for the activation of a wide variety of precursor proteins, such as growth factors, hormones, receptors and adhesion molecules, as well as cell surface glycoproteins of infectious viruses (seidah and chretien, ) (table ) . seven pcs cleave precursor proteins at specific single or paired basic amino acids (aa) within the motif (r/k)-( x)n-(r/k)↓, where n = , , , or spacer aa (seidah and chretien, ) . because of their role in the processing of many critical cell surface proteins pcs, especially furin, have been implicated in viral infections. they have the potential to cleave specifically viral envelope glycoproteins, thereby enhancing viral fusion with host cell membranes (izaguirre, ; moulard and decroly, ) . in the case of human-infecting coronaviruses such as hcov-oc (le coupanec et al., ) , mers-cov (millet and whittaker, ) , and hku (chan et al., ) the spike protein has been demonstrated to be cleaved at an s /s cleavage site (fig. ) generating the s and s subunits. the above three viruses display the canonical (r/k)-( x)n-(r/k)↓ motif (table ) . 
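The motif described above can be illustrated with a short scan over a peptide string. This is only a minimal sketch, not the authors' method: the function name `furin_like_sites` is hypothetical, and because the spacer values are elided in the text above, the defaults used here (a spacer of two residues repeated 0 to 3 times between the two basic residues) are an assumption made for illustration. The scan reports every position that satisfies the minimal pattern, so it is far more permissive than real furin specificity.

```python
# Minimal sketch: find candidate cleavage positions matching
# (R/K)-(spacer)n-(R/K)|, where "|" marks the cleavage point.
# ASSUMPTION: each spacer unit is 2 residues and n ranges over 0..3;
# these values are illustrative defaults, not taken from the text above.

BASIC = {"R", "K"}


def furin_like_sites(seq, spacer_units=(0, 1, 2, 3), unit_len=2):
    """Return 1-based positions after which the minimal motif would cleave.

    A position j qualifies when residue j is basic (R or K) and another
    basic residue sits unit_len * n + 1 residues upstream for some n.
    """
    seq = seq.upper()
    sites = []
    for j, residue in enumerate(seq):
        if residue not in BASIC:
            continue
        for n in spacer_units:
            i = j - 1 - unit_len * n  # index of the upstream basic residue
            if i >= 0 and seq[i] in BASIC:
                sites.append(j + 1)  # cleavage falls after residue j (1-based)
                break
    return sites
```

Under these assumptions, `furin_like_sites("PRRARSV")` flags positions 3 and 5, the latter corresponding to the r↓s junction, while `furin_like_sites("SLLRST")` returns an empty list because no second basic residue lies upstream of the arginine.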
additionally, it has been demonstrated that variation around the viral envelope glycoprotein cleavage site plays a role in cellular tropism and pathogenesis. for instance, the pathogenesis of some cov has been previously related to the presence of a furin-like cleavage site in the s-protein sequence. for example, the insertion of a similar cleavage site in the infectious bronchitis virus (ibv) s-protein results in higher pathogenicity, pronounced neural symptoms and neurotropism in infected chickens (cheng et al., ). similarly, in the case of influenza virus, low-pathogenicity forms of influenza virus contain a single basic residue at the cleavage site, which is cleaved by trypsin-like proteases, and the tissue distribution of the activating protease(s) typically restricts infections to the respiratory and/or intestinal organs (sun et al., ). conversely, the highly pathogenic forms of influenza have a furin-like cleavage site cleaved by different cellular proteases, including furin, which are expressed in a wide variety of cell types, allowing a widening of the cell tropism of the virus (kido et al., ). furthermore, the insertion of a multibasic motif rerrrkkr↓gl at the h n hemagglutinin ha cleavage site was likely associated with the hyper-virulence of the virus during the hong kong outbreak (claas et al., ). this motif exhibits the critical arg at p and basic residues at p and p , as well as p and p and an aliphatic leu at p ' positions (table ) (schechter and berger nomenclature; schechter and berger, ), typical of a furin-like cleavage specificity (braun and sauter, ; izaguirre, ; seidah and prat, ).
https://doi.org/ . /j.antiviral. . received february ; received in revised form february ; accepted february
the coronavirus s-protein is the structural protein responsible for the crown-like shape of the cov viral particles, from which the original name "coronavirus" was coined.
the ~ aa long s-protein belongs to class-i viral fusion proteins and contributes to the cell receptor binding, tissue tropism and pathogenesis (lu et al., ; millet and whittaker, ). it contains several conserved domains and motifs (fig. ). the trimeric s-protein is processed at the s /s cleavage site by host cell proteases during infection. following cleavage, also known as priming, the protein is divided into an n-terminal s -ectodomain that recognises a cognate cell surface receptor and a c-terminal s membrane-anchored protein involved in viral entry. the sars-cov s protein contains a conserved receptor binding domain (rbd), which recognises the angiotensin-converting enzyme (ace ) (li et al., ). the sars-cov binds to both bat and human cells, and the virus can infect both organisms (ge et al., ; kuhn et al., ). the rbd surface of s /ace implicates aa in the s of sars-cov (li et al., ). among them, residues are strictly conserved in -ncov, supporting the hypothesis that ace is also the receptor of the newly emerged ncov (wan et al., ). the s -protein contains the fusion peptide (fp), a second proteolytic site (s ′), followed by an internal fusion peptide (ifp) and two heptad-repeat domains preceding the transmembrane domain (tm) (fig. ). notably, the ifps of the -ncov and sars-cov are identical, displaying characteristics of viral fusion peptides (fig. ). while the molecular mechanism involved in cell entry is not yet fully understood, it is likely that both fp and ifp participate in the viral entry process (lu et al., ) and thus the s-protein must likely be cleaved at both s /s and s ′ cleavage sites for virus entry.
table : comparative sequences of envelope protein cleavage site(s) in coronaviruses (above) and in other rna viruses (below). empty boxes: no consensus motif detected.
the furin-like s ′ cleavage site at kr↓sf with p and p basic residues and a p ′ hydrophobic phe (seidah and prat, ), downstream of the ifp, is identical between the -ncov and sars-cov (fig. ). in the mers-cov and hcov-oc the s /s site is replaced by rxxr↓sa, with p and p basic residues, and an ala (not aliphatic) at p ′, suggesting a somewhat less favourable cleavage by furin. however, in the other less pathogenic circulating human cov, the s ′ cleavage site only exhibits a monobasic r↓s sequence (fig. ) with no basic residues at either p and/or p needed to allow furin cleavage, suggesting a less efficient cleavage or higher restriction at the entry step depending on the cognate proteases expressed by target cells. even though processing at s ′ in -ncov is expected to be a key event for the final activation of the s-protein, the protease(s) involved in this process have not yet been conclusively identified. based on the -ncov s ′ sequence and the above arguments, we propose that one or more furin-like enzymes would cleave the s ′ site at kr↓sf. in contrast to the s ′, the first cleavage between the rbd and the fp (s /s cleavage site, fig. ) has been extensively studied for many covs (lu et al., ). interestingly, the s /s processing site exhibits different motifs among coronaviruses (fig. , site & site ), with many of them displaying cleavage after a basic residue. it is thus likely that the priming process is ensured by different host cell proteases depending on the sequence of the s /s cleavage site. accordingly, the mers-cov s-protein, which contains a rsvr↓sv motif, is cleaved during virus egress, probably by furin (millet and whittaker, ). conversely, the s-protein of sars-cov remains largely uncleaved after biosynthesis, possibly due to the lack of a favourable furin-like cleavage site (sllr-st).
in this case, it was reported that following receptor binding the s-protein is cleaved at a conserved sequence ayt↓m (located aa downstream of sllr-st) by target cells' proteases such as elastase, cathepsin l or tmprss (bosch et al., ; matsuyama et al., , ; millet and whittaker, ). as the priming event is essential for virus entry, the efficacy and extent of this activation step by the proteases of the target cells should regulate cellular tropism and viral pathogenesis. in the case of the -ncov s-protein, the conserved site sequence ayt↓m may still be cleaved, possibly after the preferred furin-cleavage at the site (fig. ). since furin is highly expressed in lungs, an enveloped virus that infects the respiratory tract may successfully exploit this convertase to activate its surface glycoprotein (bassi et al., ; mbikay et al., ). before the emergence of the -ncov, this important feature was not observed in the lineage b of betacoronaviruses. however, it is shared by other cov (hcov-oc , mers-cov, mhv-a ) harbouring furin-like cleavage sites in their s-protein (fig. ; table ), which were shown to be processed by furin experimentally (le coupanec et al., ; millet and whittaker, ).
figure legend: the sp, s ↓s and s ′ cleavage sites are indicated by arrows. the sequences of different cov s /s and s ′ cleavage sites were aligned using multalin webserver (http://multalin.toulouse.inra.fr/multalin/) with manual adjustments and the figure prepared using espript (http://espript.ibcp.fr/espript/espript/) presenting the secondary structure of sars-cov s-protein at the bottom of the alignment (pdb x ) (yuan et al., ). insertion of furin-like cleavage site is surrounded by a black frame. red asterisks indicate the presence of a canonical furin-like cleavage motif at the s /s site.
strikingly, the -ncov s-protein sequence contains additional nucleotides upstream of the single arg↓ cleavage site (figs. b and ) leading to a predictively solvent-exposed prrar↓sv sequence, which corresponds to a canonical furin-like cleavage site (braun and sauter, ; izaguirre, ; seidah and prat, ). this furin-like cleavage site is supposed to be cleaved during virus egress (millet and whittaker, ) for s-protein "priming" and may provide a gain-of-function to the -ncov for efficient spreading in the human population compared to other lineage b betacoronaviruses. this possibly illustrates a convergent evolution pathway between unrelated covs. interestingly, if this site is not processed, the s-protein is expected to be cleaved at site during virus endocytosis, as observed for the sars-cov. obviously much more work is needed to demonstrate experimentally our assertion, but the inhibition of such processing enzyme(s) may represent a potential antiviral strategy. indeed, it was recently shown that in an effort to limit viral infections, host cells that are infected by a number of viruses provoke an interferon response to inhibit the enzymatic activity of furin-like enzymes. it was also demonstrated that hiv infection induces the expression of either the protease activated receptor (par ) (kim et al., ) or guanylate binding proteins and (gbp , ) (braun and sauter, ) that restrict the trafficking of furin to the trans golgi network (par ) or to early golgi compartments (gbp , ) where the proprotein convertase remains inactive. altogether, these observations suggest that inhibitors of furin-like enzymes may contribute to inhibiting virus propagation. a variety of approaches have been proposed to inhibit furin activity to limit tumour growth, viral and bacterial infection.
thus, a variant of the naturally occurring serine protease inhibitor α- antitrypsin harbouring a consensus furin cleavage, called α- antitrypsin portland (α -pdx), inhibits furin and prevents the processing of hiv- env (anderson et al., ) . the addition of a chloromethylketone (cmk) moiety to the c-terminus of a polybasic cleavage motif and a decanoyl group at the n-terminus to favour cell penetration (dec-rvkr-cmk) irreversibly blocked the enzymatic activity of furin, pc , pc , pace and pc (decroly et al., ; garten et al., ) . finally, the elucidation of the crystal structure of furin resulted in the design of a , dideoxystreptamine-derived inhibitor, where two molecules of the inhibitor form a complex with furin (dahms et al., ) . as furin-like enzymes are involved in a multitude of cellular processes, one important issue would be to avoid systemic inhibition that may result in some toxicity. accordingly, it is likely that such small molecule inhibitors, or other more potent orally active ones, possibly delivered by inhalation and exhibiting a slow dissociation rate from furin to allow for sustained inhibition, deserve to be rapidly tested to assess their antiviral effect against -ncov. 
inhibition of hiv- gp -dependent membrane fusion by a furin-directed α -antitrypsin variant targeting proprotein convertases in furin-rich lung cancer cells results in decreased in vitro and in vivo growth cathepsin l functionally cleaves the severe acute respiratory syndrome coronavirus class i fusion protein upstream of rather than adjacent to the fusion peptide furin-mediated protein processing in infectious diseases and cancer spike protein, s, of human coronavirus hku : role in viral life cycle and application in antibody detection genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan the s subunit of qx-type infectious bronchitis coronavirus spike protein is an essential determinant of neurotropism human influenza a h n virus related to a highly pathogenic avian influenza virus structural studies revealed active site distortions of human furin by a small molecule inhibitor sars and mers: recent insights into emerging coronaviruses identification of the paired basic convertases implicated in hiv gp processing based on in vitro assays and expression in cd + cell lines processing of viral glycoproteins by the subtilisin-like endoprotease furin and its inhibition by specific peptidylchloroalkylketones isolation and characterization of a bat sars-like coronavirus that uses the ace receptor the proteolytic regulation of virus cell entry by furin and other proprotein convertases role of host cellular proteases in the pathogenesis of influenza and influenza-induced multiple organ failure neuroinflammation-induced interactions between protease-activated receptor and proprotein convertases in hiv-associated neurocognitive disorder angiotensin-converting enzyme : a functional receptor for sars coronavirus cleavage of a neuroinvasive human respiratory virus spike glycoprotein by proprotein convertases modulates neurovirulence and virus spread within the central nervous system structure of 
sars coronavirus spike receptor-binding domain complexed with receptor early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia angiotensin-converting enzyme is a functional receptor for the sars coronavirus bat-to-human: spike features determining "host jump" of coronaviruses sars-cov, mers-cov, and beyond efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss mediated enhancement of severe acute respiratory syndrome coronavirus infection comparative analysis of expression of the proprotein convertases furin, pace , pc and pc in human lung tumours host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein host cell proteases: critical determinants of coronavirus tropism and pathogenesis host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein maturation of hiv envelope glycoprotein precursors by cellular endoproteases on the active site of proteases. . mapping the active site of papain; specific peptide inhibitors of papain proprotein and prohormone convertases: a family of subtilases generating diverse bioactive polypeptides the biology and therapeutic targeting of the proprotein convertases modifications to the hemagglutinin cleavage site control the virulence of a neurotropic h n influenza virus receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains preliminary estimation of the basic reproduction number of novel coronavirus ( -ncov) in china, from to : a data-driven analysis in the early phase of the outbreak supplementary data to this article can be found online at https:// doi.org/ . /j.antiviral. . . 
key: cord- -x uxdi authors: daniel, dennis a.; poynter, sue e.; landrigan, christopher p.; czeisler, charles a.; burns, jeffrey p.; wolbrink, traci a. title: pediatric resident engagement with an online critical care curriculum during the intensive care rotation* date: - - journal: pediatr crit care med doi: . /pcc. sha: doc_id: cord_uid: x uxdi residents are often assigned online learning materials as part of blended learning models, superimposed on other patient care and learning demands. data that describe the time patterns of when residents interact with online learning materials during the icu rotation are lacking. we describe resident engagement with assigned online curricula related to time of day and icu clinical schedules, using website activity data. design: prospective cohort study examining curriculum completion data and cross-referencing timestamps for pre- and posttest attempts with resident schedules to determine the hours that they accessed the curriculum and whether or not they were scheduled for clinical duty. residents at each site were cohorted based on two differing clinical schedules—extended duration (> hr) versus shorter (maximum hr) shifts. setting: two large academic children’s hospitals. subjects: pediatric residents rotating in the picu from july to june . interventions: none. measurements and main results: one-hundred and fifty-seven pediatric residents participated in the study. the majority of residents ( / ; %) completed the curriculum, with no statistically significant association between overall curriculum completion and schedule cohort at either site. residents made more test attempts at nighttime between pm and am ( , / , ; %) regardless of whether they were scheduled for clinical duty. approximately two thirds of test attempts ( , / , ; %) occurred when residents were not scheduled to work, regardless of time of day. 
forty-two percent of all test attempts ( , / , ) occurred between pm and am while off-duty, with % ( / , ) occurring between midnight and am. conclusions: residents rotating in the icu completed online learning materials mainly during nighttime and off-duty hours, including usage between midnight and am while off-duty. increasing nighttime and off-duty workload may have implications for educational design and trainee wellness, particularly during busy, acute clinical rotations, and warrants further examination.
in busy clinical rotations such as the icu, patients, diagnoses, and clinical acuity vary between rotations, and limited time and competing demands are common for both trainees and faculty ( ). supplementing the icu rotation experience with a blended online educational curriculum may help improve knowledge and ensure consistent exposure to core content ( ), but these resources are often superimposed on the demands of patient care and other educational experiences ( ). for our icu residents, we designed curricula that included short videos with pre- and posttests and hypothesized that residents would use these materials most frequently during breaks in patient care while on clinical duty. however, there is a lack of previously published data that describe when and to what extent resident physicians complete online materials during icu rotations. such understanding would help inform decisions about how to best implement educational interventions for residents rotating in the demanding, high-acuity clinical setting of the icu. we conducted a prospective cohort study examining junior (post-graduate year ) resident use of online curricula during their first rotation in medical-surgical picus at two large academic children's hospitals (boston children's hospital and cincinnati children's hospital) that are similar in size, patient population, and resident scope of responsibility. this study was approved by the institutional review boards at both sites.
icu resident rotation directors created individualized curricula for each site that covered core concepts in pediatric critical care medicine. both sites also provided in-person educational experiences (bedside teaching, didactic lectures, and manikin-based simulations). site delivered in-person education between : and : every weekday, and site did so between : and : every monday through thursday. we administered the online curricula from july to june at site and june to june at site . neither site provided protected time for curriculum completion. although residents were informed of the expectation to complete curricula by the end of their icu rotation, there were no formal consequences for failing to complete them. at both sites, residents were e-mailed weeks before their rotation and instructed to complete the curriculum before the end of their icu rotation, with e-mail reminders provided at rotation weeks , , and . residents were excluded from the time-of-use analysis if their daily schedule data were unavailable. curricula were delivered on openpediatrics (www.openpediatrics.org), which is based at site . each lesson contained a pretest, video, and posttest. the curriculum contained lessons at site and lessons at site . individual lesson videos ranged in length from to minutes (average length min). the total curriculum video duration at site was hours, minutes and at site was hours, minutes. the website required strictly linear progress; a pretest, then video, then posttest for each lesson needed to be completed before a resident could progress to the next lesson. residents were only able to take the pretest once, but posttests could be attempted multiple times until the minimum passing score (≥ %) was achieved. the platform allowed residents to stop and restart within pre- or posttests, as well as within videos, if they did not complete a given item in one sitting.
during the study interval, residents worked within two different clinical schedules as part of a concurrent trial of resident physician work hours, the randomized order safety trial evaluating resident schedules ( ), where each schedule operated for year of clinical rotations at each site. one schedule cohort involved daytime and nighttime work shifts limited to a maximum of hours of duration, whereas the second involved traditional extended duration (> hr) work shifts, with daytime shifts alternating with extended duration work shifts every fourth night. on average, residents worked about % more hours per week on the extended duration work schedule ( ). throughout this article, we refer to these differing schedule cohorts as "short call" and "long call". we collected curriculum completion data for each resident and timestamps for every pretest attempt and the first posttest attempt that occurred during the icu rotation and in the days preceding. we did not include test attempts occurring prior to the icu rotation in the time-of-use analysis due to the significant heterogeneity in resident clinical rotations immediately prior to the icu rotation. we only included the first posttest attempt to avoid over-representing a given time of day if a user attempted a posttest multiple times. video viewing activity is captured only in aggregate, deidentified fashion on the platform, so individual video view timestamps were not available for specific residents. for test attempts during the icu rotation, we cross-referenced timestamps with each resident's schedule to determine whether or not they were scheduled for clinical service in the hospital. we calculated frequencies and percentages for resident and site characteristics and compared data between cohorts and between sites using chi-square tests of independence using a significance level of . . data were analyzed using stata/se . (statacorp, college station, tx) and microsoft excel (microsoft corp., washington, dc).
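The cross-referencing step described above can be sketched as a small classifier that labels each test-attempt timestamp as day or night and as on-duty or off-duty. This is a minimal illustration under stated assumptions, not the study's actual analysis code: the function names are hypothetical, shifts are assumed to be available as (start, end) datetime pairs, and the nighttime window is left as a parameter because its boundaries are elided in the text above.

```python
from datetime import datetime, time


def is_night(ts, night_start, night_end):
    """True when ts falls inside the nighttime window.

    The window may wrap past midnight (for example 18:00 to 06:00);
    the actual boundaries used in the study are not given here.
    """
    t = ts.time()
    if night_start <= night_end:
        return night_start <= t < night_end
    return t >= night_start or t < night_end


def is_on_duty(ts, shifts):
    """True when ts falls inside any scheduled shift (start inclusive, end exclusive)."""
    return any(start <= ts < end for start, end in shifts)


def classify_attempt(ts, shifts, night_start, night_end):
    """Return a (time-of-day, duty-status) label pair for one test attempt."""
    period = "night" if is_night(ts, night_start, night_end) else "day"
    duty = "on_duty" if is_on_duty(ts, shifts) else "off_duty"
    return period, duty
```

For example, with a single hypothetical shift from 07:00 to 19:00 on a given day and a hypothetical night window of 18:00 to 06:00, an attempt at 23:30 that day is labeled `("night", "off_duty")`. Counting the label pairs over all attempts yields the kind of frequencies and percentages reported in the results.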
during the study, residents rotated through the icu for the first time, and % ( / ) accessed the curriculum. fifty-three percent of residents ( / ) accessed the curriculum during the icu rotation at least once while on duty. seven residents at site accessed the curriculum during the rotation but did not have daily schedule data available for analysis. we included residents ( / ; %) in the time-of-use analysis. there were no statistically significant differences in demographic characteristics (gender or residency track) between the two sites or between the schedule cohorts within each site. sixty-eight percent of residents ( / ) completed the curriculum (table ). a greater percentage of residents completed the curriculum at site ( / ; %) compared with site ( / ; %), p value of less than . . there was no statistically significant association between overall curriculum completion and schedule cohort at either site. we included , test attempts by residents from the two sites in the time-of-use analysis. of the test attempts made during clinical duty, % ( / , ) occurred during daytime shifts versus % ( / , ) at night. approximately two thirds of test attempts ( , / , ; %) occurred when residents were not scheduled to work, regardless of time of day. approximately two thirds of all test attempts ( , / , ; %) occurred during nighttime hours (between pm and am), regardless of work status (fig. a), with % of all test attempts occurring between midnight and am while residents were not scheduled to work. we observed an association between time-of-use patterns and schedule cohort at site but not at site . at site , residents in the long call cohort used the curriculum more during nighttime hours ( / ; %) compared with residents in the short call cohort ( / ; %), p value of less than . . the long call cohort also used the curriculum more during times when they were not scheduled to work ( / ; %) compared with the short call cohort ( / ; %), p value of less than . .
figure b displays the distribution of test attempts by day of rotation, ranging from days prior to the start of the rotation to days after the start date. for test attempts within the icu rotation, a greater proportion occurred in the second half of the rotation ( , / , ; %) versus the first half ( , / , ; %). three hundred and forty-eight additional test attempts occurred prior to the start of the rotation, the majority of which ( / ; %) were in the days immediately prior. using years of timestamped online learning data from two large pediatric residency programs, we found that the majority of residents in our study accessed a supplemental online curriculum during the icu rotation. however, despite designing the icu curriculum to include short lessons that could be completed during breaks from clinical work while on duty, only half of the included residents accessed the curriculum during clinical periods, and they frequently chose to use the materials during nighttime hours and when not scheduled to work clinically. notably, % of test attempts occurred between midnight and am while residents were not scheduled to work. test attempts were made immediately prior to, and throughout the rotation, with a majority occurring in the second half of the rotation. the inconsistent association of completion rate or time of use with schedule cohort makes it less likely that the type of schedule is a main contributor to online curriculum engagement. because both sites provided in-person educational experiences during most workdays in addition to the online curriculum, it is possible that residents were biased against completing online materials while on-duty, since other materials were already being presented during on-duty hours. other studies have supported the notion that medical learners often prefer online learning to be supplementary to in-person learning experiences ( , ) . 
the greater proportion of on-duty use at night compared with during the daytime may be a consequence of the greater volume of clinical care demands requiring resident attention during the day, including but not limited to rounds and scheduled admissions. the greater number of test attempts in the second half of the rotation may reflect residents catching up on incomplete lessons before the end of the rotation or may be related to residents focusing on getting comfortable in the clinical environment before turning their attention to self-directed learning. several studies have highlighted successful implementation and outcomes of online medical learning, noting improvements in knowledge ( ) and perceived utility and satisfaction from clinicians and instructors ( ). however, although residents have always incorporated self-directed learning at night and during off hours, blended learning models that increase trainees' obligatory nonclinical workload outside of dedicated educational time may have a different impact. despite our intention to provide short lessons that would provide education during breaks from patient care, residents accessed the curriculum more often during nonclinical hours. previous reports have commented on the need to consider the distinct time constraints and service-education task conflicts in graduate medical education ( ) and on the risk of creating information overload when educational content is shifted to the online environment ( ). increases in workload added to preexisting stressors of the clinical learning environment can contribute to resident physician burnout ( , ) and sleep deprivation. sleep deficiency is known to adversely impact resident clinical performance ( ) and increases the risk of physical harm, such as motor vehicle crashes ( ) and needlestick/sharps injuries ( ).
therefore, program and rotation directors of busy inpatient rotations may need to consider alternative approaches to implementing online learning, such as incorporating protected time to complete curricula, making the curriculum optional, or controlling the volume of content shifted to the asynchronous, self-directed setting ( , , ).
our study has several important limitations. as this was a purely observational study, we did not qualitatively assess resident motivations for accessing the curricula at the times they did, nor did we formally survey participant satisfaction with the curriculum or any potential impact on their wellness. these are important areas that warrant further investigation. additionally, not all residents completed the online curricula, and rates of curriculum completion differed between the two sites despite identical procedures to encourage completion. this may reflect differences in willingness to engage with online curricula during clinical rotations between residents and between the two sites, which may have led to a sampling bias. interestingly, despite the fact that openpediatrics (https://www.openpediatrics.org/) is primarily based at site , curriculum completion rate was lower at that site, suggesting that there was not increased pressure among residents to use the platform due to the shared institutional affiliation. although we have no reason to suspect differences in resident roles or workload between the sites, we did not directly assess this either. curricular length has also been described as a barrier to curriculum completion ( , ); however, we observed higher completion rates by residents at the site with the longer curriculum.
our data show that residents will engage with online learning materials during and immediately prior to their icu rotation but do so most often at nighttime and when off-duty, with a portion of use occurring between midnight and am while off-duty.
this may have implications for resident well-being, including sleep, personal life disruptions, and/or burnout, and warrants further examination. program and rotation directors will need further guidance on how best to implement blended learning models in busy clinical rotations, such as the icu.
we wish to thank the residents at boston children's hospital and cincinnati children's hospital for their participation in this study, as well as the administrative staff at both hospitals and at openpediatrics who provided coordination and support for this study.
drs. poynter's, landrigan's, and czeisler's institutions received funding from the national heart, lung, and blood institute (nhlbi), and they received support for article research from the national institutes of health. dr. landrigan received funding from midwest hospital association/executive speakers bureau and midwest lighting institute; he reports receiving grants from patient-centered outcomes research institute, consulting fees, and equity from the i-pass patient safety institute, and consulting fees from virgin pulse; and he has received monetary awards, honoraria, and travel reimbursement from multiple academic and professional organizations for teaching and consulting on sleep deprivation, physician performance, handoffs, and safety and has served as an expert witness in cases regarding patient safety and sleep deprivation. drs. landrigan and czeisler report being principal investigators of the randomized order safety trial evaluating resident-physician schedules, which is supported by grants (u -hl- and u -hl- ) from the nhlbi. dr. czeisler serves as the incumbent of a harvard medical school professorship that was endowed in by cephalon, inc., which has since been acquired by teva pharmaceutical industries ltd., and he is supported in part by funding from the national institute of occupational safety and health r -oh- .
dr from ganésco and zurich insurance, and fees for serving as a member of an advisory board from the institute of digital media and child development and the klarman family foundation, holding a number of process patents in the field of sleep and circadian rhythms (e.g., photic resetting of the human circadian pacemaker) and an equity interest in vanda pharmaceuticals, being the incumbent of an endowed professorship provided to harvard university by cephalon, receiving fees for serving as an expert on various legal and technical cases related to sleep or circadian rhythms from casper sleep, comair/delta airlines, complete general construction, fedex, greyhound, hg energy, purdue pharma, south carolina central railroad, steel warehouse, stric-lan, texas premier resources, and united parcel service, and receiving royalties from the new england journal of medicine, mcgraw-hill, houghton mifflin harcourt/penguin, and from philips respironics for the actiwatch and actiwatch spectrum devices. dr. czeisler's interests were reviewed and managed by brigham and women's hospital and partners healthcare in accordance with their conflict of interest policies. the remaining authors have disclosed that they do not have any potential conflicts of interest. this study was approved by the institutional review boards at both participating sites prior to data collection and analysis. 
for information regarding this article, e-mail: dennis.daniel@childrens.harvard.edu
balancing service and education in residency training: a logical fallacy
flipped classrooms in graduate medical education: a national survey of residency program directors
a systematic review of the effectiveness of flipped classrooms in medical education
rosters study group: design and recruitment of the randomized order safety trial evaluating resident-physician schedules (rosters) study
effects on resident work hours, sleep duration, and work experience in a randomized order safety trial evaluating resident-physician schedules (rosters)
blended learning: how can we optimise undergraduate student engagement
the effectiveness of online and blended learning: a meta-analysis of the empirical literature
internet-based learning in the health professions: a meta-analysis
the impact of e-learning in medical education
flipping out: does the flipped classroom learning model work for gme?
twelve tips for "flipping" the classroom
a narrative review on burnout experienced by medical students and residents
effect of reducing interns' work hours on serious medical errors in intensive care units
extended work shifts and the risk of motor vehicle crashes among interns
resident wellness matters: optimizing resident education and wellness through the learning environment
effects of health care provider work hours and sleep deprivation on safety and performance
advances in medical education and practice: student perceptions of the flipped classroom
impact of required versus self-directed use of virtual patient cases on clerkship performance: a mixed-methods study
efficacy of an asynchronous electronic curriculum in emergency medicine education in the united states
massive open online course completion rates revisited: assessment, length and attrition
key: cord- -jf mj z authors: yang, ziheng; bielawski, joseph p.
title: statistical methods for detecting molecular adaptation date: - - journal: trends ecol evol doi: . /s - ( ) - sha: doc_id: cord_uid: jf mj z
the past few years have seen the development of powerful statistical methods for detecting adaptive molecular evolution. these methods compare synonymous and nonsynonymous substitution rates in protein-coding genes, and regard a nonsynonymous rate elevated above the synonymous rate as evidence for darwinian selection. numerous cases of molecular adaptation are being identified in various systems from viruses to humans. although previous analyses averaging rates over sites and time have little power, recent methods designed to detect positive selection at individual sites and lineages have been successful. here, we summarize recent statistical methods for detecting molecular adaptation, and discuss their limitations and possible improvements.
'it has been proved remarkably difficult to get compelling evidence for changes in enzymes brought about by selection, not to speak of adaptive changes' . although darwin's theory of evolution by natural selection is generally accepted by biologists for morphological traits (including behavioural and physiological), the importance of natural selection in molecular evolution has long been a matter of debate. the neutral theory maintains that most observed molecular variation, both polymorphism within species and divergence between species, is due to random fixation of selectively neutral mutations. well established cases of molecular adaptation have been rare . several tests of neutrality have been developed and applied to real data, and although they are powerful enough to reject strict neutrality in many genes, they rarely provide unequivocal evidence for positive darwinian selection.
most convincing cases of adaptive molecular evolution have been identified through comparison of synonymous (silent; $d_S$) and nonsynonymous (amino acid-changing; $d_N$) substitution rates in protein-coding dna sequences, thus providing fascinating case studies of natural selection in action on the protein molecule. selected examples are listed in table ; see hughes for detailed descriptions of many case studies. here, we summarize recent methodological developments that improve the power to detect adaptive molecular evolution, and examine their strengths and weaknesses , so that they can be used to detect more cases of molecular adaptation. traditionally, synonymous and nonsynonymous substitution rates (box ) are defined in the context of comparing two dna sequences, with $d_S$ and $d_N$ as the numbers of synonymous and nonsynonymous substitutions per site, respectively . thus, the ratio $\omega = d_N/d_S$ measures the difference between the two rates and is most easily understood from a mathematical description of a codon substitution model (box ). if an amino acid change is neutral, it will be fixed at the same rate as a synonymous mutation, with $\omega = 1$. if the amino acid change is deleterious, purifying selection (box ) will reduce its fixation rate, thus $\omega < 1$. only when the amino acid change offers a selective advantage is it fixed at a higher rate than a synonymous mutation, with $\omega > 1$. therefore, an $\omega$ ratio significantly higher than one is convincing evidence for diversifying selection. the codon-based analysis (box ) cannot infer whether synonymous substitutions are driven by mutation or selection, but it does not assume that synonymous substitutions are neutral. for example, highly biased codon usage can be caused by both mutational bias and selection (e.g. for translational efficiency ), and can greatly affect synonymous substitution rates. however, by employing parameters $\pi_j$ for the frequency of codon j in the model (box ), estimation of substitution rates will fully account for codon-usage bias (box ), irrespective of its source. because parameter $\omega$ is a measure of selective pressure on a protein, it differentiates codon-based analyses from the more general tests of neutrality proposed in population genetics , . these general tests often lack the power to determine the sources of the departure from the strict neutral model, such as changes in population size, fluctuating environment or different forms of selection.
two classes of methods have been suggested to estimate $d_N$ and $d_S$ between two protein-coding dna sequences. the first class includes over a dozen intuitive methods developed since the early s (refs , - ). these methods involve the following steps: counting synonymous (s) and nonsynonymous (n) sites in the two sequences, counting synonymous and nonsynonymous differences between the two sequences, and correcting for multiple substitutions at the same site. s and n are defined as the sequence length multiplied by the proportions of synonymous and nonsynonymous changes before selection on the protein , .
most of these methods make simplistic assumptions about the nucleotide substitution process and also involve ad hoc treatment of the data that cannot be justified , ; therefore, we refer to these methods of estimating $d_N$ and $d_S$ as approximate methods. the methods of miyata and yasunaga , and nei and gojobori , assume an equal rate for transitions (t ↔ c and a ↔ g) and transversions (t,c ↔ a,g), as well as a uniform codon usage. because transitions at the third 'wobble' position are more likely to be synonymous than transversions, ignoring the transition/transversion rate ratio leads to underestimation of s and overestimation of n (ref. ). efforts have been taken to incorporate the transition/transversion rate bias (box ) when counting sites and differences - . the effect of biased codon usage has largely been ignored ; however, extreme codon-usage bias can have devastating effects on the estimation of $d_N$ and $d_S$ (see the next section) , . a recent ad hoc method incorporates both transition and codon-usage biases.
prior probability: the probability of an event (such as a site belonging to a site class) before the collection of data.
positive selection: darwinian selection fixing advantageous mutations with positive selective coefficients. the term is used interchangeably with molecular adaptation and adaptive molecular evolution.
posterior probability: the probability of an event conditional on the observed data, which reflects both the prior assumption and information in the data.
purifying selection: natural selection against deleterious mutations with negative selective coefficients. the term is used interchangeably with negative selection or selective constraints.
synonymous substitution: a nucleotide substitution that does not change the encoded amino acid.
transition/transversion rate bias: unequal substitution rates between nucleotides, with a higher rate for transitions (changes between t and c and between a and g) than transversions (all other changes).
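the three steps of the approximate methods (counting sites, counting differences, correcting for multiple hits) can be illustrated with a short python sketch. it assumes that the numbers of sites (`S`, `N`) and differences (`Sd`, `Nd`) have already been counted, and it applies the jukes-cantor correction, in the spirit of the nei and gojobori method. the function names and the example counts are hypothetical.

```python
import math

def jukes_cantor(p):
    """Correct a proportion of differences p for multiple substitutions."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(S, N, Sd, Nd):
    """Return (dN, dS, omega) from counts of sites and differences."""
    pS = Sd / S                # proportion of synonymous differences
    pN = Nd / N                # proportion of nonsynonymous differences
    dS = jukes_cantor(pS)      # synonymous substitutions per synonymous site
    dN = jukes_cantor(pN)      # nonsynonymous substitutions per nonsynonymous site
    return dN, dS, dN / dS

# Hypothetical counts for a pair of sequences (illustration only).
dN, dS, omega = dn_ds(S=100.0, N=300.0, Sd=10.0, Nd=6.0)
```

with these made-up counts the nonsynonymous rate is well below the synonymous rate, the pattern expected under purifying selection. the sketch ignores the transition/transversion and codon-usage biases, so it inherits the limitations of the simplest approximate methods.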
the second class is the maximum likelihood (ml) method based on explicit models of codon substitution (box ) , . parameters in the model (i.e. sequence divergence t, transition/transversion rate ratio $\kappa$ and the $d_N/d_S$ ratio $\omega$) are estimated from the data by ml, and are used to calculate $d_N$ and $d_S$ according to their definitions , , . a major feature of the method is that the model is formulated at the level of instantaneous rates (where there is no possibility for multiple changes) and that probability theory accomplishes all difficult tasks in one step: estimating mutational parameters, such as $\kappa$; correcting for multiple hits; and weighting pathways of change between codons. statistical tests can be used to test whether $d_N$ is significantly higher than $d_S$. for approximate methods, a normal approximation is applied to $d_N - d_S$. for ml, a likelihood-ratio test can be used. in this case, the null model has $\omega$ fixed at 1, whereas the alternative model estimates $\omega$ as a free parameter. twice the log-likelihood difference between the two models is compared with a $\chi^2$ distribution with one degree of freedom to test whether $\omega$ is different from one. computer simulation has been used to examine the performance of different estimation methods; the findings are consistent with observations made in real data analyses , , . we demonstrate the effects of different estimation procedures using human and orangutan α -globin genes (table ). for comparison, different assumptions are made in ml concerning the transition/transversion rate bias and the codon-usage bias. the simpler models are each rejected when compared with more complex models by likelihood-ratio tests, confirming biased transition rates and codon usage. thus, estimates from ml accounting for both biases (model , table ) are expected to be the most reliable. we make the following observations:
• assumptions appear to matter more than methods. the approximate methods and ml produce similar results under similar assumptions.
the method of nei and gojobori is similar to ml under a model that ignores both transition/transversion bias and codon-usage bias (model , table ), whereas the methods of ina and li are similar to ml under a model accounting for the transition/transversion bias but ignoring codon-usage bias (model , table ). the method of yang and nielsen is similar to ml under a model accounting for both biases (model , table ). however, for distantly related sequences, ad hoc treatment in approximate methods can lead to serious biases even under the correct assumptions .
• ignoring the transition/transversion rate bias leads to underestimation of s, overestimation of $d_S$ and underestimation of the $\omega$ ratio .
• codon-usage bias in these data has the opposite effect to the transition/transversion bias; ignoring codon-usage bias leads to overestimation of s, underestimation of $d_S$ and overestimation of $\omega$. this gene is extremely gc-rich at the third codon position, with base frequencies at % (t), % (c), % (a) and % (g). most changes at the third position (before selection at the amino acid level) are transversions between c and g. thus, the number of synonymous sites is less than half that expected under equal base and codon frequencies. although, in theory, the bias caused by unequal codon frequencies can be in the opposite direction , we have not encountered a real gene showing that pattern. such codon-usage bias appears to have misled previous analyses examining the relationship between the gc content at silent sites and $d_S$ , because those studies ignored the codon-usage bias when estimating $d_S$ (ref. ).
• different methods can produce different estimates, even when the sequences are highly similar. the sequences used in table are only about % different at silent sites and < % different at nonsynonymous sites; however, estimates of $\omega$ differ by a factor of three. because all estimation procedures partition the total numbers of sites and differences into synonymous and nonsynonymous categories, underestimation of one means overestimation of the other, thus resulting in large errors in the $\omega$ ratio.
the codon is considered the unit of evolution. the substitution rate from codon i to codon j ($i \neq j$) is given as:
$$q_{ij} = \begin{cases} 0, & \text{if } i \text{ and } j \text{ differ at more than one position} \\ \pi_j, & \text{for synonymous transversion} \\ \kappa \pi_j, & \text{for synonymous transition} \\ \omega \pi_j, & \text{for nonsynonymous transversion} \\ \omega \kappa \pi_j, & \text{for nonsynonymous transition} \end{cases}$$
parameter $\kappa$ is the transition/transversion rate ratio, $\pi_j$ is the equilibrium frequency of codon j and $\omega$ ($= d_N/d_S$) measures the selective pressure on the protein. the $q_{ij}$ are relative rates because time and rate are confounded in such an analysis. given the rate matrix $Q = \{q_{ij}\}$, the transition probability matrix over time t is calculated as:
$$P(t) = \{p_{ij}(t)\} = e^{Qt}$$
where $p_{ij}(t)$ is the probability that codon i becomes codon j after time t. likelihood calculation on a phylogeny involves summing over all possible codons in extinct ancestors (internal nodes of the tree).
if, for most of the time, a gene evolves under purifying selection but is occasionally subject to episodes of adaptive change , a comparison between two distantly related sequences is unlikely to yield a $d_N/d_S$ ratio significantly greater than one. methods have been developed to detect positive selection (box ) along specific lineages on a phylogeny. if the gene sequences of the extinct ancestors were known, it would be straightforward to use the pairwise methods discussed above. thus, messier and stewart inferred ancestral lysozyme gene sequences through phylogenetic analysis , , and used them to calculate $d_N$ and $d_S$ for each branch in the phylogeny. their analysis identified two lineages in a primate phylogeny with highly elevated nonsynonymous substitution rates. the same approach was taken in a test of relaxed selective constraint in the rhodopsin gene of cave-dwelling crayfishes . there are also likelihood models that allow different $\omega$ ratios for branches in a phylogeny , . using such models, likelihood-ratio tests can be constructed to test hypotheses.
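a likelihood-ratio test between two nested models that differ by a single parameter reduces to a few lines of arithmetic. the python sketch below is an illustration under stated assumptions: the function name `lrt_one_df` and the log-likelihood values are hypothetical, and in practice the log-likelihoods would come from fitting the null and alternative models to the sequence data. for one degree of freedom, the upper tail of the chi-square distribution can be written with the complementary error function, so no statistics library is needed.

```python
import math

def lrt_one_df(lnL_null, lnL_alt):
    """Likelihood-ratio test for nested models differing by one free parameter.

    Returns the test statistic 2*(lnL_alt - lnL_null) and its p-value under a
    chi-square distribution with one degree of freedom.
    """
    stat = 2.0 * (lnL_alt - lnL_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimisation noise
    # For X ~ chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods: null model (omega fixed) and alternative (omega free).
stat, p_value = lrt_one_df(lnL_null=-1000.0, lnL_alt=-998.0)
```

here the statistic is 4.0, which exceeds the 5% critical value of about 3.84 for one degree of freedom, so the null model would be rejected at that level. comparisons involving more than one extra parameter need the chi-square distribution with the corresponding degrees of freedom.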
for example, the $\omega$ ratio for a predefined lineage can be either fixed at one or estimated as a free parameter. the likelihood values under those two models can be compared, to test whether $\omega > 1$ in that lineage. similarly, a model assuming a single $\omega$ for all lineages (the one-ratio model) can be compared with another model assuming an independent $\omega$ for each lineage (the free-ratio model), to test the neutral prediction that the $\omega$ ratio is identical among lineages , . it should be noted that variation in the $\omega$ ratio among lineages is a violation of the strictly neutral model , , , , but it is not sufficient evidence for adaptive evolution. in particular, if nonsynonymous mutations are slightly deleterious, they will have a higher probability of fixation in a small population than in a large one , and thus lineages of different population sizes will have different $\omega$ ratios. besides positive selection, relaxed selective constraint can also elevate the $\omega$ ratio; it might be difficult to distinguish the two if the estimated $\omega$ is not larger than one. furthermore, it is incorrect to use the free-ratio model to identify lineages of interest and then to perform further tests on the $\omega$ ratios for those lineages using the same data without any correction .
the statistical-estimation theory used in the methods discussed in this review can be explained with the following simple hypothetical example. suppose that a population is an admixture of two groups of people in the proportions % and %, and a certain disease occurs at a rate of % in group i and of . % in group ii. suppose a random sample of individuals is taken from the population; what is the probability that three of them carry the disease? the probability that a random individual carries the disease (d) is an average over the two groups ($G_1$ and $G_2$):
$$p = P(D) = P(G_1)\,P(D \mid G_1) + P(G_2)\,P(D \mid G_2) \tag{1}$$
similarly, the probability that an individual does not carry the disease is:
$$P(\bar{D}) = 1 - p \tag{2}$$
the probability that three out of the $n$ sampled individuals carry the disease is given by the binomial probability:
$$\binom{n}{3}\, p^{3}\, (1 - p)^{n-3} \tag{3}$$
if eqn 3 involves an unknown parameter [such as the rate $P(D \mid G_1)$ in group i], that parameter can be estimated by maximizing eqn 3. in that case, eqn 3 gives the probability of observing the data (sample) and is called the likelihood function. the second question is to calculate the probability that an individual in the sample who carries the disease is from group i. the bayes theorem gives this probability as:
$$P(G_1 \mid D) = \frac{P(G_1)\,P(D \mid G_1)}{P(D)} \tag{4}$$
note that this is just the proportion of the contribution from group i to $P(D)$ in eqn 1. thus, this individual is most likely to be from group i. similarly, a healthy individual in the sample is more likely to be from group i than from group ii because
$$P(G_1 \mid \bar{D}) = \frac{P(G_1)\,P(\bar{D} \mid G_1)}{P(\bar{D})} \quad \text{and} \quad P(G_2 \mid \bar{D}) = 1 - P(G_1 \mid \bar{D}) \tag{5}$$
in methods for inferring sites under positive selection , , we let d in the example be the data at a site and $G_i$ be the ith site class with the $d_N/d_S$ ratio $\omega_i$. the probability of observing data at a site is then an average over the site classes (eqn 1). the product of such probabilities over sites constitutes the likelihood (eqn 3), from which we estimate any unknown parameters, such as the branch lengths and parameters in the $\omega$ distribution over sites. after the parameters are estimated, we use the bayes theorem to calculate the probability that any site, given data at that site, is from each site class (eqns 4 and 5). another straightforward application of the theory is ancestral sequence reconstruction; in this case, we replace $G_i$ with a reconstruction (characters at interior nodes of the phylogeny) at a site. when we calculate the likelihood function, the probability of data at a site $P(D)$ is a sum over all possible ancestral reconstructions ($G_i$s) (eqns 1 and 3). after parameters are estimated, the reconstruction that makes the greatest contribution to $P(D)$ is the most likely (eqns 4 and 5). the bayes method discussed here is known as the empirical bayes, because it uses estimates of parameters and does not account for their sampling errors. this might be a concern if parameters are estimated from small samples or if the posterior probabilities are sensitive to parameter estimates. an alternative approach is the hierarchical bayes method, which accounts for the uncertainty in unknown parameters by averaging over their prior distribution. note that the reconstructed ancestral sequences , as well as the inferred site classes in the site-class models , , are pseudo data and involve systematic biases. to appreciate such biases, note that in the previous example, the bayes calculations (eqns 4 and 5) predict that each of the individuals in the sample, healthy or sick, is from group i. although this is the best prediction, the accuracy is low. if such inferred group identities are used for further statistical analysis, misleading results might follow.
methods based on ancestral reconstruction might not provide reliable statistical tests because they ignore errors and biases in reconstructed ancestral sequences (box ). the ml method has the advantage of not relying on reconstructed ancestral sequences. it can also easily incorporate features of dna sequence evolution, such as the transition/transversion rate bias and codon-usage bias, and is thus based on a more realistic evolutionary model. when likelihood-ratio tests suggest adaptive evolution along certain lineages, ancestral reconstruction might be useful to pinpoint the involved amino acids and to infer ancestral proteins, which can be synthesized and examined in the laboratory , .
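to make concrete how a codon model incorporates the transition/transversion bias, the codon-usage bias and the selective pressure, the python sketch below computes the relative rate between two sense codons. it follows the usual form of such models: the rate is zero when the codons differ at more than one position, and is otherwise the frequency of the target codon, multiplied by $\kappa$ for a transition and by $\omega$ for a nonsynonymous change. the standard genetic code, the function name `rate`, the parameter values and the equal codon frequencies of 1/61 are assumptions made for the illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (assumption).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}

def rate(ci, cj, kappa, omega, pi_j):
    """Relative off-diagonal rate from sense codon ci to sense codon cj."""
    if CODE[ci] == "*" or CODE[cj] == "*":
        raise ValueError("stop codons are excluded from the model")
    diffs = [(x, y) for x, y in zip(ci, cj) if x != y]
    if len(diffs) != 1:
        return 0.0             # identical codons or more than one nucleotide change
    q = pi_j                   # frequency of the target codon
    if diffs[0] in TRANSITIONS:
        q *= kappa             # transition/transversion rate ratio
    if CODE[ci] != CODE[cj]:
        q *= omega             # nonsynonymous change, scaled by dN/dS
    return q

# Hypothetical parameter values and equal codon frequencies.
kappa, omega, pi = 2.0, 0.5, 1.0 / 61.0
syn_transition = rate("TTT", "TTC", kappa, omega, pi)       # Phe -> Phe, T -> C
nonsyn_transversion = rate("TTT", "TTA", kappa, omega, pi)  # Phe -> Leu, T -> A
double_change = rate("TTT", "CCT", kappa, omega, pi)        # two positions differ
```

a full rate matrix would fill every pair of the 61 sense codons with this function and set each diagonal element to the negative sum of its row; the transition probabilities then follow from the matrix exponential.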
the analysis effectively averages the $\omega$ ratio across all sites and positive selection is detected only if that average is $> 1$. this appears to be a conservative test of positive selection because many sites might be under strong purifying selection owing to functional constraint, with the $\omega$ ratio close to zero. a few recent studies addressed this problem. fitch and colleagues used parsimony to reconstruct ancestral dna sequences, and counted changes at each codon site along branches of the tree. they tested whether the proportion of nonsynonymous substitutions at each site is greater than the average over all sites in the sequence. suzuki and gojobori took a more systematic approach. for each site in the sequence, they estimated the numbers of synonymous and nonsynonymous sites and differences along the tree using reconstructed ancestral sequences, and then tested whether the proportion of nonsynonymous substitutions differed from the neutral expectation ($\omega = 1$). suzuki and gojobori's criterion is more stringent than fitch et al.'s, because the $\omega$ ratio averaged over sites is almost always $< 1$. these methods are expected to require many sequences in the data set so that there are enough changes at individual sites. furthermore, the reliability of significance values produced by these methods might be affected by the use of ancestral reconstruction, which is most unreliable at the positively selected or variable sites, and by codon composition bias, which is most extreme at a single site. in a likelihood model, it is impractical to use one parameter for each site. the standard approach is to use a statistical distribution to describe the variation of $\omega$ among sites; for example, we might assume several classes of sites in the protein with different $\omega$ ratios.
the test of positive selection then involves two major steps: first, to test whether sites exist where $\omega > 1$, which is achieved by a likelihood-ratio test comparing a model that does not allow for such sites with a more general model that does; and second, to use bayes' theorem to identify positively selected sites when they exist. sites having high posterior probabilities (box ) for site classes with $\omega > 1$ are potential targets of diversifying selection. the theory is explained in box (refs , , ). nielsen and yang implemented a likelihood-ratio test based on two simple models. the null model, m1 (neutral), assumes a class of conserved sites with $\omega = 0$ and another class of neutral sites with $\omega = 1$. the alternative model, m2 (selection), adds a third class of sites with $\omega$ estimated from the data. (the model codes are those used in the codeml program in the paml package.) if m2 fits the data significantly better than m1 and the estimated $\omega$ ratio for the third class in m2 is $> 1$, then some sites are under diversifying selection. zanotto et al. used this test to identify several sites under strong positive selection in the nef gene of hiv, whereas both pairwise comparison and sliding-window analysis failed. this comparison was later found to lack power in some genes because m1 does not account for sites with $0 < \omega < 1$ and the third class in m2 is forced to account for such sites. thus, yang et al. implemented several new models. for example, the beta distribution (m7 beta) is a flexible null model with $0 < \omega < 1$, and can be compared with an alternative that adds an additional site class with $\omega$ estimated (m8 beta&$\omega$). a general discrete model (m3) was also implemented. these models identified positive selection in six out of ten genes the authors analysed. figure shows the use of a discrete model (m3) with three classes to identify sites under diversifying selection in abalone sperm lysin.
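the first step, the likelihood-ratio test between a null and a more general site model, can be sketched as follows. this is an illustrative sketch under stated assumptions: the log-likelihood values are hypothetical, the test statistic is compared with a chi-square distribution, and the degrees of freedom are assumed to equal the number of extra parameters in the alternative model (two is used here as an assumption). the function names are introduced only for this example, and the closed-form tail probability is valid for even degrees of freedom only.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, df=2):
    """Return (2 * delta lnL, approximate p-value) for nested models."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods for a null model and a selection model.
stat, p_value = likelihood_ratio_test(lnl_null=-1010.0, lnl_alt=-1000.0, df=2)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.3g}")
```

a small p-value alone is not sufficient: as the text states, the estimated $\omega$ for the extra site class must also exceed one before sites are interpreted as being under diversifying selection.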
the methods discussed above assume that there are heterogeneous classes of amino acid sites but that we do not know a priori which class each site is from. such 'fishing-expedition' studies might be useful in generating hypotheses for laboratory investigation because they could identify crucial amino acids whose changes have offered a selective advantage in nature's evolutionary experiment. for example, amino acid residues under diversifying selection were inferred in analyses of hiv- nef (ref. ) and env (ref. ) genes, which might constitute unidentified viral epitopes. alternatively, we might wish to test an a priori hypothesis that certain structural and functional domains of the protein are under positive selection. in such cases, likelihood models can be constructed that assign and estimate different $\omega$ parameters for sites from different structural and functional domains. all the methods for detecting positive selection reviewed here appear to be conservative. they detect selection only if $d_N$ is higher than $d_S$; selection that does not cause excessive nonsynonymous substitutions, such as balancing selection, might not be detected. the pairwise comparison has little power because it averages the $\omega$ ratio over sites and over time. methods for detecting selection along lineages work only if the $\omega$ ratio averaged over all sites is $> 1$. similarly, the test of positive selection at sites works only if the $\omega$ ratio averaged over all branches is $> 1$. if adaptive evolution occurs only in a short time interval and affects only a few crucial amino acids, none of the methods is likely to succeed. constancy of selective pressure at sites appears to be a much more serious assumption than constancy among lineages, especially for genes likely to be under continuous selective pressure, such as the hiv env gene.
indeed, models of variable selective pressures among sites have been successful in detecting positive selection, even in a background of overwhelming purifying selection indicated by an average $\omega$ ratio much smaller than one. models that allow $\omega$ to vary among both lineages and sites should have increased power. the methods discussed here also assume the same $\omega$ ratio for all possible amino acid changes; for example, at a positively selected site, all amino acid changes are assumed to be advantageous, which is unrealistic. although amino acid substitution rates are known to correlate with their chemical properties, the relationship is poorly understood. it is also not entirely clear how to define positive selection in a model accounting for chemical properties. it will be interesting to perform computer simulations to examine the power of various detection methods and to investigate how this is affected by important factors, such as the size of the gene, sampling of species (sequences) and the level of sequence divergence. including more sequences in the data should improve the power of site-based analyses. sequence divergence is also important because neither very similar nor very divergent sequences contain much information. very divergent sequences might also be associated with problems with alignment and unequal nucleotide compositions in different species. analyses discussed here, which require information from both synonymous and nonsynonymous substitutions, are expected to have a narrower window of suitable sequence divergences than phylogeny reconstruction. the large-sample approximation to the likelihood-ratio test statistic might also be examined, but limited simulations suggest that typical sequence data (with > codons) are large enough for it to be reliable. for very short genes or gene regions and especially at low sequence divergences, monte carlo simulation might be needed to derive the null distribution.
the likelihood analysis assumes no recombination within a gene. if recombination occurs, different regions will have different phylogenies. empirical data analysis suggests that the phylogeny does not have much impact on tests of positive selection and identification of sites, and one might suspect that recombination will not cause false positives by the likelihood-ratio test. however, simulation studies are necessary to understand whether this is the case.

the evolutionary origin of snakes (or serpentes) has been discussed for over years and their phylogenetic position within squamates is still debated. around snake species are alive today and these are divided into three main groups (box ): tiny fossorial (burrowing) scolecophidians (blindsnakes); anilioids (pipesnakes), which are mostly semi-fossorial; and macrostomatans, which include more familiar taxa, such as boas, pythons, vipers and cobras. in addition to the more obvious diagnostic characters of body elongation, limblessness and jaws that can engulf surprisingly large prey, other key features of snakes include absence of eyelids and external ears, and the presence of deeply forked tongues (linked to their highly attuned and sophisticated chemosensory systems). hypotheses concerning snake interrelationships fall into two main groups. for some researchers, snakes descend from terrestrial squamates that developed fossorial (burrowing) habits. two groups of lizards exhibiting such habits, amphisbaenians and dibamids, have often been regarded as snakes' closest living relatives. amphisbaenians, in particular, resemble scaly, loose-skinned earthworms, whose shovel-shaped or wedge-like heads function as soil-shunting devices. specializations shared by snakes (fig. a), amphisbaenians (fig.
b) and dibamids include loss, reduction and consolidation of skull bones; braincase enclosure; dorsal displacement of jaw-closing muscles; loss or reduction of limbs and girdles; and increased uniformity along the vertebral column. furthermore, differences between the eyes of lizards and snakes are consistent with a model in which structures that were barely useful in a burrower underwent progressive reduction. thus, whereas lizards, like humans, distort eye lens curvature to focus on objects, snakes lack ciliary muscles and are compelled to move the entire lens back and forth relative to the retina. moreover, unlike lizards, snakes lack both a fovea and coloured oil droplets in retinal cells . alternative hypotheses postulate that snakes are related to mosasauroids (fig. c) : spectacular marine reptiles from the upper half of the cretaceous period, some - mya . mosasauroids and snakes share reduced ossification of the pelvis and hindlimbs as well as specialized features of the jaw suspension and intramandibular joint kinetics (presence of a hinge allowing a degree of lateral movement within the lower jaw; fig. a,c,d; fig. , red circle) . phylogenetically, mosasauroids would be the nearest monophyletic sister group of snakes, with varanoid lizards (monitors) as the immediate sister group to this pair. given this theory of relationships, the latest common ancestor of mosasaurs and snakes has been argued to have been a limbed, aquatic or semiaquatic squamate , - . note that the implied ecological shift from an aquatic to a terrestrial environment in snake ancestry suggests that mosasaurs' (implied) aquatic habits were also primitive for serpentes. subsequently, snakes reduced and lost their limbs, although rudiments of the posterior pair remain in some forms, such as pythons. renewed interest in the origin of snakes has been triggered by the recognition and discovery of three remarkable fossil forms with hind legs. 
each of these ancient snakes is around my old and originates from lowermost upper cretaceous sediments in the middle east. pachyrhachis problematicus, from israel (fig. d-f), rapidly assumed a central position in debates about snake phylogeny. it has miniature hindlimbs articulated with a rudimentary pelvic girdle (fig. e,f), but sadly, its feet are missing. currently described from only two specimens, it

nice snake, shame about the legs

snakes are one of the most extraordinary groups of terrestrial vertebrates, with numerous specializations distinguishing them from other squamates (lizards and their allies). their musculoskeletal system allows creeping, burrowing, swimming and even gliding, and their predatory habits are aided by chemo- and thermoreceptors, an extraordinary degree of cranial kinesis and, sometimes, powerful venoms. recent discoveries of indisputable early fossil snakes with posterior legs are generating intense debate about the evolutionary origin of these reptiles. new cladistic analyses dispute the precise significance and phylogenetic placement of these fossils. these conflicting hypotheses imply radically different scenarios of snake origins and relationships with wide biological implications.
we thank d. haydon, j. mallet, t. ohta, a. pomiankowski, v. vacquier, w.
swanson and three anonymous referees for comments. we also thank several users of the paml package (http://abacus.gene.ucl.ac.uk/software/paml.html), in particular c. woelk, for comments and suggestions concerning the implementation. this work is supported by grant # /g from the biotechnology and biological sciences research council (uk).

key: cord- -bmz u authors: huang, chih-cheng; lee, geng-yen; chyi, jen-inn; cheng, hui-teng; hsu, chen-pin; hsu, you-ren; hsu, chia-hsien; huang, yu-fen; sun, yuh-chang; chen, chih-chen; li, sheng-shian; andrew yeh, j.; yao, da-jeng; ren, fan; wang, yu-lin title: algan/gan high electron mobility transistors for protein–peptide binding affinity study date: - - journal: biosens bioelectron doi: . /j.bios. . . sha: doc_id: cord_uid: bmz u antibody-immobilized algan/gan high electron mobility transistors (hemts) were used to detect a short peptide consisting of amino acids. a one-binding-site model and a two-binding-site model were used to analyse the electrical signals, revealing the number of binding sites on an antibody and the dissociation constants between the antibody and the short peptide. in the binding-site models, the surface coverage ratio of the short peptide on the sensor surface is related to the electrical signals resulting from the peptide–antibody binding on the hemts. two binding sites on an antibody were observed and two dissociation constants, . × (− ) m and . × (− ) m, were extracted from the binding-site model through the analysis of the surface coverage ratio of the short peptide on the sensor surface. we have also shown that the conventional method of extracting the dissociation constant from a linear regression fit to the langmuir isotherm equation may lead to incorrect information if the receptor has more than one binding site for the ligand. the limit of detection (lod) of the sensor observed in the experimental result (∼ pm of the short peptide) is very close to the lod (around . – .
pm) predicted from the value of the smallest dissociation constant. the sensitivity of the sensor depends not only on the transistors but also strongly on the affinity of the ligand-receptor pair. the results demonstrate that the algan/gan hemts can be used not only as biosensors but also for biological affinity studies. there is great interest in, and an increasing need for, studying ligand-receptor binding affinity because of its important role in drug development (bertucci and cimitan ), ligand selection (morrill et al. ), antibody (kim et al. ) and nucleic acid (krusinski et al. ) development, and in understanding the mechanism of biomolecular interactions (bertucci and cimitan ; kim et al. ; krusinski et al. ; morrill et al. ). a variety of methods have been developed for characterizing ligand-receptor interactions, such as enzyme-linked immunosorbent assay (elisa) (orosz and ovadi ), isothermal titration calorimetry (itc) (chiad et al. ), ultraviolet/visible light (uv/vis) spectrum (del toro et al. ), surface plasmon resonance (spr) (homola ), surface acoustic wave devices (saw) (dragusanu et al. ; mitsakakis and gizeli ) and quartz crystal microbalance (qcm) (liu et al. ). the elisa (gochin et al. ) and uv/vis spectrum require probe-labeling (del toro et al. ). itc usually needs a large sample amount (jing and bowser ). spr has led to commercialized equipment (biacore), which has been successfully utilized for ligand-receptor interaction studies (bertucci and cimitan ). however, the commercialized spr equipment is still expensive and the cost for research is high. on the other hand, biosensors have been successfully used for detecting biomolecules and studying ligand-receptor interactions (chen et al. ; dragusanu et al. ; lin et al. ; liu et al. ; mitsakakis and gizeli ; wang et al. ). field-effect-transistor (fet)-based sensors have advantages such as low cost, high sensitivity, and label-free, real-time detection.
si-nanowire fets, carbon nanotubes and graphene-based fets have been used to find out the dissociation constants for protein-protein interaction by using langmuir equation for a one-binding site model (chen et al. ; lin et al. ; wang et al. ) . however, a receptor may have more than one binding site. it is very important to identify the number of binding sites on the receptor and the dissociation constants between the receptor and ligands at different binding sites. to the best of our knowledge, fet sensors have not yet been used for identifying the number of binding sites on a receptor and the dissociation constants of the receptor-ligand complex, simultaneously. among various fet sensors, algan/gan hemt-based sensors have been demonstrated for gas, chemical, and bio-sensing applications (chen et al. b; kang et al. ; pearton et al. ; wang et al. ; yu et al. ) . gan-based materials are chemically stable, biocompatible, and also stable in high temperature. reliable detections of biological agents in real-time have been proved by using algan/gan hemt-based biosensors (chen et al. a; chu et al. ; kang et al. ; pearton et al. ; wang et al. ). these sensors only need a small quantity of samples and have small sizes with low manufacturing costs, thanks to the matured semiconductor microfabrication technique. in this study, anti-ferritin heavy chain (fhc) antibodies were immobilized on the algan/gan hemts for detecting the antigen (a short peptide). one-binding-site model and two-binding-site model were fitted with experimental results in the analysis of the surface coverage ratio. the surface coverage ratio is defined as the ratio of the amount of peptide-antibody complexes to that of the total antibodies immobilized on a transistor. the analysis of the surface coverage ratio reveals how many binding sites a ligand/receptor system may have and what the dissociation constants are at different binding sites for that system. 
our results show that the sensor can not only detect the analytes but can also be used to study the binding affinity between ligands and receptors at the same time. the hemt structure consisted of a mm-thick undoped gan buffer, Å-thick undoped al . ga . n and Å-thick undoped gan cap layer. the algan layer was designed to induce high spontaneous polarization, which generated a two-dimensional electron gas beneath the algan layer. the epilayers were grown by metal-organic chemical vapor deposition (mocvd) on sapphire substrates. mesa isolation was performed using an inductively coupled plasma (icp) etching system with cl2/bcl3 gases under icp power of w at mhz and a process pressure of mtorr. ohmic contacts ( × mm ) separated by gaps of mm consisted of e-beam-deposited ti/al/ni/au and were annealed at °c for sec under flowing n2. Å-thick gold was deposited on the gate region. photoresist of . mm (shipley s ) was used to encapsulate the source/drain regions, with only the gate region open to allow the liquid solutions to cross the surface. hydrochloride (edc) was purchased from sigma-aldrich. n-hydroxysulfosuccinimide (sulfo-nhs) was purchased from thermo fisher scientific pierce biotechnology. mm of edc and mm of sulfo-nhs were prepared in -(morpholino)ethanesulfonic acid (mes) buffer solution consisting of mm of -mercaptohexanoic acid at ph = and allowed min to react thoroughly at room temperature. the mixture was then titrated to increase the ph value to . . the anti-ferritin heavy chain (fhc) antibody purchased from santa cruz biotechnology inc. (cat. #sc- ) was then added to the mixture and allowed h to react at room temperature. tris( -carboxyethyl)phosphine hydrochloride (tcep) purchased from sigma-aldrich (cat. #c ) was then added to the antibody-containing mixture and allowed h to react. mm of tcep and mm of -mercaptohexanoic acid were prepared in the final mixture. the hemt devices were then submerged in this mixture and stored at °c for h.
this resulted in binding of the thiolated antibody to the gold surface on the gate area of the hemts. the hemts were then rinsed with phosphate buffered saline (pbs) and submerged in bovine serum albumin (bsa) solution for h for surface blocking. after incubation, the sensor was thoroughly rinsed with pbs and dried with a nitrogen blower. fig. shows the schematic of the antibody-immobilized algan/gan hemt sensor and the plan-view microphotograph of the device, respectively. the source-drain current-voltage characteristics of the sensor were measured at °c using an agilent b parameter analyzer with the gate region exposed. the source-drain bias was fixed at . v. the antigen, a short peptide consisting of amino acids, was purchased from santa cruz biotechnology inc. (cat. #sc- p). this peptide can specifically bind to the fhc antibody. different concentrations of the peptide were sequentially dropped on the sensor for real-time detection. a background test was also conducted with different concentrations of the short peptide at a constant bias of . v. the chip for the background test was processed with a surface modification procedure similar to that of the real sensor ( -mercaptohexanoic acid immobilization and bsa blocking), but no antibodies were immobilized on the chip. was initially dropped on the sensor. when an additional pbs solution was added on the sensor, an abrupt deep peak appeared and then quickly recovered to the baseline. this abrupt peak was caused by the mechanical disturbance of dropping the pbs droplet by hand with a micropipette. after the current went back to the original baseline, there was no net current change observed around s. when the target concentration of . pm of the peptide was dropped onto the surface of the gate area, there was no significant current change.
in sharp contrast, when the target concentration of pm of the peptide was added, a clear current change was observed as the system reached a steady state. real-time current monitoring spanned the range of target concentrations from . pm to mm of the peptide. among these target concentrations of the peptide, pm gave the most significant current change. the current change gradually saturated as the peptide concentration increased. eventually, no further increase in current change was observed at high peptide concentrations, as shown in fig. (a). the background test (control experiment), conducted with different concentrations of the short peptide, showed no significant current change, which demonstrated that there was no non-specific binding on the chip, as shown in fig. (b). therefore, the signals generated from the sensors were attributed to the specific antibody-antigen interaction. fig. (a) and (b) show the current change versus the target concentration of the peptide in linear scale and in log scale, respectively. from fig. (b), the limit of detection of this sensor was estimated to be about pm of the peptide in the buffer solution. the chemical reaction for the surface-immobilized receptor (antibody) and the free ligand (antigen: the peptide) in bulk solution can be expressed as the following equations. where $K$ is the equilibrium constant for formula ( ), $k_a$ is the association constant, and $k_d$ is the dissociation constant. [ab] is the concentration of the unbound antibody immobilized on the sensor surface. [ag] is the antigen concentration in the bulk solution. [ab-ag] is the concentration of the antibody-antigen complex on the sensor surface. the langmuir isotherm equation was used to extract the dissociation constant. here the current changes were substituted into the langmuir equation to calculate the dissociation constant.
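for reference, the binding equilibrium and the langmuir isotherm consistent with the definitions in this section can be written as below. this is a sketch assuming the standard single-site langmuir form; the subscripted symbols are notation chosen here, and the linearized form is written so that its slope and intercept agree with the regression described in the text.

```latex
\begin{align}
  \mathrm{Ab} + \mathrm{Ag} &\rightleftharpoons \mathrm{Ab\text{-}Ag} \\
  K = k_a = \frac{1}{k_d} &= \frac{[\mathrm{Ab\text{-}Ag}]}{[\mathrm{Ab}]\,[\mathrm{Ag}]} \\
  \frac{[\mathrm{Ag}]}{\Delta I} &= \frac{1}{\Delta I_{\max}}\,[\mathrm{Ag}] + \frac{k_d}{\Delta I_{\max}} \\
  \frac{\Delta I}{\Delta I_{\max}} &= \frac{[\mathrm{Ag}]}{[\mathrm{Ag}] + k_d}
    = \frac{[\mathrm{Ab\text{-}Ag}]}{[\mathrm{Ab}]_{\max}},
  \qquad [\mathrm{Ab}]_{\max} = [\mathrm{Ab}] + [\mathrm{Ab\text{-}Ag}]
\end{align}
```

the last line follows from the second by substituting $[\mathrm{Ab\text{-}Ag}] = [\mathrm{Ab}][\mathrm{Ag}]/k_d$ into $[\mathrm{Ab\text{-}Ag}]/([\mathrm{Ab}] + [\mathrm{Ab\text{-}Ag}])$.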
where $\Delta I$ is the current change at the bulk concentration of antigen $[\mathrm{Ag}]$, and $\Delta I_{\max}$ is the saturated current change. the dissociation constant can be extracted from the linear regression by using eq. ( ), as shown in fig. (a). the y-axis and the x-axis are $[\mathrm{Ag}]/\Delta I$ and $[\mathrm{Ag}]$, respectively. the slope and the y-intercept are $1/\Delta I_{\max}$ and $K_D/\Delta I_{\max}$, respectively. the line fitting shown in fig. gives a very good linear regression correlation coefficient (r = . ). the extracted maximum current change was about . ma, which is very close to the experimentally observed one, . ma. the extracted dissociation constant was . × − m, which is also in the reasonable range for most igg antibody-antigen complexes (dragusanu et al. ; kim et al. ; mitsakakis and gizeli ). the langmuir equation shown in eq. ( ) can be rearranged as the following expression. the right-hand side of eq. ( ) is equal to the ratio of the antibody-antigen complex concentration $[\mathrm{Ab\text{-}Ag}]$ to the total antibody concentration $[\mathrm{Ab}]_{\max}$. here this ratio is defined as the surface coverage ratio $a$, as shown in eq. ( ). here the total antibody concentration is the sum of the unbound antibody concentration and the antibody-antigen complex concentration, that is, $[\mathrm{Ab}]_{\max} = [\mathrm{Ab}] + [\mathrm{Ab\text{-}Ag}]$. fig. (b) shows the curve of the surface coverage ratio from eq. ( ), using the dissociation constant extracted from fig. (a), together with $\Delta I/\Delta I_{\max}$ obtained from the experimental results (shown as dots), versus the antigen concentration in log scale. the curve of the surface coverage ratio clearly does not match the experimental dots well. from eq. ( ), the surface coverage ratio depends on the ratio of the dissociation constant to the concentration of antigen, that is $K_D/[\mathrm{Ag}]$. when $K_D$ equals folds or % of $[\mathrm{Ag}]$, the surface coverage ratio goes to around . % and . %, respectively. beyond . % or below .
% of the surface coverage ratio, increasing or decreasing the concentration of antigen does not change the surface coverage ratio much. therefore, we conclude that for a one-binding-site model, the most significant change of the surface coverage ratio is within the range of antigen concentration between one order higher and one order lower than the value of the dissociation constant. this was also shown by experimental data in the literature (maehashi et al. ). thus, ideally, once the dissociation constant is determined, the highest sensitivity for the sensor is only within this two-order range of antigen concentration. on the other hand, for any experimental result, the surface coverage ratio between . % and . % should only cover two orders of antigen concentration for an ideal one-binding-site model. however, in fig. (b), it is significant that between . % and . % of the surface coverage, the experimental data points cover almost orders of the antigen concentration. therefore, it is not possible to fit these experimental points well into the surface coverage ratio with the one-binding-site model. the wide range of the antigen concentration within that coverage ratio ( . - . %) may result from multiple binding sites on the receptor. compared with fig. (a), fig. (b) is clearly more sensitive for checking whether the model using the extracted dissociation constant really fits the experimental data. thus, the surface coverage ratio versus the antigen concentration will be used to find the number of binding sites and the associated dissociation constants. here a two-binding-site model is proposed as follows. in the two-binding-site model, the chemical equations are described as below, where $K_{D1}$ and $K_{D2}$ are the dissociation constants for the two binding sites on a receptor. the total current change is assumed to be the sum of the current changes resulting from the antibody-antigen complex at site 1 and at site 2, respectively, as shown in eqs. ( ) and ( ).
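the one-binding-site argument above, and the two-binding-site model just introduced, can be sketched numerically. this is a minimal illustration under stated assumptions: the coverage follows the langmuir form $[\mathrm{Ag}]/([\mathrm{Ag}] + K_D)$, the two sites are assumed to contribute equally so that the total coverage is the average of the two site coverages, and all concentrations are hypothetical and in arbitrary units. the function names are introduced only for this example.

```python
def coverage_one_site(conc, kd):
    """Langmuir surface coverage for a single binding site."""
    return conc / (conc + kd)

def coverage_two_site(conc, kd1, kd2):
    """Average coverage of two sites, assuming equal maximum signal per site."""
    return 0.5 * (coverage_one_site(conc, kd1) + coverage_one_site(conc, kd2))

kd = 1.0  # arbitrary units; only the ratio K_D / [Ag] matters for one site

low = coverage_one_site(0.1 * kd, kd)    # K_D is ten-fold [Ag]
high = coverage_one_site(10.0 * kd, kd)  # K_D is one tenth of [Ag]
print(f"coverage when K_D = 10 [Ag]:  {100 * low:.1f} %")
print(f"coverage when K_D = 0.1 [Ag]: {100 * high:.1f} %")

# Two hypothetical sites whose dissociation constants differ by three orders
# of magnitude: the coverage rises over a wider concentration range.
for exponent in range(-3, 5):
    c = 10.0 ** exponent
    print(f"[Ag] = 1e{exponent:+d}: two-site coverage = "
          f"{coverage_two_site(c, 1.0, 1000.0):.3f}")
```

for a single site the coverage moves from roughly one tenth to roughly nine tenths across two orders of magnitude in concentration, so a response spread over a clearly wider range is a hint that more than one dissociation constant is involved.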
where $a_1$ and $a_2$ are the surface coverage ratios at site 1 and site 2 of the antibody. if we assume that antigens bound at the two different binding sites of an antibody affect the conductance of the transistor equally, it is reasonable to set $\Delta I_{\max 1} = \Delta I_{\max 2} = \Delta I_{\max}/2$. then the total surface coverage ratio $a$ can be expressed as the average of the two individual ones ($a_1$ and $a_2$) at the two binding sites, as shown in eq. ( ). we then fit this two-binding-site model to the experimental data points to find the two dissociation constants, $K_{D1}$ and $K_{D2}$. fig. (a) shows the average surface coverage ratio with error bars (standard deviation) from three measurements versus the antigen concentration. the model fits the experimental data points closely. $K_{D1}$ and $K_{D2}$ are . × − m and . × − m, respectively. the correlation coefficient, r, is . . these two binding constants are in a reasonable range of regular antibody-antigen binding constants (kim et al. ; mitsakakis and gizeli ). the antigen, a peptide consisting of only amino acids, is quite small, and the antibody can bind two antigens on its two binding sites. because the two binding sites of an antibody are usually regarded as identical structures, site 1 and site 2 in this reaction should not be regarded as two different sites. instead, they are better regarded as the binding sequence of the antigens. the difference between these two dissociation constants is probably due to the steric hindrance that the first antigen-antibody complex imposes on the second one. it may be argued whether the assumption of equal maximum current changes for the two binding complexes is appropriate. if $\Delta I_{\max 1}$ and $\Delta I_{\max 2}$ are allowed to be variables and eq. ( ) is used for the curve fitting (assuming $\Delta I_{\max 1} = c_1 \Delta I_{\max}$ and $\Delta I_{\max 2} = c_2 \Delta I_{\max} = (1 - c_1) \Delta I_{\max}$), we obtain $c_1$ = . , $c_2$ = . , $K_{D1}$ = . × − m, $K_{D2}$ = . × − m and r = . . fig.
(b) shows the average surface coverage ratios derived from these non-equal maximum current changes for the two binding complexes. it is widely known that antibodies have two almost identical binding sites, as our experimental results predict. therefore, this technique is adequate for determining the number of binding sites of a receptor for its analyte. these two dissociation constants shift by less than one order of magnitude compared with the ones derived from the equal maximum current change for the two binding complexes. what happens if the high-sensitivity region covers ~ orders of magnitude? in that case, we may further need to consider a three-binding-site model. looking back at the detection limit of the sensor, which is between - pm of the antigen, we find that it is consistent with the lower half of the high-sensitivity region of k d (between one order lower than and the order of the dissociation constant). this result shows that the algan/gan hemts are still very sensitive at low antigen concentration. however, the detection limit of this sensor for the antigen does not depend on the transistor but on the affinity of the antibody-antigen complex, that is, on its dissociation constant. in summary, algan/gan hemts immobilized with antibodies can detect a short peptide consisting of amino acids with a low detection limit (~ pm), which is shown to depend on the dissociation constants between the short peptide and the antibody. it has been found that directly using the conventional linear regression of the langmuir isotherm equation to extract the dissociation constant may lead to incorrect dissociation constants because it ignores the number of binding sites on the receptor for the ligand. instead, through the analysis of surface coverage ratio versus the ligand concentration in the binding-site model, the number of binding sites of a receptor and the dissociation constants can be determined.
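the one-binding-site and two-binding-site coverage models discussed above can be sketched numerically. this is a minimal sketch, not the authors' fitting code: it assumes the standard langmuir form c / (c + k d) for each site, equal weighting of the sites (the equal maximum current change case), and purely hypothetical dissociation constants. the coverage levels 1/11 and 10/11 are also an assumption, chosen because they are the coverages reached at one order below and one order above k d for a single site.

```python
import math


def coverage_one_site(conc, kd):
    """Langmuir surface coverage ratio for one binding site: c / (c + Kd)."""
    return conc / (conc + kd)


def coverage_multi_site(conc, kds):
    """Average of the single-site coverages, i.e. every site is assumed to
    contribute equally to the maximum current change."""
    return sum(coverage_one_site(conc, kd) for kd in kds) / len(kds)


def conc_at_coverage(target, kds, lo=1e-15, hi=1e3):
    """Concentration at which the averaged coverage equals `target`.

    Bisection in log space; coverage rises monotonically with concentration.
    The bracket [lo, hi] (molar) is an arbitrary, deliberately wide default.
    """
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if coverage_multi_site(mid, kds) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def span_in_decades(kds, low=1 / 11, high=10 / 11):
    """Orders of magnitude of concentration between `low` and `high` coverage."""
    return math.log10(conc_at_coverage(high, kds) / conc_at_coverage(low, kds))


# Hypothetical dissociation constants (molar), for illustration only.
one_site = span_in_decades([1e-9])
two_site = span_in_decades([1e-9, 1e-7])
print(f"one site: {one_site:.2f} decades, two sites: {two_site:.2f} decades")
```

with a single site the span is two orders of magnitude regardless of the k d chosen, whereas two sites with well-separated dissociation constants stretch the same coverage window over a wider concentration range, which is the qualitative signature used above to argue for more than one binding site.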
in this study, two binding sites of the fhc antibody are predicted and two dissociation constants, . × 10^− m and . × 10^− m, are extracted by the analysis of experimental results with the binding-site model. the algan/gan hemts and the analysis of the surface coverage ratio are demonstrated to be good tools for biological affinity studies. our future work will use this platform to study dna-protein and protein-drug interactions, such as applications for dna-sars (severe acute respiratory syndrome) virus protein package and hiv (human immunodeficiency virus) drug development. another direction for future work is to extend the binding-site models to different ligands competing with one another for the same receptors on our hemt-based sensors. proceedings of the national academy of sciences this work was partially supported by national science council grant (no. b a) and by the research grant ( n e ) at national tsing hua university. we thank dr. jen-inn chyi and dr. hui-teng cheng for providing us with the algan/gan epi wafers and the antibody, respectively, and for their advice. supplementary data associated with this article can be found in the online version at http://dx.doi.org/ . /j.bios. . . key: cord- - xpnd d authors: strömich, léonie; wu, nan; barahona, mauricio; yaliraki, sophia n. title: allosteric hotspots in the main protease of sars-cov- date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: xpnd d inhibiting the main protease of sars-cov- is of great interest in tackling the covid- pandemic caused by the virus. most efforts have been centred on inhibiting the binding site of the enzyme. however, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting. 
here, we report the allosteric communication pathways in the main protease dimer by using two novel fully atomistic graph theoretical methods: bond-to-bond propensity analysis, which has previously been successful in identifying allosteric sites without a priori knowledge in benchmark data sets, and markov transient analysis, which has previously aided in finding novel drug targets in catalytic protein families. we further score the highest ranking sites against random sites at similar distances through statistical bootstrapping and identify four statistically significant putative allosteric sites as good candidates for alternative drug targeting. for allosteric regulation of the main protease. by providing guidance for allosteric drug design we hope to open a new chapter for drug targeting efforts to combat covid- . results the first step in our graph analysis approach is the construction of an atomistic graph from a protein data bank (pdb) [ ] structure. this process takes into account strong and weak interactions such as hydrogen bonds, electrostatic and hydrophobic interactions (see methods and fig. ) . additionally, we can incorporate water molecules, which in the case of the m pro are catalytically important and known to expand the catalytic dyad to a triad [ ] (see fig. s b ). in sites) or form a communication pathway [ ] . by applying quantile regression we are able to quantitatively rank all bonds, atoms and subsequently residues. this allows us to score the hotspots we identified and to assess their statistical significance. table s ) reveal two main areas of interest in the m pro . the hotspot on the back of the monomer opposite to the active site ( fig. a) is described in more detail in the paragraphs below. hotspot two is located in the dimer interface and contains four residues which form salt bridges between the two monomers. serine and arginine from one monomer connect to histidine and glutamine from the other one, respectively. 
interestingly, these bonds have been found to be essential for dimer formation, which in turn is required for m pro activity [ , ] . to further clarify the interactions between the dimer halves ( hence, we chose these residues as source when looking into protease dimer connectivity in comparison between sars-cov- and sars-cov. in sars-cov, this closer dimer packing led to an increased activ- figure : bond-to-bond propensities of m pro sourced from the orthosteric sites. the source sites have been chosen as the catalytically active residues his and cys in both chains of the homodimer and are shown in green (a: front view; b: top view). all other residues are coloured by quantile score as shown in the legend and reveal two main areas of interest with important residues labelled. c) the propensity of each residue, Π_r, is plotted against the residue distance from the orthosteric site. the dashed line indicates the quantile regression estimate of the . quantile cutoff used for identifying relevant residues. atomistic level here, we assume that studying the dimer interface residues in a systematic manner would help elucidate the link between domain iii and the catalytic activity of the m pro . bond-to-bond propensities have been shown to successfully detect allosteric sites on proteins [ ] and we here present the results for the sars-cov- m pro to that effect. by choosing the active site residues histidine and cysteine as source, we can detect areas of strong connectivity towards the active centre, which allows us to reveal putative allosteric sites. we detected two hotspots on the protease which might be targetable for allosteric regulation of the protease (fig. ) . most of the residues present in the two putative sites are amongst the highest scoring residues which are listed in table s . site ( fig. a, shown in yellow) is located on the back of the monomer with respect to the active site and is formed by nine residues from domain i and ii (full list in table s ). 
the second hotspot identified with bond-to-bond propensities is located in the dimer interface and contains residues (tab. s ) which are located on both monomers ( fig. b, shown in pink). two of these residues, glu and arg of the respective second monomer, form a salt bridge which is essential for dimerisation [ ] . quantile regression allows us to rank all residues in the protein and thus we can score both sites with an average residue quantile score as listed in table . site and have high scores of . and . , respectively, and score much higher than a randomly sampled site, which would score . ( % ci: . - . ) for a site of the size of site or . ( % ci: . - . ) for a site of the size of site . our methodologies further allow us to run the reverse analysis to assess the connectivity of the predicted allosteric sites. for this purpose, we defined the source as all residues within the respective identified sites (tables s and s ). after a full bond-to-bond propensity analysis and quantile regression to rank all residues, we are able to score the active site to obtain a measure of the connectivity towards the catalytic centre (tab. s ). for site the active site score is . , which is above a randomly sampled site score of . ( % ci: . - . ). however, for site the active site score is . , which is only marginally above a randomly sampled site score of . ( % ci: . - . ). as site is located in the dimer interface, this is in line with the suggestion described above that the allosteric effect is not directly conferred from the dimer interface towards the catalytic centre. nonetheless, this site might provide scope for inhibiting the m pro by disrupting dimer formation at these sites. figure : putative allosteric sites identified by bond-to-bond propensities. surface representation of the m pro dimer coloured by quantile score (as shown in the legend). 
a) rotated front view with site (yellow), which is located opposite the orthosteric site (coloured in green). b) top view with site (pink) located in the dimer interface. a detailed view of both sites is provided with important residues labelled. overall, this missing bidirectional connectivity hints at a more complex communication pattern in the protein and gave us reason to use another tool which has been shown to be effective in catalytic frameworks [ ] like the protease. figure a and a full list can be found in table s . in the sars-cov- m pro , this analysis subsequently led to the discovery of two more putative sites as shown in figure c . both hotspots are located on the back of the monomer in relation to the active site. site (shown in turquoise in figure c ) is located solely in domain ii and consists of ten residues as listed in table s . one of these is a cysteine at position , which might provide a suitable anchor point for covalent drug design. site (orange in figure c ) is located further down the protein in domain i with residues as listed in table s . both sites were scored as described above and in the methods section. following the same thought process as described for site and , we can investigate the protein connectivity from the opposite side by sourcing our runs from the residues in site and . we then score the active site to measure the impact in multimeric proteins this might be due to another structural or dynamic factor which we did not yet uncover between site and the active site. overall we see a similar pattern of hot and cold spots in the sars-cov m pro (results not shown). 
we find a high overlap for the identified four sites, which gives us confidence that a potential drug effort would find applications in where b is the n × m incidence matrix for the atomistic protein graph with n nodes and m edges; w = diag(w ij ) is an we define the bond propensity as: and then calculate the residue propensity of a residue r: markov transient analysis (mta). a complementary, node-based method, markov transient analysis (mta) identifies areas of the protein that are significantly connected to a site of interest, the source, such as the active site, and obtains the signal propagation that connects the two sites at the atomistic level. the method has been introduced and discussed in detail in ref. [ ] and has successfully identified allosteric hotspots and pathways without any a priori knowledge [ , ] . importantly, it captures all paths that connect the two sites. the contribution of each atom in the where t provides models for conditional quantile functions. this is significant here because it allows us to identify not the "average" atom or bond but those that are outliers from all those found at the same distance from the active site and because we are looking at the tails of highly non-normal distributions. as the distribution of propensities over distance follows an exponential decay, we use a linear function of the logarithm propensities can be found in ref. [ ] and for markov transient analysis in ref. [ ] . site scoring with structural bootstrap sampling. to allow an assessment of the statistical significance of a site of interest, we score the site against randomly sampled sites of the same size. for this purpose, the average residue quantile score of the site of interest is calculated. after sampling random sites on the protein, their average residue quantile scores are calculated. 
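the site-scoring procedure of this paragraph, including the bootstrap step that yields the confidence interval for the random-site score, can be sketched as follows. this is a minimal sketch under stated assumptions, not the authors' implementation: random sites are drawn uniformly from all residues with no structural or distance constraint, the quantile scores are synthetic, and the function names, the number of random sites, the number of resamples and the 95 % level are all hypothetical choices made for illustration.

```python
import random
import statistics


def site_score(quantile_scores, site):
    """Average residue quantile score of a site (a collection of residue ids)."""
    return statistics.fmean([quantile_scores[r] for r in site])


def random_site_scores(quantile_scores, site_size, n_sites, rng):
    """Scores of `n_sites` random sites of the given size.

    Assumption: residues are sampled uniformly, ignoring spatial proximity.
    """
    residues = sorted(quantile_scores)
    return [
        site_score(quantile_scores, rng.sample(residues, site_size))
        for _ in range(n_sites)
    ]


def bootstrap_ci(values, n_resamples, rng, level=0.95):
    """Mean of `values` with a percentile bootstrap confidence interval,
    using resampling with replacement."""
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * (n_resamples - 1))]
    hi = means[int((1.0 - tail) * (n_resamples - 1))]
    return statistics.fmean(values), lo, hi


# Synthetic example: 100 residues with evenly spaced quantile scores.
rng = random.Random(0)
scores = {i: i / 99 for i in range(100)}
site = [95, 96, 97, 98, 99]  # hypothetical site of interest

observed = site_score(scores, site)
random_scores = random_site_scores(scores, len(site), 200, rng)
mean, lo, hi = bootstrap_ci(random_scores, 1000, rng)
print(f"site score {observed:.3f}; random site score {mean:.3f} (CI {lo:.3f}-{hi:.3f})")
```

a site of interest whose score lies well above the upper bound of the interval is scored as significantly better connected than a random site of the same size.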
by performing a bootstrap with , resamples with replacement on the random sites average residue quantile scores, we are able to provide a confidence interval to assess the statistical significance of the site of interest score in relation to the random site score. investigation as shown in table . for each of these fragment-bound structures, we performed bond-to-bond propensity and markov transient analyses to evaluate the connectivity to the active site. the active site was scored as described above. a pneumonia outbreak associated with a new coronavirus of probable bat origin a new coronavirus associated with human respiratory disease in china a novel coronavirus from patients with pneumonia in china the species severe acute respiratory syndrome-related coronavirus: classifying - ncov and naming it sars-cov- the severe acute respiratory syndrome a decade after sars: strategies for controlling emerging coron- aviruses dissection study on the severe acute respiratory syndrome c-like protease reveals structure-based prediction of protein allostery allosteric modulator discovery: from serendipity to structure-based design activation pathway of src kinase reveals intermediate states as targets for drug design perturbation-response scanning reveals key residues for allosteric control in hsp exploiting protein flexibility to predict the location of allosteric sites pars: a web server for the prediction of protein allosteric and regulatory sites allopred: prediction of allosteric pockets on proteins using normal mode pertur- bation analysis improved method for the identification and validation of allosteric sites structure-based statistical mechanical model accounts for the causality and energetics of allosteric communication reversing allosteric communication: from detecting allosteric sites to inducing and tuning targeted allosteric response mapping allosteric communications within individual proteins protein multi-scale organization through graph partitioning and 
robustness analysis: application to the myosin-myosin light chain interaction uncovering allosteric pathways in caspase- using markov transient analysis and multiscale community detection bagpype: a python package for the construction of atomistic, energy-weighted graphs from biomolecular structures prediction of allosteric sites and mediating interactions through bond-to-bond propensities allostery and cooperativity in multimeric proteins: bond- to-bond propensities in atcase the origin of allosteric functional modulation: multiple pre-existing pathways abstract : targeting rsk prevents both chemoresistance and metastasis in lung cancer the protein data bank sars-cov cl protease cleaves its c-terminal autoprocessing site by novel subsite cooperativity quaternary structure of the severe acute respiratory syndrome (sars) coronavirus main protease crystallographic and electrophilic fragment screening of the sars-cov- main protease potential anti-viral activity of approved repurposed drug against main protease of sars- cov- : an in silico based approach silico evaluation of the effectivity of approved protease inhibitors against the main protease of the novel sars-cov- virus targeting the dimerization of the main protease of coronaviruses: a potential broad- spectrum therapeutic strategy targeting non-catalytic cysteine residues through structure-guided drug discovery inference of macromolecular assemblies from crystalline state proteinlens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation inorganic chemistry: principles of structure and reactivity dreiding: a generic force field for molecular simulations automated design of the surface positions of protein helices hydrophobic potential of mean force as a solvation function for structure of complex networks: quantifying edge-to-edge relations by failure-induced flow 
redistribution algebraic graph theory random walks, markov processes and the multiscale modular organization of complex networks quantile regression quantreg: quantile regression. r package version exploring allostery in proteins with graph theory open-source foundation of the user-sponsored pymol molecular visualization system key: cord- -melttpiq authors: yu, tian-fei; shao, shu-li; xu, xing-jun; lv, jian-wei; li, ming title: express transmissible gastroenteritis virus spike gene b and c antigen sites in multiple expression systems date: journal: information technology and agricultural engineering doi: . / - - - - _ sha: doc_id: cord_uid: melttpiq in order to illuminate the antigenicity of porcine transmissible gastroenteritis virus (tgev) spike protein b and c antigen sites, the truncated spike gene including b and c antigen sites of chinese isolate th- was expressed respectively in e.coli, baculovirus and pichia pastoris expression systems. dot enzyme-linked immunosorbent assays (dot-elisa) based on these three recombinant proteins were developed preliminarily. ten sera obtained correspondingly from ten piglets two months old which showed up clinical symptom were used for examination. the study indicates that the assays are rapid, reliable and sensitive and it has the potential for use as serological methods for tgev diagnosis. transmissible gastroenteritis (tge) is a highly contagious viral disease of swine characterized by vomiting, diarrhea, and dehydration. its causative agent is transmissible gastroenteritis virus (tgev), considered the principal etiologic agent responsible for dramatic outbreaks of diarrhea and high mortality of newborn pigs, in which mortality approaches % [ ] [ ] . porcine respiratory coronavirus (prcv) is believed to be a mutant of tgev, as it has been shown to be genetically related to tgev but has a selective tropism for respiratory tissue with very little to no replication in the intestinal tissue of infected swine [ ] . 
tgev and prcv share high genomic homology and can induce fully cross-reactive neutralizing antibodies [ ] [ ] [ ] , so it is difficult to discriminate tge in clinical diagnosis with traditional serological methods. in this research, the truncated spike gene including the b and c antigen sites, which is absent in prcv, was expressed in e.coli, baculovirus and pichia pastoris. dot-elisa assays based on these three recombinant proteins were developed to detect tgev antibodies and could, in theory, avoid antibody cross-reaction from prcv. the tgev strain th- was isolated from a suburb of harbin, heilongjiang province, p.r. china, and the swine testicle (st) cell line was grown as a monolayer in dulbecco's modified eagle medium (dmem) (gibco, usa) containing % fetal calf serum (gibco, usa) and % co in air. viruses were harvested by three cycles of freezing and thawing, cellular debris was removed by low-speed centrifugation at . ×l g (hitachi cr e, japan) at ℃ for min, and virions in the supernatant were pelleted by centrifugation at . × g at ℃ for . h (hitachi cr e, japan). all three pairs of primers were designed according to the sequence of tgev strain th from the genbank database (accession no.af ). pgu/pgl contained ecorⅠ and salⅠ restriction enzyme sites, respectively; phu/phl contained pstⅠ and salⅠ restriction enzyme sites, respectively; and pyu/pyl contained ecorⅠ and notⅠ restriction enzyme sites, respectively. all the primers contained an artificial start codon or termination codon (table. ). rna was extracted as described by yin et al [ ] ; the st cell rna was also extracted as the negative control. extracted rna μl was added to the following components: ×reverse transcription buffer μl, dntp mixture ( . mm) μl, rnase inhibitor . μl, primer pyl μl, amv reverse transcriptase μl ( u), sterile water . 
μl, gently mixed in an eppendorf tube and incubated at room temperature for min, then transferred to a water incubator at ℃ for 1 h before being stored at - ℃ until use in pcr. pcr amplification was performed using pe pcr equipment (usa). pcr was carried out in μl volumes, using μl of ×buffer ( mm kcl, mm tris-cl ph . , mm mgcl , . % gelatin), μl dntp mixture ( mm), μl cdna, μl of primers and 1 μl extaq polymerase ( . u; takara, dalian, china). the mixture was subjected to cycles of amplification consisting of heating at ℃ for s, annealing at ℃ (e.coli expression), . ℃ (baculovirus expression) or ℃ (pichia pastoris expression) for s, and extension at ℃ for s, followed by a final extension of min at ℃. a μl aliquot of pcr product was visualized by agarose gel electrophoresis ( % agarose, vs for min, . μg/ ml ethidium bromide included in gel) and subsequent u.v. transillumination. the purified pcr products were named ts, tg and ty, respectively. the amplified s gene (ts) dna was digested with ecorⅠ and salⅠ, and the resulting fragment was inserted into the ecorⅠ and salⅠ sites of the vector pgex- p- (pharmacia biotech, inc., usa) to place the cdna under the control of the tac promoter. the amplified s gene (tg) dna was digested with pstⅠ and salⅠ, and the resulting fragment was inserted into the pstⅠ and salⅠ sites of the transfer vector pbluebachis a (invitrogen co., usa) to place the cdna under the control of the polyhedrin promoter. the amplified s gene (ty) dna was digested with ecorⅠ and notⅠ, and the resulting fragment was inserted into the ecorⅠ and notⅠ sites of the transfer vector ppic k (invitrogen co., usa). the three recombinant vectors were sequenced by sangon bio-company, shanghai, china. expression of the b and c antigen sites in e.coli and purification of the recombinant proteins followed the procedures given in the description of the glutathione s-transferase (gst) gene fusion system (amersham pharmacia biotech, third edition, revision ). 
expression of the b and c antigen sites in baculovirus and pichia pastoris followed the manuals for bac-n-blue transfection and for expression of recombinant proteins in pichia pastoris (expression guide and methods; invitrogen co., usa). a twelve percent sds polyacrylamide gel was used to analyze the three recombinant proteins. after electrophoresis, one of the gels was stained with coomassie brilliant blue r to visualize the protein bands. the proteins of the other gel were transferred onto an nc membrane for western blot analysis. the diluted protein samples were spotted on the nc membrane for the dot-elisa. the membrane was blocked with % non-fat dried milk powder in pbs with . %tween (blocking solution). the membrane was probed with serum from a tgev-immunized rabbit, diluted in blocking solution ( : ). a peroxidase-conjugated sheep anti-rabbit igg (promega, usa) was used as the secondary antibody ( : ) and the signal was detected with h o and -chloro- -naphthol as a chromogenic substrate. Ⅰ illustrate). this was consistent with the expected molecular mass of the fusion protein of pgex- p-ts, which consisted of the gst ( kd) and the s gene b and c antigen sites subunit ( kd). two different mois, and , were tested. two -well plates were seeded with cells in each well; the final volume in each well was - . ml. cells were harvested at hours, hours, hours, and hours post infection. as shown in fig. . , the recombinant baculovirus expressed a kd protein corresponding to the molecular weight of tg of tgev at hours post-infection, and the protein accumulated in large amounts until hours post-infection. no specific band was detected in the culture medium by coomassie brilliant blue staining. sf cells were inoculated with recombinant virus at an moi of . infected cells were harvested after h. the cell sediments were resuspended in pbs (ph . ). after intermittent ultrasonic treatment, the lysate and supernatant were analyzed by sds-page. 
the result showed that the recombinant protein was mostly soluble and present in the supernatant (data not shown). the recombinant b and c antigen sites protein expressed into the yeast culture supernatant was identified on the basis of its molecular weight. numerous bands were observed in the - kd molecular mass range. a sharp band was observed at a molecular mass of approximately kd by sds-page. the recombinant protein is the major protein component observed in the culture supernatant ( fig. . Ⅳ). the antigenicity of the three recombinant proteins was analyzed by dot-elisa assay. when the spotted amount is ng, the recombinant protein expressed in the prokaryotic system shows a positive reaction in contrast with gst ( fig. . Ⅰ). when the spotted amount is ng, the ultrasonic lysis supernatant of sf cells infected by recombinant baculovirus shows a positive reaction in contrast with the ultrasonic lysis supernatant of uninfected sf cells (fig. . Ⅱ). when the spotted amount is ng, the recombinant b and c antigen sites protein expressed into the yeast culture supernatant shows a positive reaction in contrast with the gs cells transformed with ppic k plasmids (fig. . Ⅲ). . analysis of the antigenicity of recombinant protein expressed in pichia pastoris expression system. dots a ( ng), a ( ng), a ( ng), a ( . ng) were the positive yeast culture supernatant; dots b ( ng), b ( ng), b ( ng), b ( . ng) were the supernatant of gs cells transformed with ppic k plasmids. ten field sera obtained from ten piglets about two months old that showed typical signs of epizootic tge were used for examination, and all ten sera tested positive by dot-elisa (fig. ) . the spike protein of tgev has been shown to contain four major antigenic sites (a, b, c, and d). site a is the main inducer of neutralizing antibodies and has been previously subdivided into the three subsites aa, ab, and ac. 
site a contains the residues , , and , which are essential in the formation of subsites aa, ab, and ac, respectively. the peptide -mksgygqpia- represents, at least partially, subsite ac which is highly conserved among coronaviruses. this site is relevant for diagnosis and could be of interest for protection. other residues contribute to site b (residues and ), site c (residues and ), and site d (residue ). site c can be represented by the peptide -p-p/s-n-s-d/e- but is not exposed on the surface of native virus [ ] . site b is dependent on intracellular glycosylation and is complex and conformation-dependent. this site is formed by at least three epitopes. although site b is conformation dependent, mabs specific for this site can bind tgev spike protein by immunoblotting provided that the samples were not treated with -mercaptoethanol. most probably, renaturation of spike protein occurs during the blotting of the protein to nitrocellulose paper [ ] . site c is linear and continuous. it is recognized by mabs in western blot analysis after treatment of the virus with . % sds and % -mercaptoethanol; it is represented by synthetic nonapeptides derived from tgev spike protein [ ] ; it is present in recombinant products expressed in bacteria, which do not reconstitute the native spike protein [ ] ; and it is formed in the absence of glycosylation [ ] . in addition, because binding and sequencing studies indicate that site c is not present in the respiratory variants of tgev, this peptide could be useful to discriminate serum from tgev or prcv infected animals. the purified recombinant protein expressed in the prokaryotic system in this research can be recognized specifically by polyclonal antibody, according to the research of yang et al [ ] . 
although the three recombinant proteins share the same antigen sites, the proteins expressed in the eukaryotic systems contain more amino acids than the one expressed in the prokaryotic system (the recombinant protein expressed in eukaryotes has amino acids; the recombinant protein expressed in prokaryotes has amino acids). whether this difference affects antigenicity, and to what extent, requires further experiments. recently, some new methods for detecting tgev have been developed, such as real-time rt-pcr [ ] [ ] , but these methods need expensive equipment and reagents. they seem unsuitable for use in the field, especially in developing countries. we therefore developed simple methods for detecting antibodies induced by tgev, and the results seem more reliable. the study indicates that the assay reported above is rapid, reliable and sensitive and that it has the potential for use as a serological method for tgev diagnosis. transmissible gastroenteritis virus infection: a vanishing spectre complete sequence ( kilobases) of the polyprotein-encoding gene of transmissible gastroenteritis virus isolation of a porcine respiratory, non-enteric coronavirus related to transmissible gastroenteritis antigenic variation among transmissible gastroenteritis virus (tgev) and porcine respiratory coronavirus strains detected with monoclonal antibodies to the s protein of tgev detection transmissible gastroenteritis virus by rt-pcr and differentiation from porcine respiratory coronavirus field isolates of transmissible gastroenteritis virus differ at the molecular level from the miller and purdue virulent and attenuated strains and from porcine respiratory coronavirus molecular cloning and phylogenetic analysis of orf region of chinese isolate th- from transmissible gastroenteritis virus residues involved in the antigenic sites of transmissible gastroenteritis virus s glycoprotein localization of antigenic sites of the e glycoprotein of transmissible gastroenteritis coronavirus comparison of antigenicity between 
expressed proteins of the fragment including s gene whole antigenic sites and the deleted fragment in porcine respiratory coronavirus of transmissible gastroeritis virus development of a novel real-time rt-pcr assay with lux primer for the detection of swine transmissible gastroenteritis virus a real-time taqman rt-pcr assay with an internal amplification control for rapid detection of transmissible gastroenteritis virus in swine fecal samples key: cord- -e zh authors: koide, takashi; chiba, daiki; akiyama, mitsuaki; yoshioka, katsunari; matsumoto, tsutomu title: it never rains but it pours: analyzing and detecting fake removal information advertisement sites date: - - journal: detection of intrusions and malware, and vulnerability assessment doi: . / - - - - _ sha: doc_id: cord_uid: e zh fake antivirus (av) software is a serious threat on the internet to make users install malware and expose their personal information. fake removal information advertisement (frad) sites, which introduce fake removal information for cyber threats, have emerged as platforms for distributing fake av software. although frad sites seriously threaten users who have been suffering from cyber threats and need information for removing them, little attention has been given to revealing these sites. in this paper, we propose a system to automatically crawl the web and identify frad sites. to shed light on the pervasiveness of this type of attack, we performed a comprehensive analysis of both passively and actively collected data. our system collected , frad sites in languages, which have . million visits per month in total. we show that frad sites occupy search results when users search for cyber threats, thus preventing the users from obtaining the correct information. antivirus (av) software is one of the basic defense strategies for protecting users' devices. 
the major av software market was valued at , million usd in [ ] , and attackers focus on the needs of such pervasive av software to gain financial benefits. specifically, fake av software, which are rogue applications disguised as legitimate av software, is used to manipulate users' devices and steal money or sensitive information [ , ] . for example, once fake av software is installed, the software displays fake virus scan results to get users to purchase additional licenses [ , ] . fake av software is a traditional cyber threat that can effectively spread malware and unwanted software on the web [ , ] . to infect users and gain more profit, attackers take advantage of online advertisements that target many people to distribute fake av software [ ] . the web pages served by these advertisements typically show fake virus infection alerts or messages claiming the necessity of installing their software. these web pages also attract users with promises of speeding up their machines [ ] . attackers use such social engineering techniques that exploit users' psychological vulnerabilities to lure users to download fake av software. these web pages are known to be major distribution paths for fake av software [ , , ] . in this paper, we focus on new techniques that psychologically encourage users to install fake av software from the web. attackers create web pages that introduce fake information for handling specific cyber threats, such as malware infection or visits to malicious web pages, and suggest fake av software. we call these web pages fake removal information advertisement (frad) sites, which target users who have already suffered from security problems and which make them victims of another one. for example, users who notice their malware infection try to search for removal information using the malware detection names given by virus scanners, and they reach the frad sites from search results. 
believing the frad information, the users follow the instructions and inadvertently install the suggested fake av software. although it is well known that attackers induce users to install fake av software using scaring or attracting messages (such as fake infection alerts or promises to speed up their machines), little attention has been given to analyzing the frad sites. here, we propose a system that automatically crawls the web pages and detects frad sites. using the linguistic and visual features of the web pages, we accurately identify frad sites with . % true positives and only . % false positives. we used our system for a large-scale collection of frad sites and found , distinct domain names of frad sites written in languages. the total user accesses to these frad sites was . million visits per month. we observed that these frad sites are not adequately reported by existing blacklists. to reveal the ecosystem of frad sites, we performed a measurement study using both passively collected statistical data on user accesses and actively crawled data. we first investigated the incoming traffic to frad sites to determine what types of user behaviors are at risk of reaching frad sites. we found that many users not only accessed these sites from search engines directly but also reached frad sites from videos or messages posted on social media by attackers' accounts. to determine what kinds of attacks users encounter from frad sites, we then analyzed the transferred web pages and downloaded files from the frad sites. we confirmed that the frad sites led to fake av software families by directly distributing installers and luring users to payment and distribution sites. also, we investigated search results for the names of specific cyber threats, and we found that . % of the top search results were occupied by frad sites.
in other words, search results for information concerning cyber threats are poisoned by frad sites, making it difficult for users to obtain correct removal information. to the best of our knowledge, this is the first study that has revealed the prevalence and ecosystem of frad sites.
figure: overview of fake av software distribution via frad sites. users that require removal information for cyber threats access frad sites via a web search (e.g., search engines or social media) ( ). they click on download buttons on the frad sites and are navigated to software distribution sites ( ). they download fake av software from these sites ( ') or from the frad sites ( ) directly. then, they make the damage even worse by installing the fake av software ( ).
in summary, our contributions are as follows:
- we propose a system to crawl the web and detect frad sites automatically. by extracting linguistic and visual features from crawled web pages, our system detected frad sites with . % true positives and . % false positives.
- we performed a large-scale collection of frad sites on the web by leveraging a search engine, which is the most common channel used to reach frad sites. using our system, we discovered , domain names of frad sites written in languages. we found that attackers widely deploy frad sites targeting users in various countries to increase the number of page views.
- we conducted a comprehensive measurement study using both passively collected statistics data and actively crawled data to reveal the ecosystem of frad sites. our measurement study also clarified the typical incoming channels employed by users to reach frad sites and the types of potential threats directed from the frad sites. we also found that it is difficult for users who need removal information for specific cyber threats to reach correct information, because most of the search results concerning cyber threats are poisoned by the frad sites.
we first consider an attack technique for distributing fake av software via frad sites. the purpose of the frad sites is to deceive users who need ways to deal with cyber threats, i.e., malicious acts that damage the users' devices and steal their sensitive information. examples of cyber threats include malware infection, fraudulent popup messages, and malicious browser extensions. attackers post multiple entries on frad sites that introduce fake threat removal guides, using the names of specific cyber threats, such as malware detection names or the domain names of malicious sites. for instance, there can be more than k entries in a single frad site, and dozens of new entries are added to the frad site every day. when users notice that they have security issues by looking at the results from legitimate virus scanners or from suspicious alert messages on web pages, they search for information to remove them. users who reach frad sites and are deceived by false information install fake av software, which makes matters worse. we focus on such scams on the web in this paper. figure shows an overview of the distribution of fake av software via frad sites. first, users who have security problems reach frad sites by searching for the specific names of cyber threats they want to remove ( ) . attackers leverage search engine optimization (seo) techniques that target specific names of cyber threats to increase the web traffic to frad sites. attackers also post fake videos on youtube that introduce ways to remove the threats, and they post similar articles on facebook and other social media to lure users to click on links to frad sites. forum and community sites where anyone can post messages are also used by the attackers in the same manner. thus, users not only visit frad sites from results provided by search engines but also reach frad sites through social-media postings and other web pages hit by the search results. 
the frad sites contain detailed fake removal guides for individual threats as well as large buttons or banners to direct users to fake av software. the frad sites usually display the logos of famous security vendors or third-party organizations (e.g., software certification companies) to make them look as if they are legitimate web pages. users who click on the buttons or banners are navigated to software distribution sites ( ) . most of the software distribution sites use domain names containing the names of the fake av software and disguise themselves as official sites for legitimate av software by displaying product information and purchase menus. these sites are also reachable through search engines and even provide customer support such as web chats or toll-free calls. on these web pages, users follow the payment and download instructions and then obtain fake av software installers ( '). these installers can also be downloaded from the frad sites directly ( ) . users install the fake av software and thus become victims of other cyber threats ( ) . some social engineering techniques are already known, such as threatening users using fake infection alerts or attracting them by the prospect of improving computer performance. however, it has not been clarified whether attackers use techniques for distributing fake av software that exploit the weaknesses of users who have already suffered from cyber threats. in this section, we introduce our system for collecting and detecting frad sites on the internet automatically. the system consists of two steps: web crawling and classification. the implementation of a web crawler that collects and stores browser-level information from web pages is the first step in our system. 
the requirement of the crawler is to extract linguistic and image features from a web page rendered by a web browser and to compose a feature vector for the result. to analyze the frad sites in detail, we also need to capture the network traffic to the web page and perform browser interactions on it. to achieve this, we designed and implemented the crawler using scrapy, which is a web crawling framework for python, in order to develop functions for monitoring and managing logged data. we used selenium as the middleware for scrapy to automate a real web browser. we used google chrome as the default web browser for the crawler. to monitor network traffic in detail, we used chrome devtools api. this is necessary because we collect network-level information, such as http requests and responses, that selenium api does not handle directly. the collected information, such as screenshots, html source code, and network traffic, is stored in mongodb. we use this information in the next step, classification. in the second step, our system extracts features from the information collected from the web pages and identifies frad sites using a supervised machine learning approach. in particular, the system analyzes term frequencies in web pages and urls, the presence of logo images on screenshots, and html structures, such as the number of tags, and combines them into a feature vector. we explain the detail of each feature below.
table: example terms in each category.
way: "how to", "guide", "solution", "tips", "report", "instruction"
removal: "remove", "get rid of", "uninstall", "delete", "fix", "clean", "kill", "block", "repair", "anti", "entfernen", "eliminar", "verwijderen", "deinstallieren", "desinstalar", "supprimer", "remuovere", "usunac"
problem: "virus", "malware", "spyware", "trojan", "backdoor", "adware", "threat", "infection", "ransom", "error", "pop up", "redirect"
device: "computer", "pc", "windows", "mac", "browser"
the text content of the web page is written with a summary of the cyber threat and specific removal information for it. our key insight is that the frad sites must include a phrase composed of the following four categories of terms: way, removal, problem, and device. table shows a list of example terms. as the feature vector, we use the number of occurrences of each term category in the following four fields: the title, url path, domain name, and text content. the terms in the four categories are intended to capture phrases such as "how to remove trojan.zerocleare virus from my pc." because the frad sites are created in many languages, we leverage machine translation services such as cloud translation api and amazon translate . we translate the title and text content of the crawled web pages into english and then calculate the frequencies of the terms. to create the list of terms, we extracted all terms that match each category from the title, url paths, domain names, and text content of frad sites that were randomly selected from our created dataset, as discussed below in sect. . some domain names include non-english terms in the removal category, such as "entfernen" in german and "eliminar" in spanish. because these domain names are difficult to translate, we manually obtained such terms as much as possible. to this end, we separated the domain names by "." or "-" and used word segmentation and then searched for the meaning of each extracted word. logo images. we next consider features that specify logo images on the frad sites. the frad sites include download buttons and software packages that may be shared among multiple frad sites. the frad sites also display logos of security vendors, operating system (os) vendors or software certification companies in order to pretend to be legitimate sites. these logos are copied from vendors' sites or used as image files modified from the original images. 
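The term-category counts described above (four categories counted in four fields) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the text normalization, and the whole-word matching rule are assumptions, and the sketch presumes that the title and text content have already been translated into english. The term lists are the example terms from the paper's table.

```python
import re
from urllib.parse import urlparse

# Example terms per category, copied from the paper's term table.
TERMS = {
    "way": ["how to", "guide", "solution", "tips", "report", "instruction"],
    "removal": ["remove", "get rid of", "uninstall", "delete", "fix", "clean",
                "kill", "block", "repair", "anti", "entfernen", "eliminar",
                "verwijderen", "deinstallieren", "desinstalar", "supprimer",
                "remuovere", "usunac"],
    "problem": ["virus", "malware", "spyware", "trojan", "backdoor", "adware",
                "threat", "infection", "ransom", "error", "pop up", "redirect"],
    "device": ["computer", "pc", "windows", "mac", "browser"],
}
FIELDS = ("title", "url_path", "domain", "text")


def normalize(text):
    """Lowercase and turn every run of non-alphanumerics into one space.

    Assumption: this stands in for the word segmentation the paper applies
    to domain names and url paths (e.g. "how-to-remove" -> "how to remove").
    """
    return re.sub(r"[^a-z0-9]+", " ", text.lower())


def count_terms(text, terms):
    """Count whole-word occurrences of any of the given terms in text."""
    norm = normalize(text)
    total = 0
    for term in terms:
        # Lookarounds keep "virus" from matching inside "antivirus".
        pattern = r"(?<![a-z0-9])" + re.escape(term) + r"(?![a-z0-9])"
        total += len(re.findall(pattern, norm))
    return total


def linguistic_features(title, url, text):
    """Return a 16-element vector: 4 fields x 4 term categories."""
    parsed = urlparse(url)
    fields = {
        "title": title,
        "url_path": parsed.path,
        "domain": parsed.hostname or "",
        "text": text,
    }
    return [count_terms(fields[f], TERMS[c]) for f in FIELDS for c in TERMS]


if __name__ == "__main__":
    print(linguistic_features(
        "how to remove trojan.zerocleare virus from my pc",
        "https://remove-malware.example/how-to-remove-trojan",
        "this guide helps you delete the virus from your windows computer",
    ))
```

Counting per category rather than per term keeps the vector short and lets new terms be added to a category without changing the feature layout.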
to find such visual characteristics, our system uses an image matching approach on the basis of our logo image database. specifically, the system extracts images from "img" tags and crops from screenshots the regions that correspond to "a" or "button" tag elements. it calculates the perceptual hash of these images and compares them with the images in the database. if the target image is more than % similar to an image in the database, the system determines it to be a logo image. three types of images are stored in the database: logos of security vendors or software certification companies ( images), package images of fake av software ( images), and images of the download buttons ( images). we extracted images belonging to the three types from the frad sites used above. our system counts the number of images that match each type to create feature vectors. html structure. here, we explain the features extracted from the html structure that we use for identifying frad sites. as with previous works that identify specific types of malicious web pages [ , ] , the numbers of "a" and "iframe" tags are important indicators of frad sites. also, frad sites often re-use web page templates, so they have similar html source structures. in other words, the frequency of html tags and combinations of those numbers characterize frad sites. to find such features, the system counts the number of appearances of html tags. the html tags to be counted are the top tags frequently used in the frad sites mentioned above. we explain the method used to collect frad sites in the wild in order to make the dataset employed to evaluate our classification model. we first collected the names of cyber threats. then, we searched for and gathered candidates of frad sites using the names of those cyber threats. finally, we manually created a labeled dataset for our evaluation experiment. we collected the names of cyber threats to make search queries to find candidate frad sites. as described in sect.
, frad sites prepare many entries that introduce ways of removing specific cyber threats such as malware detection names and malicious domain names. to collect such names efficiently, we crawled the database pages of security vendors (e.g., symantec security center) and a security community site (e.g., malwaretips[.]com) in october . we collected names of threats, including malware detection names, malicious domain names, and popup messages. we created search queries using the collected names of cyber threats and gathered the urls of web pages using a search engine. to collect frad sites efficiently, we added "how to remove" to the name of the cyber threat to create the search query, instead of searching only for the name of the threat. we found that we can collect more frad sites by searching with "how to remove" in our experiment described in sect. . . to collect search results systematically, we used microsoft bing web search api and gathered k urls. we chose one url for each domain name from among the gathered urls. as a result, we extracted , urls with , unique domain names to crawl. we crawled , web pages using our system and created a labeled dataset. since there is no existing url blacklist that accurately identifies frad sites, we manually labeled them by analyzing the crawled web pages and accessing them as necessary. to conduct this process efficiently, we created a web application that displays screenshots and buttons to choose labels (frad and non-frad sites). this application extracts information about the crawled web pages from our mongodb database and generates the web pages for labeling. we implemented it using node.js and the express framework. we labeled web pages as frad sites if they satisfied the following heuristic rules. if not, we labeled the web pages as non-frad sites.
i. we check whether a web page introduces a removal guide for a specific cyber threat. if so, we check rule ii.
ii. we check whether the web page has visual characteristics specific to frad sites, as described in sect. . . specifically, we check whether the web page has an image of a fake av software package or a logo of a security vendor or a software certification company. we also check screenshots of the removal instructions or download buttons, which are often shared with multiple frad sites. if the web page has these characteristics, we identify it as a frad site. if not, we further check rule iii.
iii. we confirm that clicking a download button on the web page triggers a download of a fake av software installer or initiates a web transition to a distribution or payment site for fake av software. we performed this process by manually accessing the web page and clicking the download button.
from the -h labeling process, we obtained web pages of frad sites with unique domain names. to create a dataset, we randomly selected web pages from these frad sites. we also randomly selected web pages from non-frad sites, which are the web pages remaining after excluding the web pages of frad sites. since we collected the non-frad sites using the same search queries as for the frad sites, they often introduce removal information for cyber threats, details of malware, or introductions to legitimate av software, just as frad sites do. thus, it is a challenging task to identify frad sites accurately from these similar web pages. we next evaluated the detection capability of our system in terms of its capability to classify web pages accurately as frad sites or non-frad sites. we also conducted an experiment to discover unknown frad sites in the wild using the trained classification model. we first evaluated the detection accuracy of our system using the balanced dataset including frad sites and non-frad sites.
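The manual labeling rules i to iii described above amount to a short decision cascade, which can be sketched as follows. The labeling in the paper was done by hand; the class name, field names, and label strings here are hypothetical and only illustrate the order in which the rules are applied.

```python
from dataclasses import dataclass


@dataclass
class PageObservation:
    """What a human labeler observed on one crawled page (hypothetical fields)."""
    has_removal_guide: bool          # rule i: removal guide for a specific threat
    has_frad_visuals: bool           # rule ii: fake av package image, vendor logo, shared buttons
    download_leads_to_fake_av: bool  # rule iii: click yields installer or distribution/payment site


def label_page(obs: PageObservation) -> str:
    """Apply rules i, ii, and iii in order and return 'frad' or 'non-frad'."""
    if not obs.has_removal_guide:
        return "non-frad"            # rule i fails: not a removal guide at all
    if obs.has_frad_visuals:
        return "frad"                # rule ii is sufficient on its own
    if obs.download_leads_to_fake_av:
        return "frad"                # rule iii is checked only when rule ii fails
    return "non-frad"


if __name__ == "__main__":
    print(label_page(PageObservation(True, False, True)))
```

Ordering the rules this way means the costly step, manually clicking the download button, is needed only for pages that pass rule i but lack the visual characteristics of rule ii.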
we used a random forest classifier as the machine learning algorithm for two-class classification, because we can easily tune it due to the small number of hyperparameters to be considered. we conducted a -fold cross validation to determine how accurately our system performed classifications. we found that our system classified web pages with a . % true positive (tp) rate ($\frac{TP}{TP+FN}$, where fn = false negative), a . % false positive (fp) rate ($\frac{FP}{FP+TN}$, where tn = true negative), and . % precision ($\frac{TP}{TP+FP}$). the system identified non-frad sites as frad sites (fps). examples include articles from security vendors that introduce malware information, ranking web pages for legitimate av software, and blog entries that describe correct removal instructions. five fps were security vendors' web pages that often appear in search results when searching for removal information for cyber threats. we can therefore reduce fps by placing the domain names of major security vendors on a whitelist. examples of false negatives include web pages with domain names that do not include words such as "remove" or "malware." other false negatives do not contain visual features such as images of fake av software packages or logos of security vendors. to collect unknown frad sites that have not been found in sect. . , we conducted additional data collection and detection using our classification model, which has high detection accuracy. additional data collection. we first describe additional data collection to find more frad sites in the wild, such as non-english frad sites and frad sites with content copied from other sites. in the process of creating the dataset described in sect. , we found many frad sites written in various languages. some of them were translated automatically according to the browser's language setting when the web pages were loaded. some web pages were also written in multiple languages to enable users to switch languages.
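Returning briefly to the evaluation metrics defined earlier in this section, the true positive rate, false positive rate, and precision can be computed from confusion-matrix counts as sketched below. The function name and the counts in the usage line are hypothetical placeholders, not the paper's results.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute tp rate, fp rate, and precision from confusion-matrix counts.

    tp rate   = tp / (tp + fn)
    fp rate   = fp / (fp + tn)
    precision = tp / (tp + fp)
    A ratio with a zero denominator is reported as 0.0 (an assumption made
    here to keep the sketch total; other conventions exist).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "tp_rate": ratio(tp, tp + fn),
        "fp_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_rates(tp=90, fp=5, tn=95, fn=10))
```

On a balanced dataset such as the one used in the paper, precision and the two rates are directly comparable, since the positive and negative classes have the same size.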
in addition, we found frad sites dedicated to certain languages. in such cases, the domain names contain words in those languages (e.g., "entfernen" in entfernen-spyware[.]example and "eliminar" in eliminarvirus[.]example), as described in sect. . . we also found that frad sites are often copied from other frad sites and from legitimate sites that introduce specific malware removal information. these frad sites not only use the names of cyber threats extracted from legitimate sites but also copy page titles or entire articles from them. to find such frad sites, we collected page titles from legitimate sites (malwaretips[.]com and malwarefixes[.]com) and from the frad sites we labeled, which include non-english sites, and we searched for the titles using bing api. although it is difficult to create search queries in multiple languages to collect non-english frad sites, we can gather them efficiently in this way. we gathered k page titles from these web pages and collected , urls ( , domain names) from these searches. we extracted up to three urls from each domain name and crawled them ( , urls) using our system. by classifying the additionally crawled web pages, we identified , urls as frad sites. to find fps, we manually checked web pages classified as positive in the same way as described in sect. . . examples of fps include the following. some technical-support scam [ , ] sites were falsely identified as frad sites, because they offered support for malware removal and displayed noticeable phone numbers and web-chat support. these fps are not frad sites, however, because they did not lead users to fake av software but instead are actually malicious web pages themselves, which are listed in virustotal. moreover, our system falsely detected pirate web pages that introduce free downloads of fake av software. although such fake av software is useless and not very well known, some web pages illegally offered such software.
other fps include software review and download sites, which distribute fake av software as well as legitimate software. we also found fps similar to those described in sect. . . by excluding these fps, we finally determined , urls ( , domain names) as frad sites. the precision of this classification result was . %. although this precision is somewhat less than the results obtained in sect. . , we accurately identified frad sites. the reason for this decrease in detection capability is that we changed the search queries from "how to remove" and the name of threats (used in sect. . ) to page titles of known frad sites, so that the types of web pages in the search results were somewhat changed. overall, in this paper we have identified , domain names, including the newly discovered , domain names, to be frad sites. to confirm the frad sites already reported by security vendors, we searched for all , domain names in virustotal. of the total, . % ( domain names) of the domain names had urls that had already been detected by one or more vendors. we also found . % ( / , ) of the domain names had urls that are sources of detected files. although some frad sites have been detected by a small number of security vendors, most of the frad sites we found in this paper have been unreported to date. these frad sites are less likely to be filtered out from search results, even if they were reported as malicious. thus, most of these frad sites remain easily accessible to users and remain threatening to them. we measured the ecosystem and risk of frad sites using both passively collected statistical data of user accesses and actively crawled data. in the experiment described above, we found frad sites using our system and simply checked the detection status for each of them on virustotal. here, we analyze deeply the , domain names of frad sites that we found in sect. 
in terms of incoming traffic to those frad sites, the distribution of fake av software from those sites, and poisoned search results that are occupied by frad sites. to find out what browsing behaviors of users are at risk of reaching frad sites, we analyzed the incoming channels (i.e., in fig. in sect. ) of the frad sites that we found in sect. . to this end, we need data on the history of user accesses to and traffic volumes of those web pages. thus, we leveraged the statistical data provided by similarweb, which passively observes hundreds of millions of global devices and covers over countries and territories. using this approach, we collected statistical data from october to december in that we used in the measurement studies described below. overview of incoming traffic. we first show an overview of seven types of incoming traffic to frad sites. we investigated , domain names of frad sites for which data are available in similarweb (out of , domain names of the frad sites we discovered in this paper). note that statistical data for web pages with few user accesses are not provided. these frad sites have . million visits per month in total. figure shows the percentage of traffic to the frad sites from each incoming channel. the channels consist of seven labels: search (accessed from a search engine), direct (directly accessed by entering urls in a web browser), referral (accessed from other web pages), social media (accessed from social media), paid search (accessed from keyword advertisements on search engines), display ad (accessed from advertisements on web pages), and mail (accessed from hyperlinks in email). note that the incoming traffic measured as mail comes only from web mail. incoming traffic from email client software or other applications is measured as direct. the mean values of search, direct, referral, and social media were . %, . %, . %, and . %, respectively. the value for each of the other three channels is less than . %.
paid search, display ad, and mail have too little data for further investigation. also, for direct, the data reveal only the amount of incoming traffic that we have shown here. therefore, in the following, we analyzed the detail of three channels: search, referral, and social media. search. to find out how users reached frad sites via search engines, we investigated the statistics of the search queries. we extracted the top english search queries ( , unique queries in total) for each frad site and categorized them. table shows the categories and the number of search queries. we found that . % ( , / , ) of the search queries were related to the names of specific cyber threats. they included malware detection names (e.g., trojan:win /bearfoos.a!ml), malicious domain names, and alert dialog messages (e.g., "your computer is infected with dangerous viruses"). among them, . % ( / , ) are search queries combining "how to" with words meaning removal (e.g., "remove", "delete") and the names of cyber threats. we found that . % ( / , ) of the search queries combined words meaning removal with the names of cyber threats. users also searched for the names of cyber threats alone ( . %, / , ) or for software or os error messages (e.g., "msvcp .dll missing"). thus, many users reach frad sites by searching for cyber threats and corresponding removal guides. the names of fake av software were also used as search queries to reach frad sites ( / , ). we found that . % ( / , ) of the search queries were used to search for downloads of software such as office software or video games and for guides to cracking them. forty percent ( , / , ) of the search queries were not included in these categories. we also analyzed incoming traffic from social media. we investigated frad sites for which statistical data for queries incoming from social media is available from similarweb.
table shows the top social media that led users to frad sites and the number of frad sites to which users were redirected from each type of social media. users visited . % ( / ) of frad sites from youtube and . % ( / ) of those from facebook. attackers create social-media accounts for these frad sites and post videos or messages to lure users to frad sites. these accounts pretend to be official accounts by using the web-site names or domain names of frad sites. they introduce removal information for cyber threats in the same way as entries for frad sites, and they put hyperlinks leading to frad sites in the description of their videos and messages. we found that some accounts post such instruction videos on youtube several times a day. these videos received as many as k views. we also found that attackers created such accounts across multiple social media. in summary, attackers not only optimize search results to lead users directly to frad sites, but they also use various social media to increase user accesses to frad sites. referrals. in addition, we investigated referral traffic that leads users to frad sites. in other words, we analyzed the incoming traffic to frad sites when users accessed them from other web pages, excluding search engines and social media. we found that users visited web pages belonging to various categories before reaching frad sites. in most cases, attackers abuse these sites, where anyone can post messages, to impersonate legitimate users who share removal information for cyber threats together with urls of frad sites. the web pages categorized as games (e.g., steamcommunity[.]com) were used in the same manner. attackers also posted frad sites' urls in the comment sections of articles in the news and media category and other categories. in short, attackers leverage popular web pages where they can post comments and hyperlinks to lure users to visit frad sites. to identify threats that occur when users access frad sites, we performed an additional crawling experiment.
while we simply found frad sites using our system in sect. , and we investigated users' incoming traffic to them in sect. . , the malicious activity derived from them was not revealed by these experiments. therefore, we actively crawled the frad sites and collected installers of fake av software and their respective distribution sites. to this end, we added a function to the crawler of our system to enable it to detect a download button on a frad site and click it. then we analyzed the downloaded files and the web pages to which the crawler was transferred from those frad sites. we first describe the details of the new function that enables our crawler to interact with the frad sites. the crawler crops images with areas that match the "a" tag and "img" tag elements of frad sites. if the crawler finds a "download" string in the images using optical character recognition, it clicks on that area. we used two types of useragent with different os (windows and macos v . ). this is because frad sites change the fake av software to be distributed according to the useragent's os, typically windows or mac. to collect the urls of frad sites to crawl, we searched for the , domain names of frad sites using bing api and selected up to three urls based on the search results for each domain name. the reason for this is that web pages of frad sites with the same domain names can lead to different destinations (e.g., different software distribution sites) depending upon their urls. to find more fake av software, we collected , urls and crawled them twice with two types of useragent. as a result, the crawler downloaded , files with unique md hash values and reached domain names ( urls) of web pages from frad sites. in the following, we investigated the downloads of fake av software originating from the frad sites (i.e., in fig. in sect. ), web pages transferred from those sites (i.e., in fig. ), and redirectors that relayed these downloads and web page transitions.
fake av software downloaded from frad sites.
we analyzed the files that our crawler downloaded (see in fig. ) to identify the installers of fake av software. first, we checked files with unique md5 hash values on virustotal and found that of those files had been detected. to identify fake av software families among the detected files, we manually analyzed them and searched for them using their filenames and metadata (e.g., product name, legal copyright, and file description) read by exiftool. based on this information and on the software distribution sites that we obtained from the search results, we examined whether the files were related to malware removal, registry fixes, or speedup. we classified files into unique fake av software families with different software names. all fake av software families have software distribution sites reachable from search engines. the software distribution sites profess to be official sites for these fake av software families. for example, these sites show download and purchase menus and provide customer support such as web chats or toll-free calls. the remaining five detected files were not fake av software but malware that pretends to be installers of legitimate software, such as music-production software and video games. to find more fake av software among the undetected files, we compared their filenames and metadata with those of the classified fake av software families. by identifying files that contain the same strings as the fake av software, we additionally found files to be fake av software. overall, we found files ( dmg files and exe files) of the fake av software families.

web pages transferred from frad sites. we also analyzed the web pages of domain names that our crawler reached after clicking on download buttons (see in fig. ). in the above measurements, we investigated fake av software directly downloaded from frad sites. however, frad sites also navigate users to software distribution sites that lure them to purchase and download fake av software.
to find such web pages, we analyzed the crawled data (e.g., screenshots of web pages) and manually classified the malicious web pages. we first checked the domain names on virustotal and found that domain names were detected. from the web pages of the detected domain names, we then identified those that offered license purchases of known fake av software or were related to malware removal, registry fixes, and speedup. we found that domain names were related to distributions of fake av software, including six domain names of payment sites and domain names of software distribution sites. the payment sites required inputting credit card numbers and personal information to purchase fake av software. out of the domain names, we found that domain names were distribution sites for new fake av software families, in addition to the fake av software families found in the measurements described above. thus, we found fake av software families in total. the detected domain names also included five domain names of frad sites that we found in sect. . that is, users may be transferred from one frad site to another. we also found malicious web pages that distribute malicious chrome extensions. we found domain names associated with such threats and four domain names related to distributions of other types of malware.

to reveal the network infrastructure related to the distribution of fake av software, we investigated the redirectors that relayed the above fake av software downloads and web page transitions. we analyzed the network traffic that our crawler captured and extracted redirectors for which the effective second-level domains (e2ld; e.g., example.com is the e2ld of www.example.com) are different from those of the source web pages (i.e., the frad sites) and destination web pages. we found domain names ( e2ld names) as redirectors of , url redirections associated with fake av software downloads and web transitions to software distribution sites.
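the redirector extraction described above can be sketched as follows. this is a minimal illustration, not the implementation used in the study: the function names and the shape of the input (an ordered list of urls per redirection chain) are hypothetical, and the e2ld is approximated by the last two labels of the hostname. that approximation is an assumption that fails for multi-label suffixes such as co.uk, where a public suffix list would be needed.

```python
from urllib.parse import urlsplit


def naive_e2ld(url):
    """Approximate the e2LD of a URL as the last two labels of its hostname.

    Assumption: no public suffix list is consulted, so hosts under
    multi-label suffixes (e.g. co.uk) are not handled correctly.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def extract_redirectors(chain):
    """Return e2LDs of intermediate hops that differ from source and destination.

    `chain` is a hypothetical ordered list of URLs: the source page first,
    the destination page last, and any redirect hops in between.
    """
    if len(chain) < 3:
        return []  # no intermediate hop, hence no redirector
    source = naive_e2ld(chain[0])
    destination = naive_e2ld(chain[-1])
    redirectors = []
    for hop in chain[1:-1]:
        domain = naive_e2ld(hop)
        # Keep each third-party domain once, in order of first appearance.
        if domain not in (source, destination) and domain not in redirectors:
            redirectors.append(domain)
    return redirectors
```

hops that stay on the e2ld of the source or the destination are ignored, so only third-party domains in the chain are reported.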
nine of these domain names were known advertising domain names listed in easylist. in addition, we found a small number of redirectors that were involved in many fake av software distributions. for example, we found that . % of the url redirections were associated with just two domain names: safecart[.]com and revenuewire[.]net. these two redirectors navigated to and fake av software families, respectively. the domain name safecart[.]com is not only a redirector but also a payment web page that prompts users for their credit card numbers. some redirectors, such as reimageplus[.]com and paretologic[.]com, which are software distribution sites, navigated to other software distribution sites.

we conducted a further measurement experiment to analyze the percentage of frad sites in the search results. in sect. . , we used statistical data to investigate search queries that users used to reach frad sites. here, we determined the risk of users reaching these frad sites by actually searching with those search queries and analyzing the search results. when users search for specific names of cyber threats to find removal information, many frad sites prominently show up in search results. to confirm these poisoned search results, we investigated search queries, combining cyber threats and three search patterns. the three search patterns are those that users frequently use, as found in the measurements in sect. . : "how to remove" and the name of a cyber threat, "remove" and the name of a cyber threat, and only the name of a cyber threat. we extracted the latest names of cyber threats from public lists: malware detection names from symantec security center and malicious domain names from malwaretips[.]com. also, we randomly chose malicious browser extensions out of browser extensions that we found in sect. . . we investigated the top search results for each search query, which are the top result pages from popular search engines such as google and bing.
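the query construction and the percentage of frad sites in a set of search results, as described above, can be sketched as follows. this is a minimal sketch under stated assumptions: the function names are hypothetical, search results are assumed to be available as plain lists of urls, and a result counts as an frad site only when its hostname exactly matches a known frad domain name.

```python
from urllib.parse import urlsplit

# The three search patterns described in the text.
PATTERNS = ("how to remove {name}", "remove {name}", "{name}")


def build_queries(threat_names):
    """Combine every threat name with every search pattern."""
    return [p.format(name=name) for name in threat_names for p in PATTERNS]


def frad_share(result_urls, frad_domains):
    """Percentage of result URLs whose hostname is in `frad_domains`.

    Assumption: exact hostname match against a set of known domain names.
    """
    if not result_urls:
        return 0.0
    known = {d.lower() for d in frad_domains}
    # urlsplit() already lowercases the hostname.
    hits = sum(1 for url in result_urls if (urlsplit(url).hostname or "") in known)
    return 100.0 * hits / len(result_urls)
```

applying `frad_share` per query and per threat category would give a per-query breakdown of the kind the measurement reports, while applying it to all collected pages gives the overall percentage.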
we collected , web pages from the top search results for each of the search queries in total. by matching the , domain names of the frad sites collected in sect. . , we found that , web pages ( . %) were frad sites. table shows the percentages of frad sites included in the search results for each search query and the names of the cyber threats. when we searched for the names of cyber threats with "how to remove" or "remove," the percentages of frad sites were . % and . %, respectively. the frad sites were also included at a high rate in the results of searching only for the names of cyber threats. in particular, . % of search results for the domain names were frad sites. search results for malicious browser extensions did not include many frad sites ( . %), but there was less useful information available for users to use to remove the threats or determine whether they are malicious. we also found youtube web pages as search results, with videos and descriptions that introduced frad sites. we found that . % ( / ) of the search queries returned search results for which the top web pages were all frad sites. in summary, we found that most of the search results were occupied by frad sites when users searched for removal information for cyber threats, making it difficult for users to reach correct information. ethical considerations. we followed research ethics principles and best practices to conduct this study [ ] . we analyzed users' behavior to visit frad sites using anonymized statistical data on user accesses for this study. we purchased a license to access data that is legally collected based on similarweb's privacy policy. the information extracted from the web pages we crawled is publicly available data. to reduce server load, our experiment that interacted with download buttons was performed only once for each web page that we identified as an frad site. limitation. although our system can accurately identify frad sites, there are some limitations. 
since our system is specialized for collecting and detecting frad sites, which are the important platforms used by attackers to distribute fake av software, detecting software distribution sites is out of scope for this paper. we identified software distribution sites that pretended to be official sites for legitimate av software on the basis of detection results from virustotal and manual analysis. we showed that we can visit various software distribution sites from frad sites by clicking on the frad sites. we also found that these software distribution sites share common network infrastructures, such as ad networks and redirectors. thus, further analyses focusing on the web pages arriving from the frad sites collected by our system should support efficient collections of software distribution sites. we then discussed a technique that can be used to evade our classification of frad sites. developers of frad sites employ phrases related to the removal information for threats in domain names, urls, titles, and text contents. this is because they use the topic of the web pages to attract or persuade users. they also place logos of trusted companies to disguise frad sites as legitimate sites. a possible evasion technique would be to remove these characteristics that psychologically affect users. however, this also would reduce the interest of users and the usefulness of the frad sites to the attackers. in addition, excluding phrases related to malware removal lowers the seo rankings of frad sites and user accesses. since our system relies on these characteristics to identify frad sites, we can accurately detect high-risk frad sites that strongly affect the users' psychology. since our collection of frad sites depends on search engine results, we have not collected all frad sites on the internet. 
to efficiently collect frad sites, we used the names of the cyber threats that are mainly used by attackers to lure users and leverage search engines, which are the most common channel to lead a user to frad sites. as a result, our analysis found that frad sites are created in many languages and have a large amount of user access. our system is useful for continuously collecting frad sites to create url blacklists and for analyzing trends for this type of attack. we have reviewed related work that investigated the distribution infrastructure for fake av software and the social engineering techniques attackers use to trick users. using a combination of unsupervised, graph-based clustering, cova et al. analyzed the network infrastructure (e.g., domain registration information and ip addresses) of fake av software distributions to reveal their ecosystem and attack campaigns [ ] . although they investigated the relationship of servers hosting fake av software, they did not discuss how users access these web pages. rajab et al. conducted a measurement study that discovered web pages related to the distribution of fake av software from data collected by google [ ] . they showed the prevalence of fake av software in malware distributions on the web. stone-gross et al. proposed an economic model and estimated attackers' revenue by analyzing back-end servers that attackers used to support fake av software businesses [ ] . they identified the incoming channels that users employ to reach distribution sites, such as landing pages that exploit browsers to redirect users. they also described the social engineering techniques used to install fake av software using web pages that display fake infection alerts. although these studies analyzed the infrastructure and traditional distribution techniques for fake av software-such as drive-by downloads and fake infection alerts-new distribution tactics using frad sites have not been revealed. 
there is also related work that describes case studies of fake av software distribution from social engineering aspects [ , [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] , [ ] [ ] [ ] . in most studies, they analyzed fake infection alerts via advertisements that threaten or attract users to install fake av software. however, no previous study has focused on the frad sites or analyzed attackers' techniques that exploit the psychological weakness of users who are suffering security problems. we have proposed a system to crawl the web and automatically identify frad sites that introduce fake removal information for cyber threats and lure users to fake av software. using the proposed system, the first comprehensive measurement study was conducted to disclose the ecosystem of distributing fake av software via frad sites. we have analyzed both passively collected statistical data on user accesses and actively crawled data to clarify users' risky behavior that leads them to reach frad sites and which exposes them to attacks navigated from frad sites. our findings emphasize that it is very difficult for users who are suffering from cyber threats to reach correct removal information, because search results related to the specific cyber threats are poisoned by frad sites. our system is useful for search engine providers and security vendors for excluding and blocking frad sites. 
measuring pay-per-install: the commoditization of malware distribution
an analysis of rogue av campaigns
the menlo report: ethical principles guiding information and communication technology research
manufacturing compromise: the emergence of exploit-as-a-service
evilseed: a guided approach to finding malicious web pages
deseo: combating search-result poisoning
surveylance: automatically detecting online survey scams
self destructing exploit executions via input perturbation
knowing your enemy: understanding and detecting malicious web advertising
surf: detecting and measuring search poisoning
marketwatch inc: global antivirus software market report and future opportunity assessment
detecting malicious http redirections using trees of user browsing activity
dial one for scam: a large-scale analysis of technical support scams
towards measuring and mitigating social engineering software download attacks
stranger danger: exploring the ecosystem of ad-based url shortening services
it's free for a reason: exploring the ecosystem of free live streaming services
the nocebo effect on the web: an analysis of fake anti-virus distribution
predicting impending exposure to malicious content from user behavior
tiresias: predicting security events through deep learning
exposing search and advertisement abuse tactics and infrastructure of technical support scammers
hong kong based malvertiser brokers traffic to fake antivirus scams
the underground economy of fake antivirus software
investigating commercial pay-per-install and the distribution of unwanted software
enabling reconstruction of attacks on users via efficient browsing snapshots
what you see is not what you get: discovering and tracking social engineering attack campaigns
parking sensors: analyzing and detecting parked domains
poisonamplifier: a guided approach of discovering compromised websites through reversing search poisoning attacks
all your clicks belong to me: investigating click interception on the web

key: cord-
-lc xm authors: hodkinson, i. d.; jensen, t. s.; maclean, s. f. title: the distribution, abundance and host plant relationships of salix-feeding psyllids (homoptera: psylloidea) in arctic alaska date: - - journal: ecol entomol doi: . /j. - . .tb .x sha: doc_id: cord_uid: lc xm abstract. five species of psyllid occurred on seven species of salix at meade river, alaska. studies were made on the two common species psylla palmeni löw and p.phlebophyllae hodkinson. the former feeds on the phanerophytes salix pulchra, s.lanata, s.alaxensis and s.glauca, the latter on the chamaephytes s.phlebophylla and s.reticulata. both p.palmeni and p.phlebophyllae had a -year life cycle and nymphal development took place on the female salix catkin. the life cycle was generally closely synchronized with the period of catkin development. however, only a few eggs were laid on s.glauca. seasonal perturbation of the host plant by flooding, ice movement and blown sand prevented psyllids breeding in certain areas colonized by the host plant. in p.palmeni densities and 'feeding pressure', measured as biomass of psyllids per gram of catkin, on the different host plants followed the sequence s.pulchra > s.lanata > s.alaxensis > s.glauca. in p.phlebophyllae densities and feeding intensities were similar on s.phlebophylla and s.reticulata and grazing intensity was comparable with p.palmeni on s.pulchra. a highly significant negative correlation was found between psyllid density and catkin dry weight in s.pulchra, s.phlebophylla and s.reticulata, suggesting that psyllid feeding is affecting catkin growth. predation of psyllid nymphs by syrphid larvae was heavy but there was no evidence of parasitism. the life history strategies of the five psyllid species are discussed within the context of the constraints imposed by the arctic environment.
this work forms part of the integrated programme 'research on arctic tundra environments' (rate) designed to quantify the intensity and impact of herbivore grazing on tundra vegetation. the 'rate' study area is ... investigate the distribution, abundance and phenology of the jumping plant lice (psylloidea) associated with dwarf willows (salix species) and to measure their impact on their host plant. the psylloidea is a group of small, host specific, phloem feeding insects which often occur at very high densities on dicotyledonous plants, particularly in arctic or alpine situations (hodkinson, a, unpublished; koponen, koponen, , koponen & ojala, ). the alaskan willows are particularly diverse; hultén ( ) records fifty-six species for the region. similarly the associated psyllids exhibit parallel diversification; twenty-one species are recorded by hodkinson ( ). at meade river five psyllid species are associated with nine salix species: their host plant relationships are summarized in table . both psylla palmeni löw and psylla phlebophyllae hodkinson were common and widespread, psylla zaecevi sulc and trioza atkasookensis hodkinson were widespread but uncommon, whereas psylla macleani hodkinson was found at only one restricted locality. hence attention was focused mainly on the two common species. psylla palmeni and p.zaecevi are widely distributed throughout the far northern region of the palaearctic and the latter species also occurs as a relic population in the mountains of northern india (klimaszewski, ; mathur, ). psylla phlebophyllae and t.atkasookensis are also recorded from the arctic regions of eastern siberia (hodkinson, unpublished) whereas p.macleani occurs only in alaska (hodkinson, ). the meade river area is topographically diverse with polygonized tundra, lake basin, dry ridge, bluff, sand dune and river edge habitats occurring in close proximity. the distribution of salix reflects this habitat diversity and study sites were chosen accordingly.
salix pulchra is found mainly in the wetter more stable areas, in drained lake basins and around low centred polygons. salix phlebophylla and s.reticulata occur together on the drier, more exposed ridges but the latter species also extends into the wetter lake basins. salix lanata, s.alaxensis and s.glauca occur most commonly in disturbed situations, mainly along river margins. both s.phlebophylla and s.reticulata are chamaephytes, reaching a maximum height of about cm, whereas the remaining species are phanerophytes reaching a maximum height of about m, except s.alaxensis which reaches about m in sheltered micro-habitats on the dune site.

mine site. previous disturbance associated with mining activities had produced a diverse mixture of salix spp. which made this site an ideal experimental area for the main comparative studies. the dominant vegetation was s.pulchra, s.lanata, s.alaxensis, s.glauca and s.reticulata with less s.niphoclada. the x m sampling area was marked out as a m grid.

dry ridge. this was the main dry site containing salix phlebophylla and s.reticulata growing in a mixed chamaephyte shrub community. the x m sampling area along the ridge crest was marked out as a m² grid.

lake basin. this was the main wet site, subject to waterlogging when spring snow melted. salix pulchra was dominant but s.reticulata was also abundant. the x 100 m sampling area was marked out as a m² grid.

ice scrape. this disturbed site along the edge of meade river was subject to ice scrape as the river ice melted. the willows were partially flattened and there was little or no ground vegetation. salix alaxensis grew in a narrow strip along the edge of the river with a wider belt of s.lanata behind it. the s.lanata extended into an undisturbed area with a well-developed ground flora. a sampling grid was not marked out.

sand dunes. on this area salix alaxensis grew in bare sand on a well-developed dune system adjacent to meade river.

atkasook village.
a single colony of psylla macleani was discovered on this site at the end of the season on s.alaxensis growing in a mixed salix community adjacent to meade river. no further studies were made on this site.

a preliminary survey of the distribution of psyllid eggs on the willows at the various sites was carried out from june to july . a minimum of nine catkin-bearing stems was removed from the plant and the number of eggs on the catkins (male or female), on the woody stem and on the green buds or leaves (depending on phenological state) was recorded. at the same time, a semiquantitative assessment of adult psyllid abundance on each salix was made by taking ten sweep net samples (twenty sweeps per sample). adult sampling was continued throughout the season to obtain material for investigation of ovary maturity. all data on female maturity are based on a minimum of fifty dissected females per host plant for each sample date. subsequently, each salix species on the mine, dry ridge and lake basin sites was sampled for psyllids at regular intervals during the summer of . for each salix species a minimum of twenty-four random samples was taken. each sample was stratified to include three components: a female catkin, the current year's growth on the catkin bearing stem, and the current year's growth on a non-catkin bearing stem. the number of psyllid eggs and nymphs on each was then counted separately. catkins of s.pulchra, s.lanata, s.alaxensis and s.glauca at the mine site were also sampled for p.palmeni nymphs on two occasions during july . preliminary sweep netting and egg sampling on july suggested that psyllids were absent from the ice scrape site. to investigate the distribution of psyllids there a transect was made on august from the riparian s.alaxensis through to the unaffected s.lanata. at m intervals along three separate transect lines a sweep net sample for adults (twenty sweeps) was taken together with a four catkin sample which was examined for nymphs.
it also became apparent that p.phlebophyllae nymphs were absent from s.reticulata growing on the wet lake basin although they were abundant on the same salix species m away on the dry ridge. thus, to examine the distribution of p.phlebophyllae along a wet-dry gradient, eight catkins were taken at m intervals ( m up the slope) along a transect rising from the lake basin to the dry ridge on august . similarly, preliminary egg sampling of s.alaxensis at the sand dune site on june suggested that psyllids were likewise absent from this site. thus twenty-five random composite samples, as previously described, were taken on july .

during the period when eggs were present all salix samples were hand sorted. in later samples nymphs were extracted from catkins in modified berlese-tullgren funnels. straight heat extraction, as used successfully for other psyllid species (hodkinson, a; hams, ), was found to be inappropriate as direct heat caused older catkins to burst, trapping the nymphs. this problem was overcome by suspending the catkin over water in a polystyrene cup using a pin driven into the cup base: the water then served as the collection fluid. extraction efficiency, estimated by counting the remaining dead psyllids on twenty catkins after extraction, was at least %.

the phenological state of each salix at each of the four main study sites was observed at regular intervals throughout the and seasons and recorded as dates of first occurrence of each stage of development. seasonal development of the willows was rapid within the % month growing season (fig. ). on the basis of phenology the species were divided into two groups; group 1 with catkins appearing before leaves (s.pulchra, s.lanata and s.alaxensis) and group 2 with catkins appearing after the leaf buds have broken (s.glauca, s.phlebophylla and s.reticulata). group 1 species were the first to commence growth.
on the mine site catkins of group 1 species appeared at least days before group 2 species whereas at the lake basin the difference was reduced to days. on the dry ridge site group 1 species were rare but catkin appearance in group 2 species was at least days later than group 1 species at the mine site. catkin appearance in group 1 species on the ice scrape site was delayed by - days compared to the mine site, although development to maturity was more rapid. the period between catkin appearance and catkin burst represents the period favourable for psyllid growth (see later). in group 1 species at the mine site this was - days and in group 2 species on the dry ridge - days. salix glauca (group 2) is atypical in that catkin burst is delayed and good estimates of time of ripening are difficult to obtain. at meade river the temperature threshold for salix growth was about °c. in the growing season began days earlier than in and this is reflected in the advanced phenology of the willows in that year (fig. ). however, the actual development time of catkins to maturity remained almost constant between years.

the life history of p.palmeni on s.pulchra, s.lanata, s.alaxensis and s.glauca at the mine site and on s.pulchra at the lake basin during is shown in fig. . at snow melt in mid-june, prior to the commencement of salix growth, sexually mature p.palmeni adults were present on the plant. oviposition on catkins occurred as soon as they appeared (see later); the eggs hatched and the nymphs passed through five instars before the first new generation adults appeared in late july. these adults remained sexually immature at least until late august. it was not possible to investigate the mode of overwintering as the site was inaccessible during the winter. however, adults probably overwinter in the litter layer or in rolled dead leaves attached to the plant.
the closely related psylla moscovita andr., which has a similar life history on salix repens, hibernates in this manner (hodkinson, unpublished). the time between first appearance of eggs and first appearance of new generation adults varied between and days. this period, making due allowance for observational error, is very close to the estimated life of the female catkin ( - days in phenological group 1 willows). therefore, if the psyllid is to complete its life cycle within a female catkin (see later) there must be almost perfect phenological synchronization between the two. this is achieved by p.palmeni being ready to oviposit the moment the catkin appears. this strategy results in the vast majority of eggs being laid on the phenological group 1 willows s.pulchra, s.lanata and s.alaxensis. on the mine site salix glauca and s.reticulata were and days later respectively in producing catkins, by which time the majority of p.palmeni females were spent. there is some evidence, however, that a small number of eggs were laid on s.glauca and that these developed into adults late in the season. salix glauca may thus act as a 'safety net' species on which p.palmeni can reproduce, albeit at a low level, in years when synchronization with group 1 species breaks down.

p.phlebophyllae on s.phlebophylla at the dry ridge had a very similar life cycle to p.palmeni, except that eggs were not laid until the catkins emerged, days later than s.pulchra on the mine site, and the first adults emerged days later than p.palmeni (fig. ). again there was close synchrony between the life cycle and the life of a female catkin: the time taken to pass from first egg to first adult was days compared with a catkin life of and days for s.phlebophylla and s.reticulata respectively.

the scattered data collected on these species suggest that all overwinter as adults and pass through a single generation each year.
trioza atkasookensis nymphs were always found in pit galls on leaves and never in catkins, whereas p.zaecevi nymphs were always found in catkins, often in association with p.palmeni. while no data are available for nymphal development of p.macleani at meade river, late instar nymphs were collected from female s.alaxensis catkins at fairbanks (hodkinson, ).

table summarizes the presence or absence of p.palmeni and p.phlebophyllae on all sample sites. absence is generally defined as there being no egg, nymph or adult found in any sample on a particular site throughout the study period. however, a few eggs and nymphs of p.palmeni were present on s.reticulata at the mine site early in the season but were absent from a later sample. the data to support these overall conclusions are presented in the following sections.

table . presence (+) or absence (-) of p.palmeni and p.phlebophyllae (first and second symbol respectively) on salix species at each site.
mine site: s.pulchra + -, s.lanata + -, s.glauca + -, s.reticulata -? -, s.alaxensis + -
lake basin: s.reticulata - -, s.pulchra + -
dry ridge: s.reticulata - +, s.phlebophylla - +
ice scrape: s.lanata - -, s.alaxensis - -
sand dunes: s.alaxensis - (one entry not recoverable)

table illustrates the distribution of p.palmeni and p.phlebophyllae eggs on the various willows at all sample sites. these data demonstrate that psyllid eggs are highly aggregated on male and female catkins and that very few are laid on the rest of the plant. this is not surprising on willows in which the catkin appears before the leaves and in which initially only the catkin is available to the gravid female for oviposition, but it also applies to those willows whose catkins appear after the leaves. female psyllids must therefore actively seek catkins for oviposition. however, eggs were laid abundantly on both male and female catkins even though the former withered and dropped from the plant before the eggs hatched. this represents a major source of mortality in the populations of both p.palmeni and p.phlebophyllae. it was only on the female catkins that the psyllids were able to continue their development.
except for the odd end of the season sample, when numbers were small, the densities remained consistently very much greater on catkins than on the rest of the plant (see table ). moreover, most of the nymphs found on the catkin shoot were associated with leaf axils right at the base of the catkin stalk. to test whether instar composition ... this index was used to compare instar composition on the catkin against those on the catkin shoot and non-catkin shoot using a paired comparison 't' test. no significant differences between plant parts were demonstrated (table ). to test the null hypothesis that there is no linear relationship between numbers of nymphs on catkins and numbers on catkin ...

fig. shows the distribution of p.palmeni along the ice scrape transect. no nymphs or adults were found on either s.alaxensis or s.lanata within the area affected by ice scrape but both became common once the unaffected s.lanata was reached. this strongly suggests that p.palmeni is prevented from breeding on the ice scrape. this may be because psylla palmeni overwinters as an adult and becomes active in spring as snow melt occurs. overwintering adults will require protected hibernation sites in the ground vegetation/litter layer and these sites are absent from the ice scrape. furthermore, ice movement over the host plant and surface flooding, at a time when adults are becoming active, will further minimize their chances of survival. a similar lack of suitable hibernation sites may explain the apparent absence of p.palmeni from s.alaxensis at the sand dune site during , where any leaf litter falling to the ground in autumn is soon blown away. the success of p.palmeni on s.alaxensis at the mine site is probably because the salix community is subject to little or no natural disturbance. the plants are old and large, probably the residual colonizers following the cessation of mining activity.
in the more natural unstable habitats of s.alaxensis, p.palmeni has few chances of reproducing. fig. [...] were absent from the lake basin but appeared as soon as the transition on to the dry ridge was reached. flooding of the lake basin during spring snow melt ensures that the dwarf s.reticulata is completely inundated at the time when p.phlebophyllae adults would be becoming active and they appear unable to survive and reproduce in these conditions. in contrast p.palmeni on the lake basin appears able to move to its taller host plant s.pulchra and thereby largely avoid the effects of flooding. thus, on a broad scale, a combination of naturally occurring physical events appears to be acting to prevent both p.palmeni and p.phlebophyllae from breeding in areas which at first sight, on the basis of host plant availability, might be considered suitable. the failure of psyllids to recolonize these sites once the disturbances have passed suggests that the rate of dispersal out of the more favourable areas is too low to overcome the host plant synchronization demands placed on the psyllid. the contrast between a late ( ) and an early ( ) growing season permits a comparison between the progression of psyllid development and host plant phenology. in the early season s.alaxensis was the first to commence growth (fig. ) and psyllid development on this species, as indicated by the index of advancement, was advanced over s.pulchra, s.lanata and s.glauca (fig. ) . however, in the late season s.alaxensis commenced growth after s.lanata and s.pulchra (fig. ) and in consequence the development of p.palmeni nymphs on these species remained advanced relative to s.alaxensis (fig. ) . in both years psyllid development was latest on s.glauca but the difference was most accentuated in the late season. catkin and shoot size vary considerably between different willows.
for instance, the biomass ratios of catkins of s.lanata, alaxensis, pulchra, glauca, reticulata and phlebophylla were : : : : : respectively. thus psyllid numbers per sample unit do not give a meaningful comparison of numbers per standard unit of habitat. furthermore, it proved logistically impossible to sample all [...] however, a few general conclusions can be drawn from a straight comparison of numbers (table ). on the mine site during numbers of p.palmeni nymphs per catkin were greater on s.pulchra than on s.lanata or s.alaxensis, but these three species as a group supported much larger populations than s.glauca or s.reticulata. the data for the latter were distorted by a large batch of eggs from a single catkin in the july sample. the catkin data (table ) show an identical sequence of abundance on the different willows. numbers of p.palmeni on s.pulchra were of a comparable order of magnitude on the mine site and the lake basin despite a slightly larger catkin size at the latter. on the dry ridge numbers of p.phlebophyllae per catkin were approximately equal on both s.phlebophylla and s.reticulata. to provide a direct comparison of the potential feeding intensity by psyllids the biomass of psyllids per unit biomass of plants was calculated using psyllid biomass data (table ). for p.palmeni (fig. ) feeding intensity was consistently greatest on s.pulchra both on the mine and lake basin sites, being approximately and times greater than on s.lanata and s.alaxensis, respectively, on comparable dates. the figures for s.reticulata and s.glauca were consistently low. for p.phlebophyllae (fig. ) feeding intensity was similar on both [...] for both p.palmeni and p.phlebophyllae psyllid biomass to catkin biomass ratios were often less than : . the biomass figure at any one time represents a minimum figure for tissue production. in other psyllids the efficiency of conversion of phloem sap into body tissues is about % (hodkinson, a) .
thus to have reached the observed biomass the psyllid population on the catkin must have consumed a weight of phloem sap at least one-tenth the weight of the catkin itself. correlation analysis was used to test the null hypothesis that psyllid density, measured as total number of psyllids per unit weight of catkin, was not related to the weight of the catkin. this analysis was carried out for each [...] that there is an inverse relationship between psyllid density and weight of catkin, implying that psyllids are suppressing catkin growth, i.e. the larger the psyllid population the smaller the catkin. significant relationships were found for those species already shown to be subjected to the highest grazing intensity. a parallel series of analyses was used to examine the relationship between psyllid density and weight of catkin bearing shoot. significant negative correlations were found for p.palmeni on s.lanata at the mine site on july ( r = - . , p < . ) and p.phlebophyllae on s.phlebophylla at the dry ridge on july ( r = - . , p < . ). these results, while not as clear cut as for the catkin, imply that the psyllids may be acting to suppress growth of the catkin bearing stem, at least in these species. the harsh arctic climate poses severe adaptational problems for insects (downes, ) . maclean ( a) lists low winter temperature, short growing season with associated low temperatures, low primary productivity and the weakness of photoperiod timing clues during the arctic summer as the main environmental variables to which invertebrates must adapt. he suggests that adaptation is achieved by a combination of life cycle adaptations, microhabitat selection, metabolic and timing adaptations together with resistance to winter cold. this provides a useful framework in which to discuss adaptation in p.palmeni and [...] the basic resource of any sap feeding phytophagous insect is its host plant.
in general growth and reproduction can only occur while the host plant itself is growing, although there is some evidence for aphids that feeding on plants can occur prior to bud burst (hill, ) . thus if the plant is growing for only a short period, a high degree of phenological synchrony between insect and host plant is essential if the resource is to be fully exploited. the life cycles of both p.palmeni and p.phlebophyllae at meade river show a high degree of synchrony with host plant phenology and this, coupled with a rapid development rate, enables them to complete a generation within the summer season. such a life history is essentially the same as in the equivalent temperate univoltine salix-feeding species psylla moscovita and p.saliceti forster where growth and reproduction occur during the short period of the year when conditions are favourable (hodkinson & white, ) . this shows a marked contrast with the heather psyllid strophingia ericae (curtis), in which the evergreen host plant is 'available' throughout the year, and which has adapted to more rigorous climatic regimes by extending its life cycle from 1 to years (hodkinson, a, b; parkinson & whittaker, ) . several adaptational features permit such a rapid life cycle. firstly, both p.palmeni and p.phlebophyllae lie to the left of the r-k selection spectrum as defined by southwood ( ) and possess attributes, such as high reproductive/mortality rates, necessary for the rapid exploitation of a temporary resource, albeit a somewhat restricted food resource. secondly, the psyllids select a highly favourable microenvironment on the plant, namely the female catkin. johnson (pers. comm.) showed that levels of soluble nitrogen, the food source of psyllids, are greater in female catkins than in any other plant component growing on the alaskan tundra. furthermore, the temperature regime within a developing catkin is more favourable than either air temperature or the surface temperature of adjacent leaves.
at meade river the catkin itself has an effective 'life' equal to or only a few days longer than the time required by the psyllids to complete their development. thus, successful completion of the life cycle demands a mechanism by which eggs are concentrated on the female catkin as soon as it appears. this can be achieved by the sexually mature females being present on the plant at time of catkin burst and seeking out catkins for oviposition. ovary maturation in female psyllids prior to the salix flushing suggests that they are able to obtain some nutrition from stems. female psyllids do not discriminate between male and female catkins and a significant proportion of eggs laid on the former are lost to the population when the catkin drops from the plant. lack of photoperiod clues at the time when psyllids emerge from hibernation suggests that synchronization with the host plant is temperature controlled. natural selection should act to ensure that the temperature threshold for psyllid egg maturation and oviposition is close to the temperature threshold for salix growth. similarly if the psyllid population is to survive and reproduce it must complete its life cycle within the 'life' of a female catkin. the relative development rates of psyllid and catkin, taken in the context of the day degrees available for a given site, may well set the limits of psyllid distribution. winter snow cover on the meade river site is thin and the willows are exposed to extreme winter temperatures. to ensure survival the psyllids must seek favourable overwintering sites, most probably in the litter layer or perhaps in rolled dead leaves remaining attached to the plant, and then return to the plant in spring. it is only the active adult stage which is capable of such movement, at least on to the taller willows. it is generally postulated (e.g.
pianka, ) that arctic organisms have evolved broad, nonspecialized niche-habitat requirements and that this results in the low diversity-high density situation observed among such organisms. if one considers the salix community at meade river to represent the total food resource of the psyllid community, then there is some evidence of resource partitioning, i.e. niche segregation, among the psyllid species. however, some flexibility is still retained. psylla phlebophyllae occurs only on s.phlebophylla and s.reticulata whereas psylla palmeni occurs only on s.pulchra, s.lanata, s.alaxensis, s.glauca and s.niphoclada. while there is some temporal and spatial separation of oviposition by females of each species there is sufficient overlap for oviposition on alternative willows to occur. that this does not happen suggests that both psyllids are able to discriminate between willows and are thus specific to their respective host plant series. further resource subdivision occurs within a single host plant.
on salix lanata, p.palmeni nymphs feed on catkins whereas trioza atkasookensis nymphs feed in pit galls on leaves. however, some overlap of food niches does occur, with p.zaecevi nymphs coexisting with p.palmeni nymphs on the catkins on s.lanata and s.pulchra and p.macleani occurring with p.palmeni on s.alaxensis, albeit at different sites. with the exception of p.macleani, a scarce species, all species exhibit a degree of polyphagy. this flexibility in choice of food plant may act as a mechanism for spreading the risk and coping with year-by-year climatic variation. for instance the later developing s.glauca could act as a 'safety net' species on which p.palmeni could complete its development in years when synchrony with other hosts broke down. the psyllid population data suggest that certain willows support higher populations than others and could therefore be considered favoured hosts. psylla palmeni population densities were consistently higher on s.pulchra than on other salix spp. in contrast, s.lanata and s.alaxensis appear to be the preferred hosts of p.zaecevi and p.macleani respectively (hodkinson, ; klimaszewski, ; loginova, ) . thus potential niche segregating mechanisms which could act under conditions of intense competition appear to exist. variations in tundra topography lead to spatial separation of the different salix/psyllid associations. physical perturbation of the environment by ice movement, seasonal flooding or blown sand appears to prevent psyllids breeding in certain areas which are naturally colonized by the host plant. the heterogeneous patterns of psyllid distribution are probably further accentuated by phenological variations within a single salix species, related to the aspect of the site in which it is growing. for s.pulchra, s.phlebophylla and s.reticulata, those species subjected to the highest feeding pressure, psyllid populations appear to be suppressing catkin growth.
the high psyllid biomass which catkins are capable of supporting, often of the order of g of psyllid per 100 g of catkin, emphasizes the fact that catkins are nutrient sinks which continually accumulate photosynthate produced elsewhere in the plant and that p.palmeni and p.phlebophyllae are feeding at the site where the supply of high quality nutrients is maximal. predation by syrphid larvae caused considerable mortality in both p.palmeni and p.phlebophyllae populations whereas no evidence of parasitism was found. this may again be a result of synchronization demands; predators such as syrphids are opportunistic generalists, feeding on any available psyllid instar as well as alternative food resources and having little need to synchronize their life cycle with that of their prey. in contrast, psyllid parasites are usually both psyllid and instar specific (hodkinson, ) and host-parasite synchronization is highly vulnerable when subjected to a fluctuating environment.
adaptations of insects in the arctic
sampling pear foliage for nymphs of the pear psylla using the berlese-tullgren funnel (psylla pyricola: hem. hom. psyllidae)
the biology of the psylloidea (homoptera): a review
the psyllids (homoptera: psylloidea) of alaska
psylloidea. handbook for the identification of british insects.
herbivorous invertebrates of the mountain birch at kevo, finnish lapland
notes on herbivorous insects of the birch in southern greenland
on the mesofauna of the field layer of three subarctic habitats
the jumping plant lice or psyllids (homoptera: psylloidea) of the palaearctic
review on palaearctic psyllids of the genus psylla geoffr. (homoptera: psylloidea) associated with willows
ecological adaptations of tundra invertebrates
ecology of tundra invertebrates at prudhoe bay, alaska. ecological investigations of the tundra biome in the prudhoe bay region, alaska (ed
seasonal patterns of abundance of tundra arthropods near barrow
psyllidae of the indian subcontinent. indian council of agricultural research
a study of two physiological races of the heather psyllid strophingia ericae (curtis) (homoptera: psylloidea
we thank the royal society and the national science foundation (grant opp- - to university of alaska) for financial support and m n e. hodkinson for writing computer data analysis programs.
hill, c.p.
key: cord- - oq n af authors: liu, zhi-ping; wu, ling-yun; wang, yong; zhang, xiang-sun; chen, luonan title: bridging protein local structures and protein functions date: - - journal: amino acids doi: . /s - - - sha: doc_id: cord_uid: oq n af one of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions.
in this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. in particular, we emphasize the newly developed structure-based methods, which are able to identify locally structural motifs and reveal their relationship with protein functions. these methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. we also discuss remaining problems and possible directions for this exciting and challenging area. dna sequences can be called 'the blueprint of life', while proteins represent the fulfillment of this blueprint in terms of structures and functions. a fundamental goal of functional genomics research is to understand how proteins carry out functions in a living cell (eisenberg et al. ; brenner ; goldsmith-fischman and honig ) . in addition to experimental methods, computational methods have been extensively applied with the aim of developing hypotheses in terms of assigning specific functions to specific proteins and providing valuable biological insights. the basic rationale behind such research is that the gene sequence determines the amino acid sequence, and the amino acid sequence determines the protein structure, which, in turn, determines the protein function (whisstock and lesk ) . many proteins, even among those in the protein data bank (pdb), have not yet been annotated, although we have succeeded in deriving their structures watson et al. ) . we review here the in silico annotation methods currently used to determine protein function from protein local structures. generally speaking, proteins are the main catalysts, structure components, signal transfers and molecular machines in a biological organism. as such, they are the basic elements of functions. 
however, the definition of function means different things to different people since it is an evolving concept associated to an abundance of interpretations. in general, these functions can be described at many levels, ranging from the biochemical functions at the molecular level (e.g. catalytic or binding activities) to biological processes at the level of biomolecular cooperation (e.g. signal transduction or cellular physiological process) to the cellular components at the cell level of an organ (e.g. nucleus or rough endoplasmic) (devos and valencia ; watson et al. ) . several schemes/tools/ databases have been developed in recent decades for measuring protein functions in a systematic model with the aim of annotating the functions of proteins ; these include ec (barrett ) , mips (ruepp et al. ) , go (the gene ontology consortium ; camon et al. ) and kegg (kanehisa and goto ) , as shown in table . using the existing function annotations as 'gold standard' data, researchers have been able to develop many protein function annotation methods in recent years based on protein relationships. we summarize the existing function annotation methods in the framework of fig. , which shows the basic tendency for the functional inference methodology-i.e. to explore sequence similarity, structure similarity, protein interaction and their integration. we briefly review these in the following list: • using sequence information. the methods in this category often utilize a blast, fasta or psi-blast score to detect the sequence similarity and annotate the functions to a target protein from its homologous protein (whisstock and lesk ; watson et al. ) . in the safe zone (rost ) of sequence similarity, the sequence-based methods can provide putative annotations with high confidence (wilson et al. ) . a number of papers have tested the global performance between the relationship of the sequence similarity and function similarity. 
shah and hunter ( ) tested the sequence similarity among enzymes in many ec classes at various thresholds and concluded that the functional similarity could not be detected perfectly when the sequences are not similar enough. wilson et al. ( ) and devos and valencia ( ) obtained similar results. joshi and xu ( ) presented a systematic analysis on the sequence-function relationships in four model organisms. • using structure information. protein structures are more conserved than protein sequences (orengo et al. ; hou et al. ) . a number of methods have been developed with the aim of assessing protein structure similarity (kolodny et al. ) ; these can be grouped as coordinate-based [such as structal (gerstein and levitt ) , samo , tm-align (zhang and skolnick ) and prosup (lackner et al. ) ], distance-matrix-based [such as dali (holm and sander ) , ce (shindyalov and bourne ) , fatcat (ye and godzik ) , ssap (orengo and taylor ) ] and secondary-structure-based [such as vast (gibrat et al. ) , ssm (krissinel and henrick ) , lock (singh and brutlag ) and fast (zhu and weng ) ]. classifying the proteins into different classes or families based on global structure similarity will assist researchers in determining the relationships among different proteins and provide a foundation of functional organization (brenner ) . scop (murzin et al. ) , cath (orengo et al. ) and fssp (holm and sander ) comprehensively cluster all proteins with known structures. based on those clusters, the functional relationships among the proteins can be roughly detected. (footnote: the safe zone means that pairwise sequence identity is higher than %; the twilight zone, about - %; the midnight zone, below %.) • using interactome information. proteins always interact with other molecules to carry out their functions (sharan et al. ).
information on protein-protein interactions or other interaction maps among molecules, such as dna binding with protein, can be explored to annotate the protein functions from complexes and pathways of the biochemical processes. the network-based methods extend the functional inference from the single molecular level to a systematic level by considering interactions among genetic components and transferring functions among them (vazquez et al. ; barabasi and oltvai ; zhang et al. ). sharan et al. ( ) cataloged the methods to direct methods and module-assisted methods individually. • using integrated information. another sensible strategy is to use many different data sources to increase the chances of obtaining function annotations for any given protein. for example, in marcotte et al. ( ) , proteins are grouped by experimental data, such as metabolic function, phylogenetic profiles, rosetta stone results and correlated messenger rna expression patterns to determine the functional relationships among proteins of the yeast. in fact, many methods are in this framework (sanishvili et al. ; george et al. ; pal and eisenberg ; zhao et al. a, b) , especially when data integration becomes the focus of the systems biology study. in this review we highlight the relationships between protein local structures and protein functions since it is commonly believed that local regions on the structures are responsible for the performance of the particular functional tasks (russell ; ferre et al. ) . well-known examples include the ser-his-asp triad in enzymes and other known special structural frameworks that carry out certain functions of catalysis (torrance et al. ) . it is now widely recognized that some fold similarities suggest an 'analogous' rather than a 'homologous' relationship (russell ) . proteins can adopt similar tertiary folds while performing different functions at different binding site locations. 
given that functional linkages in the midnight zone escape detection by sequence and global structure similarity, only the local structures can be used to analyze detailed relationships with functions by determining the protein-protein interaction, protein-dna interaction or other global performance from the physical perspective. also, the local structures of a protein provide more detailed information on protein function, not only from the single targeted action of that protein, but also from the integrative process arising from the detailed components and the three-dimensional architecture. the local structures are also important in the design of drugs and bioengineering. in an interesting paper, schnell and chou ( ) convincingly provided nuclear magnetic resonance (nmr) data showing that the m proton channel of influenza a virus is typically controlled by the local conformational change with a ph-gated mechanism. the discovery provides sound evidence that the local structures are crucial for determining protein function, and it is vitally important in the search for effective anti-influenza drugs (borman ) . bridging protein local structures and protein functions can provide timely and useful information for structure-based drug design [e.g. see the methods in chou et al. ( ) and wang et al. ( a) against severe acute respiratory syndrome (sars), and that in du et al. ( ) against chicken influenza a virus h n , as well as a review paper (chou ) ]. thus, a key task for researchers in this field is to investigate the relationships between protein functions and protein local structures. this review is organized into four parts. first, we will describe the main molecular functions related to protein local structures. this is followed by a description of existing definitions and methods for detecting similarities in local structures. in the third part, the detailed methodologies to bridge local structures with functions are reviewed.
some discussion and future directions are summarized in the last part. to bridge the relationship between local structures and functions, we first catalog the molecular functions of proteins strongly related to local structures. the local structures are often regarded as the protein-protein interfaces, catalytic sites, ligand-binding sites, metal-binding sites, post-translational modification sites or other miscellaneous active sites. table lists some of the important functional categories (chakrabarti and lanczycki ) . a protein generally interacts with other proteins in performing and regulating many processes in a cell. the pace of discovery of protein-protein interactions has recently accelerated due to rapid advances in new technologies (salwinski and eisenberg ; chou and cai ) . the basis of protein-protein interactions often lie in local planar patches on the protein surface. the factors that influence the formation of protein-protein complexes can be cataloged into four different types-i.e. homodimeric protein, heterodimeric proteins, enzyme-inhibitor complexes and antibody-protein complexes (jones and thornton ) . from the structural perspective, structural characterization of macromolecular assemblies usually poses a more difficult challenge than structure determination of individual proteins (russell et al. ) . effective approaches for the prediction of protein-protein interactions at physical interaction levels are also strongly in demand (wodak and mendez ) . zhou and qin ( ) reviewed the methods currently being applied for interface prediction. the characteristics between interface and noninterface portions of a protein surface, such as sequence conservation, proportions of amino acids, secondary structure, solvent accessibility and side-chain conformational entropy, are often used to distinguish the specificity of local structures relating to protein binding function. 
in the transcription and translation process, proteins always bind to dna and rna to fulfill various functions. proteinnucleotide binding is a fundamental function of proteins. luscombe et al. ( ) classified the dna-binding proteins into eight different structural/functional groups. the helix-turn-helix (hth) motif is one of the most common structures used by proteins to bind dna, while protein-rna binding involves a number of different structure specificities. a comparison between protein-rna and protein-dna complexes revealed that while base and backbone contacts (both hydrogen bonding and van der waals) are observed with equal frequency in protein-rna complexes, backbone contacts are more dominant in protein-dna complexes (jones et al. ) . the positively charged residue, arginine, and the single aromatic residues, phenylalanine and tyrosine, all play key roles in the sites for the rna-binding function. ligand binding is a key aspect of protein functions. proteins recognize their natural ligands for transportation, signal transduction or catalysis (campbell et al. ) . the cleft volumes in proteins have strong relationships with their molecular interactions and functions. the ligands are always bound in the largest clefts (laskowski et al. ) . metal ions have a role in a variety of important functions, including protein folding, assembly, stability, conformational change and catalysis (barondeau and getzoff ) . in order to leverage the wealth of native metalloprotein structures into a deep understanding of metal ion site specificity and activity, high-resolution analyses of metal site structures and metalloprotein design are increasingly being performed. one of the most ubiquitous zinc-binding motifs is the c h zinc finger motif, which was first identified in transcription factors (ebert and altman ) . another broad concept for protein local structures is the active site. 
active sites of a protein are comprehensively related to functionally important local regions of the protein. the special features of functional local structure are to provide deep insights into the relationship between structure and function. for example, the catalytic triads provide a target of structure for finding the catalytic function of the proteins. to date, many different types of local structures have been defined or identified based on the geometry of the local regions, protein surface patterns, chemical groups or the electronic features. local structure features are believed to be the factors related to concrete functions. at the sequence level, the local regions may be scattered on the primary sequence, forming special motifs. alternatively, at the folding level, they form locally spatial shapes. we can simply catalog the types of methods used to identify the local structures as follows: methods to detect profiles of sequences with special local shapes, and methods to detect the substructures with special features based on folding. the primary sequence of a protein consists of (combinations of) different amino acids, which fold and pack together to constitute a special three-dimensional structure. sequence motifs are conserved segments in protein primary sequences. multiple sequence alignment is often used to identify the common patterns in several protein sequences, especially in the homology family. more advanced sequence comparison algorithms can detect the profiles of the functional residues in the primary sequence. of these algorithms, one of the most common methods is the hidden markov model (hmm). there are a number of important sequence pattern databases, which are publicly available from the internet (table ) . local three-dimensional structural patterns, such as the surface cavities of protein (e.g. the clefts and pockets) also have conserved structural features. 
table lists a number of methods currently used to identify local structure patterns. the procedure of recognition can be generally divided into two parts. the first is to construct the local structures. the geometric structure patterns and biochemical properties can be used to segment the protein architecture into small substructures. the second is to search the annotated sites from the literature and databases. the analysis of the protein surface is an active area of research in terms of the study of local structures. to date, two aspects of protein surface patches have attracted the most attention. the first is based on the defined features, such as surface curvature, surface cavities, electrostatic potential and hydrophobicity. castp (binkowski et al. b ) uses the weighted delaunay triangulation and the alpha complex for shape measurements. the local regions are defined by computational geometry, which identifies and measures surface accessible pockets as well as interior inaccessible cavities for proteins and other molecules. computational geometry also measures analytically the area and volume of each pocket and cavity, both in solvent accessible surface (sa, richards' surface) and molecular surface (ms, connolly's surface). castp provides an online resource for locating, delineating and measuring concave surface regions on the three-dimensional structures of proteins. these include pockets located on protein surfaces and voids buried in the interior of proteins. pvsoar (binkowski et al. ) provides an online resource to identify similar protein surface regions. kinoshita and nakamura ( ) provided a molecular surface database of proteins' functional sites, named the ef-site. the method displays the electrostatic potentials and hydrophobic properties of proteins together on the connolly surfaces of the active sites for analysis of the molecular recognition mechanisms. 
the connolly surfaces are generated with the molecular surface package program, and the electrostatic potentials are calculated by solving poisson-boltzmann equations with the self-consistent boundary method. the second aspect of protein surface patches is based on a predefined segmentation size of the surface. the method uses a segmentation procedure to divide the surface into small segments that correspond to certain physical modules of the surface. surfnet (laskowski ) generates molecular surfaces and gaps between surfaces from three-dimensional coordinates supplied in a pdb-format file. the gap regions can correspond to the voids between two or more molecules or to the internal cavities and surface grooves within a single molecule. the program visualizes molecular surfaces, cavities and intermolecular interactions by segmenting the surfaces. based on the surfnet algorithm, surface (ferre et al. ) identifies clefts and explores the cleft boundaries, called surface patches. a non-redundant set of protein chains is then used to build a database of protein surface patches. ligsite (hendlich et al. ) is a program for the automatic and time-efficient detection of pockets on the surface of proteins that act as binding sites for small-molecule ligands. pockets are identified with a series of simple operations on a cubic grid. the special features of catalytic sites or other types of functional sites are also detected as local structures. some functional annotations of residues can be found in databases and the literature, and the locations of these residues can be represented as potential structural motifs. although it is difficult to define precisely what constitutes the active site of a protein structure, there are a number of methods for identifying active sites or functionally important residues. wallace et al. ( ) described a geometric hashing algorithm, called tess, to derive three-dimensional coordinate templates for motifs.
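the general idea of geometric hashing can be illustrated with a minimal python sketch. it is a generic textbook-style version, not the tess implementation: every function name, the bin size and the point sets are assumptions made for illustration. each ordered triplet of model points defines a local coordinate frame, the remaining points are expressed in that frame and stored in a hash table, and a query structure votes for the model frames whose stored coordinates it reproduces.

```python
import math
from collections import defaultdict
from itertools import permutations


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(ai / n for ai in a)  # raises ZeroDivisionError if n == 0


def frame(p0, p1, p2):
    """Orthonormal frame (origin, x, y, z) built from three non-collinear points."""
    x = _unit(_sub(p1, p0))
    z = _unit(_cross(x, _sub(p2, p0)))
    y = _cross(z, x)
    return p0, x, y, z


def key(point, fr, bin_size):
    """Quantized coordinates of a point in the given frame (the hash key)."""
    origin, x, y, z = fr
    d = _sub(point, origin)
    return tuple(round(_dot(d, axis) / bin_size) for axis in (x, y, z))


def build_table(models, bin_size=0.5):
    """Preprocessing: store every non-basis point under every basis triplet."""
    table = defaultdict(list)
    for name, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            try:
                fr = frame(*(pts[i] for i in basis))
            except ZeroDivisionError:
                continue  # skip degenerate (exactly collinear) bases
            for i, p in enumerate(pts):
                if i not in basis:
                    table[key(p, fr, bin_size)].append((name, basis))
    return table


def vote(table, scene, scene_basis, bin_size=0.5):
    """Recognition: count, per (model, basis), how many scene points agree."""
    fr = frame(*(scene[i] for i in scene_basis))
    votes = defaultdict(int)
    for i, p in enumerate(scene):
        if i not in scene_basis:
            for entry in table.get(key(p, fr, bin_size), []):
                votes[entry] += 1
    return dict(votes)
```

because the stored coordinates are expressed in a frame attached to the points themselves, a rotated and translated copy of a model collects one vote per non-basis point for the matching basis. a practical implementation would additionally have to handle coordinate noise near bin boundaries and residue or atom types, which this sketch ignores.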
tess has been used to create a database of enzyme active site templates called procat (wallace et al. ) . procat provides facilities for interrogating a database of three-dimensional enzyme active site templates. it has been superseded by the catalytic site atlas (csa). the csa (porter et al. ; torrance et al. ) is a database documenting enzyme active sites and catalytic residues in enzymes with a three-dimensional structure. it contains the original entries, annotated by hand from the primary literature, and the homologous entries found by psi-blast alignment. hetatm records and other annotated sites in the pdb also provide patterns of protein local structures that are strongly related to protein functions. stark and russell ( a) reported patterns in non-homologous tertiary structures (pints) that can be used to uncover recurring three-dimensional side-chain patterns based on the algorithm in stark et al. ( c) . sitebase (gold and jackson a) is a database of known ligand-binding sites within the pdb. the search for an annotated position in the pdb constructs the location information of the ligand-binding sites. a collection of known sites from mining the annotations in the pdb has been designated as the pdbsite (ivanisenko et al. ) , which collects amino acid content, structure features calculated from spatial protein structures, and physicochemical properties of sites and their spatial surroundings. the pdbsitescan (ivanisenko et al. ) provides an automatic search of three-dimensional protein fragments similar in structure to known functional sites. a comparison of local structures in the pdb also provides valuable information for constructing the structural motifs. kleywegt ( ) presented two programs, spatial arrangement of side-chains and main-chains (spasm) and rigor, for recognizing spatial motifs in protein structure. spasm can be used to find matches in the structural database for any user-defined motif.
the program also has a unique capability to carry out ''fuzzy pattern matching'' with relaxed requirements on the types of some or all of the matching residues. rigor, on the other hand, compares a database of pre-defined motifs against a single, perhaps newly determined, structure and reports the occurrences of those motifs. zemla ( ) presented a method for finding three-dimensional similarities in protein structure. this algorithm is able to generate different local superpositions between pairs of structures and to detect similar fragments. it allows the clustering of similar fragments and the use of such clusters to identify sequence patterns that represent local structure motifs. sumo (jambon et al. ) can detect the common site, which corresponds to the catalytic triad. the general procedure of bridging the local structures with functions lies in constructing a candidate pool of local structures, identifying important features of function-related local structures and validating their functional importance. the existing methods can be grouped into two categories, i.e. unsupervised and supervised methods, as shown in fig. . the unsupervised methods directly mine those local structures with special features and then detect their functional implications. the supervised methods use known function-related structures as templates and find similar patterns by comparison. there are strong relationships between the two kinds of methods. most of the proposed methods are based on physical and/or biochemical patterns of the protein, and some particular patterns of local structures are strongly related to functions. in the unsupervised methods, the patterns are derived directly from a group of local structures without known functions. their functional importance and characteristics are identified by analyzing the conserved factors in the common features of local structures.
the identified function-related local structures can then be used to enlarge the pool of functional templates, which in turn can be used to measure the potential functional importance of the new substructures. figure a shows these relations. these functionally important local regions can be referred to as functional motifs. a functional motif is a particular local structure pattern whose features determine the performance of particular functions. note that the functional motif is very important for studying the relationship between structure and function in theory, and it is of practical importance to the protein design of drug targets and other bioengineering fields. we can investigate the functional patterns of the local structures in multiple ways. more specifically, we group existing methods to bridge protein local structure and function into three categories based on the hierarchical perspective, as shown in fig. b .

element-based methods. these identify the local structures from sequence, structure and/or other important amino acid residue information. the methods detect the common or conserved patterns in these elements of proteins and bridge the gaps between the local structures and functions at the micro level. during the bridging process, if prior knowledge is used to identify the functional importance or guide the detection, the method belongs to the supervised category; otherwise it belongs to the unsupervised category.

feature-based methods. these investigate the putative features that link the local structures and functions. this category can be further divided into two subcategories, i.e. scoring methods and learning methods. the identified functional features of local structures provide templates of functional motifs. in the scoring methods, the features of local structures are scored by a defined function, and then the scores are used to decide whether the targets are functionally important.
thresholds are often then chosen to provide guidance for detecting the importance of target local structures. in the learning methods, some features are chosen and learned from the known function-related local structures. the trained model can then be used as a classifier to decide whether the test targets are strongly related to the function. these methods belong to the supervised category.

network-based methods. these are based on graph theory and network topology. the methods can be divided into two subcategories. the first is at the individual level and the second is at the mapping level. at the individual level, the protein can be represented as an interaction graph of the residues, with edges representing close spatial distance between them. cliques of the graph, hub residues and residues with other special topological measures may correspond to functionally important regions and residues. at the mapping level, a network represents the similarity relations among the local structures. the functional motifs are mined from informative subgraphs. this approach lies in between the other two categories mentioned above and can be regarded as being semi-supervised because it uses some heuristic knowledge.

element-based methods are based on the basic intuition that the conserved part of a sequence and structure is an important functional motif (aloy et al. ; jones and thornton ) . the first step is a discovery process, which mines similar local structures from the sequences or structures of the target proteins. when similar local patterns of structures are identified in several proteins, the structural features of these local regions are taken to be the determinants of similar functions among the proteins. the second step is a matching process that compares the target to the known functional templates. based on the similarity between these, the function relationship is inferred.
this method is also a basic tool for developing more advanced techniques to bridge the relationship between local structures and functions. the sequences, structures or other elements of the proteins are considered in the comparison. table lists the main methods that are currently being used. depending on whether or not some prior knowledge is used in the assessment, the method is classified as being supervised or unsupervised. similar patterns of local structures can be identified in different proteins, even in proteins of the midnight zone with neither sequence homology nor structure homology. in this case, the alignment of the sequences and/or structure segments can imply similar functions of the local structures. these similar local structures of the proteins are important indicators of their similar functions.

multiple sequence alignment

ma et al. ( ) used ten protein interface families selected from two-chain interface entries in pdb, identified surface residues and filtered out contact residues. the alignment results of the residue properties revealed that polar residue hot spots occur frequently at the interfaces of macromolecular complexes, thereby distinguishing binding sites from the remainder of the surface. using multiple structure alignment, these authors also showed the correspondence between energy hot spots and structurally conserved residues. three residues (trp, phe and met) were observed to be significantly conserved in binding sites. these identified local structures are linked with binding functions. not all residues in a protein are equally important. some are essential for certain structures or functions, whereas others can be readily replaced. conservation analysis is one of the most widely used techniques for predicting these functionally important residues in protein sequences.
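as a concrete illustration of conservation analysis, the following python sketch scores each column of a multiple sequence alignment by one minus its normalized shannon entropy. this is a deliberately simple, generic measure chosen for illustration; it is not the scoring scheme of any method cited here, and the function name, the gap symbol and the toy alignment in the usage note are assumptions.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one conservation score in [0, 1] per alignment column.

    The score is 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies in the column (gaps are ignored). A fully conserved
    column scores 1.0; a column made up only of gaps scores 0.0.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

for the made-up alignment `["HDS", "HES", "HKS", "HRT"]`, the first column (always `H`) receives the highest score, the third column (three `S`, one `T`) an intermediate score, and the second column (four different residues) the lowest. practical methods usually add sequence weighting and a background amino acid distribution, which are omitted here.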
capra and singh ( ) proposed a method focusing on the analysis of a multiple sequence alignment of the homologous sequences in order to find columns that are preferentially conserved. the results show that conservation is highly predictive in identifying catalytic sites and residues near bound ligands, while it is much less effective in identifying residues in protein-protein interfaces.

structure alignment: geometric hashing

rosen et al. ( ) proposed a surface comparison algorithm in search of active sites and functional similarity. these authors first represent the surface by a face-center critical point technique and then derive active sites using geometric hashing to match the two surfaces. finally, a clustering process is used to obtain the functional active sites. this method addresses the question of the usefulness of geometric comparisons and concludes that pure geometric surface matching is capable of obtaining biologically meaningful solutions. based on the geometric hashing algorithm, leibowitz et al. ( ) presented a multiple structural alignment algorithm to detect a recurring substructural motif. given an ensemble of protein structures, the algorithm automatically finds the largest common substructure (core) of cα atoms that appears in all of the molecules in the ensemble. the detection of the core and the structural alignment are carried out simultaneously. fischer et al. ( ) also presented an approach using geometric hashing to compare atoms spatially, independent of sequence order. it automatically detects a recurring three-dimensional motif in protein molecules without any predefinition of the motif.

pairwise alignment of constructed local structures

there are several methods that detect the functional relationship between local structures by structure alignment in an all-against-all manner. pazos and sternberg ( ) presented an automatic method to extract functional sites (residues associated with functions).
the method relates proteins with the same go functions through structural alignment in an all-against-all manner and extracts three-dimensional profiles of conserved residues. based on the identified local structures derived from geometry or physicochemical features, the functional relationship of these local regions can be detected and the comparison result is stored in a database. when querying a local structure, similar hits imply functional relationships. binkowski et al. ( a, ) described such an approach for inferring functional relationships of proteins based on the pvsoar by detecting sequence and spatial patterns of the functional relationship of pockets on protein surfaces. the pvsoar database provides a pairwise comparison of the pockets in the pocket database castp. pockets that match to different degrees of similarity are then retrieved for further analysis of the functional relationships among the local structural motifs. with respect to the pockets on the protein surface, schmitt et al. ( ) developed a similar method based on a clique detection algorithm by comparing the query against the whole database. kinoshita and nakamura ( ) also provided an analogous method for comparing molecular surface geometries and electrostatic potential on the surfaces based on ef-site. their method bridges the electrostatic features of the local protein surface region with the specific functions. jambon et al. ( ) designed a new but similar approach for finding similarities using pairwise matching to detect common three-dimensional sites in proteins. the basis for their method is a representation of the protein structure by a set of stereochemical groups. protein surface regions with similar physicochemical properties and shapes may perform similar functions and bind similar partners. shulman-peleg et al.
( ) constructed two web servers and software packages for recognizing the similarity of binding sites and interfaces: siteengine and interface-to-interface (i2i)-siteengine. the input to the two methods is two protein structures or two protein-protein complexes; the output is the region of the protein surface that is similar to the binding sites or the interfaces. the methods are efficient for large-scale database searches of the entire pdb. in this way, two locally identified structures are related to functions by searching for similar local regions of their protein structures.

pairwise alignment of annotated local structures

information on functional sites obtained from databases or the literature can be used to construct the function-related local structure database, while the pairwise alignment method is used to detect the functional relationships. stark and russell ( a) developed pints to uncover the recurring three-dimensional side-chain patterns based on the algorithm in stark et al. ( c) . their method queries the structural motif database constructed by mining the annotations in the pdb to find similar three-dimensional motifs by a recursive, depth-first search algorithm, i.e. to find all possible groups of identical amino acids common to two protein structures independent of sequence order (russell ). the search is conducted with distance constraints and ignores those amino acids unlikely to be involved in the protein function. stark et al. ( b) identified some functional sites and compared these with procat and rigor. moreover, pints provides a measure of statistical significance based on a rigorous model for the behavior of rmsd (stark et al. c) . sitebase (gold and jackson a) is a database of known ligand-binding sites within the pdb.
gold and jackson ( a) provided a method that automatically identifies ligand-binding sites by searching for hetatm keywords in pdb files and constructing a database by excluding protein/peptide ligands and treating het-groups as individual ligand-binding sites. protein atoms within a -Å radius of any ligand atom were defined as its binding site in this work, and the ligand-binding sites were compared in an all-against-all manner using geometric hashing. similar functions of binding sites were detected regardless of the sequence and folding similarity (gold and jackson b) . pdbsitescan (ivanisenko et al. ) provides an automatic search of three-dimensional protein fragments that are similar in structure to known functional sites. a collection of known sites has been designated as the pdbsite (ivanisenko et al. ) , which is a database of amino acid content, structure features calculated from spatial protein structures and the physicochemical properties of sites and their spatial surroundings. protein-protein interaction sites are also generated by an analysis of contact residues in heterocomplexes. the algorithm is based on an exhaustive examination of all possible combinations of protein positions. the bid (fischer et al. ) database searches the primary scientific literature directly for detailed data on protein interfaces by text mining and stores the characterization of protein-protein binding interfaces at the amino acid level. the bid also organizes protein interaction information into tables, graphical contact maps and descriptive functional profiles.

evolutionary tracing

protein functional sites have a number of similar and unique features. in order to explore the information fully, one can incorporate both sequence and structure data in a functional site prediction method. the evolutionary trace (et) method is one such method that relies on both sequence and structure information.
the most basic form of the algorithm requires a multiple sequence alignment of a protein family and an evolutionary tree, based on sequence identity, which can approximate the functional classification of the protein sequences (lichtarge and sowa ) . yao et al. ( ) proposed an automatic et method that ranks the evolutionary importance of amino acids in protein sequences. this was the first method to quantify the significance of the overlap observed between the best-ranked residues and functional sites. the information inherent in a phylogenetic tree is added to the analysis of conserved sequences, often revealing the more subtle aspects of protein function. starting with a multiple sequence alignment, a representative structure and a phylogenetic tree, this method evaluates conservation at each position in the alignment for different sequence similarity cut-offs. in its original implementation, residues were classified as variable, conserved or a group-specific set that is specific to one branch of the phylogenetic tree. this analysis can be further expanded by the use of amino acid substitution matrices to evaluate conservation. in either case, a representative structure is used to visualize the distribution of scores at the end of the analysis. based on the et method, landgraf et al. ( ) presented a three-dimensional cluster analysis that offers a method for predicting functional residue clusters. this method requires a representative structure and a multiple sequence alignment as input data. individual residues are represented in terms of regional alignments that reflect both their structural environment and their evolutionary variation, as defined by the alignment of homologous sequences. the overall and regional alignments are calculated from the global and regional similarity matrices, which contain scores for all pairwise sequence comparisons in the respective alignments. 
three-dimensional clustering analysis is an easily applied method for the prediction of functionally relevant spatial clusters of residues in proteins. armon et al. ( ) proposed the consurf method, which takes into account the evolutionary relationships among the sequence homologues by closely approximating the evolutionary process and by considering the phylogenetic relationships among the sequences and the similarity between amino acids. consurf maps evolutionary conserved regions on the surface of proteins with a known structure; it also aligns sequence homologues of the protein and uses the alignment to construct phylogenetic trees. the trees are then used to infer the presumed amino acid exchanges that occur throughout the evolution. each exchange is then weighted by the physicochemical distance between the exchanged amino acid residues. the results show that the patches of conserved residues correlate well with the known functional regions of the domains and are more sensitive than the et method. to obtain an indication of the validity of functional inheritance, aloy et al. ( ) proposed a method to evaluate the reliability by exploiting the conservative functional sites predicted by the et method. their method first used a fully automatic procedure to carry out the et method, and then was benchmarked in terms of required sequence divergence and the resultant selectivity and specificity of the prediction. finally, the results that were obtained using the prediction of location of functional sites to assist in filtering putative complexes were evaluated. the functional importance of local structures can be detected by empirical methods or by computational methods. the identified functional motif can then be used as the structure template to detect the functional regions in other protein structures. the chosen method often consists of a comparison process, and the structure and physicochemical features can be considered in the comparison to the templates. 
in addition, a measurement of the similarity to the template is used to assess the functional importance of the local structures being tested. wallace et al. ( ) described a three-dimensional template matching method based on geometric hashing for automatically deriving three-dimensional templates from the protein structures deposited in pdb. in their paper, these researchers described a template derived for the ser-his-asp catalytic triad. their results showed that the resultant template provides a highly selective tool for automatically differentiating between catalytic and noncatalytic ser-his-asp associations. goyal and mande ( ) described the generation of three-dimensional structural motifs for metal-binding sites from known metalloproteins. using three-residue templates and four-residue templates, the method scans all available protein structures in the pdb database for putative metal-binding sites. the search of the whole pdb database predicted many novel metal-binding sites, which are the identified functional motifs. chakrabarti and lanczycki ( ) recently performed a detailed survey of compositional and evolutionary constraints at the molecular and biological functional levels for a large set of known functionally important sites extracted from a wide range of protein families. they compared the degree of conservation across different functionally important sites. the compositional and evolutionary information at functionally important sites was compiled into a library of functional templates. in their paper, these researchers developed a module that predicts functionally important columns of an alignment based on the detection of a significant 'template match score' to a library template. benchmark studies showed good sensitivity/specificity for the prediction of functional sites and high accuracy in attributing correct molecular function type to the predicted sites.
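template matching of this kind can be sketched in python as follows. the sketch is a brute-force illustration under stated assumptions, not the procedure of any of the cited programs: it represents each residue by a single point, requires identical residue types, and measures similarity with the distance rmsd (drmsd), i.e. the root-mean-square difference between corresponding inter-residue distances. the function names, the cutoff and all coordinates are made up for illustration.

```python
import math
from itertools import combinations, permutations


def drmsd(a, b):
    """Distance RMSD between two equally sized, equally ordered point sets.

    Needs at least two points. Because only internal distances are compared,
    no superposition is required, but mirror images cannot be distinguished.
    """
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in pairs)
    return math.sqrt(sq / len(pairs))


def match_template(template, residues, cutoff=1.0):
    """Find residue tuples that resemble a template.

    template, residues: lists of (residue_type, (x, y, z)).
    Returns a sorted list of (drmsd, residue_indices) with drmsd <= cutoff.
    The exhaustive search is only practical for small templates.
    """
    t_types = [t for t, _ in template]
    t_xyz = [xyz for _, xyz in template]
    hits = []
    for idx in permutations(range(len(residues)), len(template)):
        if any(residues[i][0] != t for i, t in zip(idx, t_types)):
            continue  # residue types must agree position by position
        score = drmsd(t_xyz, [residues[i][1] for i in idx])
        if score <= cutoff:
            hits.append((score, idx))
    return sorted(hits)
```

given a three-residue template of types ser, his and asp and a structure that contains a translated copy of it among other residues, `match_template` returns the indices of that copy with a drmsd of zero. real tools reduce the search with indexing schemes such as geometric hashing or subgraph isomorphism and compare several atoms per residue.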
the comparison between potential sites and the templates is very important in these kinds of methods. artymiuk et al. ( ) developed a program called assam, which represents a motif by a distance matrix between pseudo-atoms and uses subgraph-isomorphism algorithms to find matches. this is an elegant method for the detection of common tertiary side-chain patterns based on the use of the ullmann subgraph isomorphism algorithm. singh and saha ( ) formulated the problem of identifying a given structural motif (pattern) in a target protein and discussed the notion of complete and partial matches. they described the precise error criterion that has to be minimized and also discussed different metrics for evaluating the quality of partial matches. they also presented a novel polynomial time algorithm for solving the problem of matching a given motif in a target protein. the functions of a protein are strongly related to the physicochemical features of that protein. the physical features (such as geometry, size, depth and shape) and the chemical features (such as energy, hydrophobicity, amino acid propensity and conservation) of the local structure are often measured by a score function or learned by a machine learning algorithm. the functional importance and specificity of a protein can be identified from the evaluation score or from the learned model of the features. the main methods are listed in table . the scoring method can often calculate an explicit value for the features, while the learning method captures the patterns implicitly. the properties of local structures are believed to be conserved in terms of determining their functions. the identified local regions of structure are analyzed based on the variations in their properties, which are investigated using the identified functionally important sets of local structures.
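a scoring method of this kind can be sketched in a few lines of python. the sketch assumes that each local structure (here called a patch) is described by features already scaled to the range 0 to 1, combines them in a weighted sum and keeps the top-ranked fraction. the feature names, the weights and the fraction are hypothetical values chosen for illustration and are not taken from any of the cited methods.

```python
def linear_score(features, weights):
    """Weighted sum of the named features of one local structure."""
    return sum(weights[name] * features[name] for name in weights)


def rank_patches(patches, weights, top_fraction=0.25):
    """Return the ids of the highest-scoring patches.

    patches: dict mapping a patch id to its feature dict.
    top_fraction: share of patches to report (at least one is returned).
    """
    scored = sorted(((linear_score(f, weights), pid)
                     for pid, f in patches.items()), reverse=True)
    n_keep = max(1, round(len(scored) * top_fraction))
    return [pid for _, pid in scored[:n_keep]]


# hypothetical weights and feature values, for illustration only
weights = {"conservation": 0.5, "hydrophobicity": 0.2, "depth": 0.3}
patches = {
    "p1": {"conservation": 0.9, "hydrophobicity": 0.5, "depth": 0.8},
    "p2": {"conservation": 0.2, "hydrophobicity": 0.9, "depth": 0.1},
    "p3": {"conservation": 0.6, "hydrophobicity": 0.4, "depth": 0.5},
    "p4": {"conservation": 0.1, "hydrophobicity": 0.1, "depth": 0.2},
}
```

with these values, `rank_patches(patches, weights)` reports only `p1`, and `rank_patches(patches, weights, top_fraction=0.5)` reports `p1` and `p3`. in practice the weights are either set empirically or fitted to known functional sites, which moves the approach towards the learning methods.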
the method to predict the functions of the local structures is often based on a scoring scheme that is used to analyze the properties of the targets. in particular, the scores of the features are used as the measurements to determine whether the local structure has functional importance, for example, for a particular function.

scoring by physical features

first, the physical features of the local structures, such as size, depth and shape, are considered for scoring the function-related features. the shape features alone may provide basic information for the analysis of the functional features related to the protein function. siggers et al. ( ) introduced a new method to structurally align interfaces observed in protein-dna complexes. their method is based on a procedure that describes the interfacial geometry in terms of the spatial relationships between individual amino acid-nucleotide pairs. they subsequently provided a newer method for studying the determinants of binding specificity. kawabata and go ( ) proposed a new definition for pockets using two explicit adjustable parameters, the radii of small and large probe spheres, which correspond to the two physical properties, 'size' and 'depth'. a pocket region was defined as a space into which a small probe can enter, but a large probe cannot. based on the geometric standards of large probe spheres, this method identified the binding site positions. from the geometrical viewpoint, the methods described above need further improvement to describe or compare the global shape and the local structures. morris et al. ( ) presented a novel technique for capturing the global shape of a protein's binding pocket or ligand. this method uses the coefficients of a real spherical harmonics expansion to describe the shape of a protein's binding pocket. shape similarity is computed as the l2 distance in coefficient space. kahraman et al.
( ) used a recently developed shape matching method to compare the shapes of protein-binding pockets to the shapes of their ligands. their results indicate that pockets binding the same ligand show greater variation in their shapes than can be accounted for by the conformational variability of the ligand. this result suggests that geometrical complementarity in general is not sufficient to drive molecular recognition.

scoring by chemical features

chemical features of local structures are very important for determining their functional specificity. these feature scores of local structures can be used as standards to determine their functions. the structural locations of functional sites are conserved between homologous proteins, and functionally important residues tend to cluster together in space, forming three-dimensional residue clusters or surface patches. panchenko et al. ( ) presented a method to assign each residue a score that depends on its own conservation in homologs and the conservation of residues in its spatial neighborhood. the high-scoring sites are more likely to be involved in specific binding or catalysis. functionally important residues in a protein tend to be among those computed to be energetically destabilizing to the structure. elcock ( ) proposed a method to predict functionally important residues based solely on the computed energetics of a protein structure. the energetic properties of binding surfaces in protein-protein interfaces and protein-ligand sites were shown to be different (burgoyne and jackson ) . the pockets from qsitefinder (laurie and jackson ) were ranked by the scores of these properties, i.e. hydrophobicity, desolvation, electrostatics and conservation, which are used to determine binding sites. jones et al. ( ) developed a method to detect dna-binding sites on a protein surface.
the surface patches and the dna-binding sites were initially analyzed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. in general, dna-binding sites are among the top % of patches with the largest positive electrostatic scores. this knowledge was used to make predictions. jones et al. ( ) presented a similar computational analysis of protein-rna interactions. there are a number of differences between dna-binding sites and rna-binding sites. for the rna-binding sites, van der waals contacts play a more important role than hydrogen bond contacts. as to the protein-dna binding local structures, luscombe et al. ( ) investigated hydrogen bonds as well as van der waals contacts and water-mediated bonds to assess whether there are universal rules that govern amino acid-base recognition. in a subsequent study, luscombe and thornton ( ) also examined amino acid conservation and the effects of mutations on binding specificity. in liang et al. ( ) , an empirical score function consisting of a linear combination of the energy score, interface propensity and residue conservation score is used to predict interface residues. the top-ranked patches are predicted to be the potential interface sites. combining the three terms improved the accuracy of prediction significantly relative to any single term or pairwise combination. cheng et al. ( ) presented a method to predict protein function sites using sequence alignment information as well as rosetta protein design and rosetta free energy calculations. logistic regression with the generalized linear model has been used to determine the weights of the sequence conservation, natural/designed sequence profile difference and natural/optimal residue free energy gap, so as to optimize the separation between functional and non-functional residues. innis et al. ( ) presented conserved functional group (cfg) analysis to predict function sites in proteins.
the method relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologs.

scoring by physicochemical features

features based only on physical geometry or chemical energy often cannot represent functional features comprehensively. most methods integrate several important features and then score them to bridge the gaps between local structures and functions. the ligsite algorithm is based only on geometry. huang and schroeder ( ) presented an extension and implementation, ligsite csc , which is based on the notion of surface-solvent-surface events and the degree of conservation of the involved surface residues. the use of the connolly surface led to slight improvements, whereas the prediction re-ranking significantly improved the binding site predictions. glaser et al. ( ) improved previous approaches by combining two known measures of 'functionality' in proteins, i.e. cleft volume and residue conservation, to develop a method for identifying the location of ligand-binding pockets in proteins. neuvirth et al. ( ) proposed a structure-based algorithm to identify the location of protein-protein interaction sites. the sites are defined based on connolly's molecular dot surfaces. the method defines an interface score that combines the chemical and geometric features of the interaction sites. interfacial residues are considered to be those with the % highest scores. geometry and energy properties have also been used to analyze the pocket functions for docking (li et al. ). hoskins et al. ( ) considered the use of solvent accessibility, residue propensity and hydrophobicity in conjunction with secondary structure data as prediction parameters to predict protein-protein interaction sites.
the influence of residue type and secondary structure on solvent accessibility is analyzed, and a measure of relative exposedness is defined. the high-scoring residues are clustered as a basis for predicting interaction sites. tsuchiya et al. ( ) provided a method for analyzing protein-dna complexes, focusing on the shape of the molecular surface of the protein and dna, along with the electrostatic potential on the surface, and calculated a new evaluation score. based on the score, the method was used to distinguish dna-binding from non-dna-binding proteins. taroni et al. ( ) provided an analysis of the characteristic properties of sugar-binding sites. for each site, six parameters were evaluated: solvation potential, residue propensity, hydrophobicity, planarity, protrusion and relative accessible surface area (asa). three of the parameters were found to distinguish the observed sugar-binding sites from the other surface patches. these parameters were then used to calculate the probability of a surface patch being a carbohydrate-binding site. the total score of the properties was used to determine whether the surface patch was a carbohydrate-binding site.

the features of the local structures play crucial roles in predicting protein function. to identify the relationship between protein local structure and protein function, the structural and/or physicochemical features can be learned implicitly using machine learning methods, such as the support vector machine (svm) and neural network. the support vector machine uses a linear model to implement nonlinear class boundaries by nonlinearly mapping the input vectors into a high-dimensional feature space. it is grounded in mathematical theory and has many successful applications in statistical learning fields (vapnik ). these methods have been confirmed to be able to learn the features of local structures with functional importance.
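the idea that a linear model can realize a nonlinear class boundary after a nonlinear mapping can be shown with a small sketch. this is not an svm: a plain perceptron stands in for the margin-maximizing training step, and the feature map is written out explicitly instead of through a kernel. the function names and the xor toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def feature_map(x: Sequence[float]) -> Tuple[float, float, float]:
    """Map a 2-d input into a 3-d space by adding the product term x1*x2."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def train_perceptron(samples: List[Sequence[float]], labels: List[int],
                     epochs: int = 1000) -> Tuple[List[float], float, bool]:
    """Train a linear classifier; the flag reports whether it separated the data."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified or on the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                errors += 1
        if errors == 0:
            return weights, bias, True
    return weights, bias, False


def predict(weights: List[float], bias: float, x: Sequence[float]) -> int:
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# XOR-style toy data: no straight line separates the two classes in 2-d.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
LABELS = [-1, 1, 1, -1]

_, _, separable_raw = train_perceptron(POINTS, LABELS)
mapped = [feature_map(p) for p in POINTS]
w, b, separable_mapped = train_perceptron(mapped, LABELS)
predictions = [predict(w, b, m) for m in mapped]
```

in the original two-dimensional space the linear classifier never separates the classes, whereas after the mapping a separating hyperplane exists; an svm would additionally choose the hyperplane with the largest margin and would usually apply the mapping implicitly through a kernel function.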
the features can first be investigated in the learning process and then used to detect whether they relate to specific functions. koike and takagi ( ) proposed an svm method to identify protein-protein interaction sites. the profiles of sequentially/spatially neighboring residues, plus additional information, constitute a feature vector, and the interaction site ratios are calculated by svm regression. the predictive performance is evaluated and compared for different quantitative features. cai et al. ( ) proposed an svm algorithm to predict the catalytic triad of the serine hydrolase family. bordner and abagyan ( ) proposed a similar svm to predict protein-protein interfaces. the local surface properties, combined with an evolutionary conservation signal, were used to train the machine on a large nonredundant data set of protein-protein interfaces. an svm learning protocol was provided by bhardwaj et al. ( ) for the prediction of dna-binding proteins. the characteristics, including surface and overall composition, charge and positive potential patches on the protein surface, were derived, and the svm was trained as a classifier to detect the dna-binding proteins. high accuracy was achieved on a large set of test proteins regardless of their sequence or structure homology. chung et al. ( ) recently exploited the svm approach to detect whether identified potential protein-binding sites interact with each other. information related to sequence and structural complementarity across protein interfaces was extracted from the pdb. this work also built a pipeline to predict the location of binding sites.

neural network

the neural network is a learning method which adapts the relationships of neurons; as such, it is a simplified model of the neural processing of the human brain (zhang ) . based on analysis of both structures and sequences, gutteridge et al. ( ) used a neural network to identify catalytic residues in enzymes.
the locations of the active sites were predicted by the neural network output and spatial clustering of the highest scoring residues. in most test cases, the likely functional residues were identified correctly, as were a number of potentially novel functional groups. ofran and rost ( ) described a neural network to identify protein-protein interfaces from sequences. since the compositions of contacting residues of the interaction sites were believed to be unique, the features of these known interaction sites were used to train the neural network. zhou and shan ( ) trained a neural network to predict protein-protein interactions. their method combines conservation and structural properties of individual residues. fariselli et al. ( ) reported a neural network-based system using information on evolutionary conservation and surface disposition. chen and zhou ( ) also provided a neural network method to predict interface residues in a protein-protein complex. there are also neural network methods for predicting nucleic acid-binding (na-binding) sites. stawiski et al. ( ) presented an automatic neural network approach to predict na-binding proteins, specifically dna-binding proteins. this method uses an ensemble of features extracted from characterization of the structural and sequence properties of large, positively charged electrostatic patches. structural and physical properties of dna provide important constraints on the binding sites formed on the surfaces of the dna-targeting proteins. the characteristics of dna-binding sites may form the basis for predicting dna-binding sites from the structures of proteins alone. tjong and zhou ( ) used a representative set of protein-dna complexes from the pdb to analyze characteristics and to train a neural network predictor of dna-binding sites. the input to the predictor consists of psi-blast sequence profiles and solvent accessibility of each surface residue and of its closest neighboring residues. ferrer-costa et al.
( ) provided a web-based method to detect if a protein structure contains a dna-binding helix-turn-helix (dbhth) motif. the method uses a neural network with no hidden layers, i.e. a linear predictor, to classify whether a protein is dna-binding with the hth motif. the linear predictor was trained on a nonhomologous set of structures of protein chains with and without a dbhth motif. sodhi et al. ( ) used a neural network to predict metal-binding site residues in low-resolution structural models. the method involves sequence profile information combined with approximate structural data. several neural networks were proposed to distinguish the metal sites from non-sites and then to detect these functionally important regions. in keil et al. ( ) , the molecular surface was segmented into overlapping patches. the physical and chemical properties of these patches were then calculated. a neural network strategy was then used to identify possible binding sites by classifying the surface patches as protein-protein, protein-dna, protein-ligand or nonbinding sites. kuznetsov et al. ( ) applied an svm method to predict dna-binding sites using features including amino acid sequence, profile of evolutionary conservation of sequence positions, and low-resolution structural information. the results indicate that an svm predictor based on a properly scaled profile of evolutionary conservation in the form of a position specific scoring matrix (pssm) significantly outperforms a pssm-based neural network predictor. such results imply that the combination of the two methods may improve the accuracy. passerini et al. ( ) introduced a two-stage learning method for identifying histidines and cysteines that participate in binding of several transition metals and iron complexes. the first stage is an svm, which is trained to locally classify the binding state of single histidines and cysteines.
the second stage is a neural network trained to refine local predictions. the methods use only sequence information by utilizing position-specific evolutionary profiles. statistical methods statistical learning also provides an effective way to link the features of local structures with their functional implication. liang et al. ( a) provided a supervised learning algorithm, feature, for the automatic discovery of physical and chemical descriptions of protein microenvironments. the calculated feature vectors were used to predict functional motifs based on bayesian inference. the method has also been proposed as an interactive web tool, webfeature, for identifying and visualizing functional sites . bradford et al. ( ) developed a method to predict both protein-protein binding site location and interface type (obligate or non-obligate) using a bayesian network in combination with surface patch analysis. two bayesian network structures, naive and expert, were trained to distinguish interaction surface patches. wang et al. ( b) proposed a computational method learned by the expectation maximization (em) algorithm, insite, to search for motifs whose presence in a pair of interacting proteins determined which motif pairs have high affinity that would lead to an interaction between proteins. yan et al. ( ) also provided a two-stage method consisting of an svm and a bayesian classifier for predicting the surface residues of proteins that participate in protein-protein interaction. the method exploits the fact that interface residues tend to form clusters in the primary amino acid sequence. in addition, chou and cai ( ) provided a covariant discriminant algorithm to predict active sites of enzyme molecules. the high accuracy of prediction shows the effectiveness of the method. protein-dna interactions are critical for deciphering the mechanisms of gene regulation. yan et al. 
( ) presented a supervised machine learning approach for the identification of amino acid residues involved in protein-dna binding sites. a naive bayesian classifier was trained for predicting whether a given amino acid residue is a dna-binding residue based on its identity and the identities of its sequence neighbors. mclaughlin and berman ( ) developed statistical models for discerning protein structures containing the dbhth motifs. the method uses a decision tree model to identify the key structural features required for dna binding. these features include a high average solvent-accessibility of residues within the recognition helix and a conserved hydrophobic interaction between the recognition helix and the second alpha helix preceding it. the adaboost algorithm was used to search the pdb with the aim of identifying the structure containing the motifs with high probability. metal ions are crucial in facilitating the function of a protein. identifying the features of metal binding sites provides crucial knowledge of the function performance of the local structures. because the residues that coordinate a metal often undergo conformational changes upon binding, the detection of binding sites based on simple geometric criteria in proteins without bound metal is difficult. however, aspects of the physicochemical environment around a metal-binding site are often conserved, even when this structural rearrangement occurs. ebert and altman ( ) developed a bayesian classifier using known zinc-binding sites as positive training examples and nonmetal-binding regions as negative training examples. babor et al. ( ) reported an approach that identifies transition metal-binding sites in proteins by combining the decision tree and svm. in the first step, the geometric search of structural rearrangements following metal binding was taken into account by a decision tree classifier. a second classifier based on svms was then used to identify the metal-binding sites. 
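the naive bayesian residue classifier described above for yan et al., which predicts from a residue's identity and the identities of its sequence neighbors, can be sketched as follows. the class name, the window length of three, the laplace smoothing and the toy training windows are assumptions made for illustration and are not taken from the original work.

```python
import math
from collections import defaultdict
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class WindowNaiveBayes:
    """Naive Bayes over fixed-length sequence windows centred on a residue."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.class_counts: Dict[bool, int] = {True: 0, False: 0}
        # counts[label][position][amino acid] -> number of training windows
        self.counts = {
            label: [defaultdict(int) for _ in range(window_size)]
            for label in (True, False)
        }

    def fit(self, windows: List[str], labels: List[bool]) -> None:
        for window, label in zip(windows, labels):
            self.class_counts[label] += 1
            for position, residue in enumerate(window):
                self.counts[label][position][residue] += 1

    def log_posterior(self, window: str, label: bool) -> float:
        """Unnormalized log posterior with Laplace (add-one) smoothing."""
        total = sum(self.class_counts.values())
        score = math.log((self.class_counts[label] + 1) / (total + 2))
        for position, residue in enumerate(window):
            seen = self.counts[label][position].get(residue, 0)
            score += math.log(
                (seen + 1) / (self.class_counts[label] + len(AMINO_ACIDS))
            )
        return score

    def predict(self, window: str) -> bool:
        """True if the central residue is predicted to be DNA-binding."""
        return self.log_posterior(window, True) > self.log_posterior(window, False)


# Made-up training windows: positively charged windows labelled as binding.
TRAIN_WINDOWS = ["KRK", "RKR", "GRK", "LAV", "AVL", "DLE"]
TRAIN_LABELS = [True, True, True, False, False, False]

model = WindowNaiveBayes(window_size=3)
model.fit(TRAIN_WINDOWS, TRAIN_LABELS)
```

calling `model.predict("KRR")` scores the central residue of a new window; a realistic predictor would be trained on residues labelled from protein-dna complex structures and would use a longer window.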
nayal and honig ( ) proposed a comprehensive method to identify drug-binding sites in which attributes were first computed for each cavity, and these were then used to distinguish drug-binding sites by the random forest classification scheme. the cavity properties cover eight broad categories: cavity size, cavity shape, hydrophobicity, electrostatics, hydrogen bonding, amino acid composition, secondary structure and rigidity.

an interesting method to identify functional motifs is based on graph theory and the network concept. the main methods are listed in table . one subcategory of the method represents the protein structure as a complex network. a node represents a cα atom of the backbone, and an edge linking two nodes represents the physical distance or the functional relationship between the nodes. greene and higman ( ) viewed protein structures as network systems. the systems were found to exhibit small-world, single-scale and, to some degree, scale-free properties. using the network model, amitai et al. ( ) identified active site residues. the method transforms a protein structure into a residue interaction graph, where graph nodes represent amino acid residues, and links represent their interactions. the active site, ligand-binding and evolutionarily conserved residues typically have a high closeness value, which is used to pick out the functional residues. del sol et al. ( ) also represented a protein structure as a small-world network and searched the topological determinants related to functionally important residues. the method investigates the performance of residues in protein families. the results indicate that enzyme active sites are located in surface clefts, and hetero-atom binding residues have deep cavities, while protein-protein interactions involve a more planar configuration. wangikar et al.
( ) reported a method for detecting recurring side-chain patterns using an unbiased and automatic graph theoretic approach. the method first lists all structural patterns as subgraphs. the patterns are compared in a pairwise manner based on content and geometry criteria. the recurring pattern is then detected using an automatic search algorithm from the all-against-all pairwise comparison of proteins. similarly, huan et al. ( ) defined a labeled graph representation of a protein structure in which edges connecting pairs of residues are labeled by the euclidean distance between the cα atoms of the two residues. based on this representation, a structural motif corresponds to a labeled clique that occurs frequently among the graph representations of the protein structures. the paper further presented an efficient mining algorithm aimed at discovering structure motifs in this setting. in studies on protein structure and function, identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems. calcium-binding sites are often complex and irregular, and it is difficult to predict their location in protein structures. deng et al. ( ) reported a rapid and accurate method for detecting calcium-binding sites. this algorithm uses a graph theory algorithm to identify oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. a cluster of four or more oxygen atoms has a high potential for calcium binding. a potential calcium-binding position is a clique and can be detected by a clique-detecting algorithm. the high accuracy of prediction shows that the majority of calcium-binding sites in proteins are formed by four or more oxygen atoms on a sphere centered on the calcium atom. the above network methods all focus on individual proteins and represent a protein structure as a complex network.
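the clique-based search for calcium-binding oxygen clusters described above can be sketched in a few lines. this is an illustration under stated assumptions rather than the published algorithm: the 6.0 å distance cutoff is a hypothetical value, the centroid stands in for the geometric step that locates the cluster center, and the coordinates are toy data.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Coordinate = Sequence[float]


def build_graph(coords: List[Coordinate], cutoff: float) -> Dict[int, Set[int]]:
    """Connect two oxygen atoms whenever they lie within the cutoff distance."""
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def maximal_cliques(adjacency: Dict[int, Set[int]]) -> List[List[int]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[List[int]] = []

    def expand(current: Set[int], candidates: Set[int], excluded: Set[int]) -> None:
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for vertex in list(candidates):
            expand(current | {vertex},
                   candidates & adjacency[vertex],
                   excluded & adjacency[vertex])
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(adjacency), set())
    return cliques


def candidate_sites(coords: List[Coordinate], cutoff: float = 6.0,
                    min_size: int = 4) -> List[Tuple[List[int], Tuple[float, ...]]]:
    """Return (atom indices, centroid) for each clique of at least min_size atoms."""
    sites = []
    for clique in maximal_cliques(build_graph(coords, cutoff)):
        if len(clique) >= min_size:
            centroid = tuple(
                sum(coords[i][axis] for i in clique) / len(clique)
                for axis in range(3)
            )
            sites.append((clique, centroid))
    return sites


# Toy oxygen coordinates: four mutually close atoms and one distant atom.
OXYGENS = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (20, 20, 20)]
sites = candidate_sites(OXYGENS)
```

the four neighbouring oxygens form one clique and yield a single candidate position, while the isolated atom is discarded because its clique has fewer than four members.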
the specific topology features clearly imply a particular function module (zhang and grigorov ; zhang et al. ). recently, a novel category of network-based analysis of the protein local structures at the macro level has been proposed (liu et al. ) . the similarity of the local structures, specifically the pockets on the protein surface, is mapped to constitute a similarity network. the nodes represent the pockets, and the edges represent similarity relationships among the pockets. the properties of the pocket similarity network resemble those of other complex networks (liu et al. ) . the similar pockets are identified by the clusters and community structures, and the special features of the network are helpful in clustering the pockets into similar groups (liu et al. b) , which may imply clusters of structure motifs and correspond to special functional implications (liu et al. ) . with the network concept, the pockets can also be used to characterize and predict protein functions by annotating the topology neighbors. in this way, the accuracy of the prediction is better than that with the global structural similarity approach (liu et al. a ).

prediction of functions at the cellular level

most of the methods used to annotate protein functions that are listed above are based on molecular function at the biological processing level. at the cellular component and location levels, the importance of protein local structure is also critical. in fact, information on the subcellular locations of proteins is important because it can provide useful insights into protein functions as well as how and in what kind of cellular environments they interact with each other and with other molecules. such information is also fundamental and indispensable to systems biology because knowledge of the localization of proteins within cellular compartments can facilitate our understanding of the intricate pathways that regulate biological processes at the cellular level.
from this perspective, the functions of proteins at different levels are strongly inter-related to each other. at the cellular component level, local structures are still crucial in determining the roles of proteins and specific functions. many methods for predicting the subcellular location of proteins have been proposed recently because the location of such proteins in the cell can provide useful insights or clues about their functions . one of the more powerful methods applied in location prediction is based on an important descriptor of the protein sample, i.e. the pseudo-amino acid (pseaa) composition (chou ) . this descriptor can be used to represent a protein sequence with a discrete model yet without completely losing the sequence-order information. since the concept of pseaa composition was introduced, various pseaa composition approaches have been developed, all with the aim of improving the prediction quality of protein attributes (gao et al. ; zhou et al. a, b; diao et al. ; fang et al. ; li and li ) . the pseaa method has been widely used and extended. a very flexible pseaa composition generator (pseaac) was established which enables users to generate different kinds of pseaa composition. a web server called cell-ploc has recently been developed that allows users to predict the subcellular locations of proteins in various different organisms. pseaa composition and pssm have also been combined in various algorithms to improve the prediction quality for membrane protein type (i.e. memtype- l: chou and shen a) , enzyme main-functional class and sub-functional class (i.e. ezypred: shen and chou a) and protein subnuclear localization (i.e. nuc-ploc: shen and chou b) . a comprehensive review published recently provides a summary of these topics. in addition to sequence information, local structural information is useful, interesting and important in protein localization function prediction. 
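a simplified sketch may help to show how a pseaa-style descriptor keeps some sequence-order information in a fixed-length vector. it follows the general form of the pseaa composition (20 normalized frequencies plus λ sequence-order correlation factors) but, as a simplifying assumption, uses a single residue property, the kyte-doolittle hydropathy scale, whereas the original formulation combines several properties. the function name, the default λ and the weight value are illustrative choices.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the only residue property.
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(table: Dict[str, float]) -> Dict[str, float]:
    """Rescale the property to zero mean and unit variance over the 20 residues."""
    values = list(table.values())
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {residue: (v - mean) / std for residue, v in table.items()}


PROPERTY = _standardize(HYDROPATHY)


def pseaa_composition(sequence: str, lam: int = 2, weight: float = 0.05) -> List[float]:
    """Return a (20 + lam)-dimensional vector; expects standard residues only."""
    sequence = sequence.upper()
    if len(sequence) <= lam:
        raise ValueError("sequence must be longer than lam")
    frequencies = [sequence.count(residue) / len(sequence) for residue in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):  # k-th tier sequence-order correlation factor
        pairs = len(sequence) - k
        thetas.append(
            sum((PROPERTY[sequence[i]] - PROPERTY[sequence[i + k]]) ** 2
                for i in range(pairs)) / pairs
        )
    denominator = sum(frequencies) + weight * sum(thetas)
    return ([f / denominator for f in frequencies]
            + [weight * theta / denominator for theta in thetas])


vector = pseaa_composition("MKTAYIAKQR", lam=2)
blocked = pseaa_composition("AAKK", lam=1)
alternating = pseaa_composition("AKAK", lam=1)
```

the two four-residue peptides have identical amino acid composition, yet their vectors differ because the correlation factor depends on residue order; this is the sense in which the descriptor is discrete without discarding sequence-order information entirely.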
a quality assessment of the results is necessary at all three levels of function prediction. the predicted functions of proteins can be taken as indicators of the directions to be taken by researchers when carrying out experiments to validate the functions of proteins. many of the computational methods used to annotate protein functions as well as those used to predict functionally important local structures use cross-validation methods to assess the performance of a prediction; these include the independent dataset test, subsampling test and jackknife test (chou and zhang ) . however, as elucidated by chou and shen ( ) , of these cross-validation methods, the jackknife test is considered to be the most objective and has been increasingly used by investigators to examine the accuracy of various predictors (zhou ; zhou and assa-munt ; zhou and doctor ; xiao et al. ; zhou and cai ; chen et al. ; shi et al. ) . it is important to consider the relationship among the functional terms and the semantic similarity with the aim of avoiding biases in the assessment of functional similarity (liu et al. a ).

the global structure similarity-based methods provide a straightforward approach to annotate protein functions. however, since the relationships between structures and functions are so complex, local structure-based methods can be used to predict protein function directly by identifying the local structures carrying out particular functions. laskowski et al. ( ) proposed a novel method of predicting protein function using local three-dimensional templates. the authors build a template database and use four types of templates (enzyme active sites, ligand-binding residues, dna-binding residues and reverse templates) to construct the relationship between templates and functions. ferre et al. ( ) described a method for the function-related annotation of protein structures based on the detection of local structural similarity with a library of annotated functional sites.
an automatic procedure was used to annotate the function of the local surface regions, and then a sequence-independent algorithm was developed to compare exhaustively these functional patches with a larger collection of protein surface cavities. after tuning and validating the algorithm on a dataset of well-annotated structures, the method can provide functional clues to proteins that do not show any significant sequence or global structural similarity with proteins in the current databases. binkowski et al. ( ) provided similar methods to annotate protein functions from protein surface similarity. pockets are identified by castp from several proteins. these pockets are queried in the pvsoar to locate similar pockets corresponding to annotated proteins. the conservation among the pockets can be detected by the sequence identities and other similarity metrics. tseng and liang ( ) developed a bayesian markov chain monte carlo method for estimating the specific substitution rates of the short sequences of local structures. moreover, a method for protein function prediction is presented by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. the method is effective in identifying functionally related proteins that have overall low sequence identity. the method provided by pazos and sternberg ( ) first identifies functional sites in proteins by bridging the local structures and functions; the functions of a target protein can then be inferred from the similarity of the functional sites in the position-specific scoring matrices. information on the functional importance of local structure can facilitate the annotation of protein function more precisely. george et al. ( ) proposed an effective method to annotate protein function through the use of functional clues of conservation among the catalytic residues. this method improves the precision of annotation significantly.
the advantages of predicting protein functions from local structures are based on the fact that such methods can be implemented without any prior homology hypothesis. the methods can be used for proteins in the midnight zone without sequence similarity, and local structures often provide concrete and specific functional annotations. to compare the precision and coverage of the global structural similarity and that of local structures, liu et al. ( a) proposed a novel method to predict protein function from the pockets on the protein's local surface region. the similarity of regional local surface pockets and the global similarity of proteins are both represented by networks. the prediction is based on the network topology. a comparison of the results shows that the local-structure-based prediction is better than the global-structure-based prediction (liu et al. a ).

in this paper, we have reviewed protein function prediction methods at different levels, i.e. sequence, structure, interaction and integration. we have mainly focused on the importance of local structures and the methods used to predict functionally important local structures. in summary, we discuss possible future directions. the interaction between proteins provides high-level information on protein function, especially in various biological processes. although there are thousands of known interactions, only a tiny fraction of these are available in precise molecular detail. if we are able to examine structural details, systematic representation of the interaction would accurately reflect biological reality. for example, we can predict which part of the structures is most likely to be involved in interaction with other macromolecules, proteins, dna or rna by analyzing the properties of different local patches on the protein surface.
the patch analysis, which considers properties of the surface such as flatness, hydrophobicity, charge and, in particular, residue conservation, is effective in identifying protein-protein interaction surfaces and has also been shown to successfully identify dna-binding sites (aloy and russell ) . structural systems biology is a very effective approach that combines protein interactions and protein three-dimensional structures. the mechanisms of protein and protein interaction lie in the local structures between the two protein surfaces. from this perspective, structural systems biology provides us with a new direction in the fields of structural biology and systems biology. it combines the key features of the two directions to provide more insight into linking the single protein and systematic interaction between proteins. the relationships between local structures and functions are expected to play important roles in structural systems biology. the computational methods used to bridge the relationship between local structures and functions can be further improved. the community of computational biology has a strong need for comprehensive feature selection in concise and effective ways. in addition, there is still much room for improvement in terms of the accuracy of the methods used to align the features between two local structures. the validation of the functions of structural motifs should also be conducted more carefully and by more reliable biological experiments. recent advances in the field inspired by developments in sequences and structures demonstrate the great potential of such research in protein science in elucidating essential functional roles of the local structures. in our opinion, research aimed at bridging the gaps between local structures and function is still in its infant stage, and further advances in such areas will greatly enhance our ability to study the fundamental properties of proteins at a system-wide level. 
in other words, we expect to gain deep insight into essential mechanisms of biological systems from both structural and functional perspectives. different methods based on the local similarity, global similarity and interaction require and use different information, and they have different aspects, intentions and advantages. to our knowledge, the function annotation problem is still in its developing period and needs more comprehensive or hybrid approaches. none of the existing methods is likely to succeed in all cases in annotating a protein with its functions correctly and comprehensively. one reason for this is that protein functions rely not only on the sequence and/or folding characteristics, but also on the cell environment, the cycle of the biological processes and other chemical compounds. there are still many difficult-to-decipher proteins that researchers have been unable to annotate correctly by any existing method. hence, a sensible strategy is to use different methods to incorporate data from multiple sources and to extensively utilize existing function annotations. future directions include using combinations of different methods at different levels so as to efficiently explore the overall sequences, global structures and local structures and to obtain more information on interactions between the target proteins and others in the cellular context. although computational methods generally cannot directly validate protein functions, the predefined tentative annotations provide valuable information as a basis for further efficient validation experiments.
network analysis of protein structures identifies functional residues
consurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information
a graph-theoretic approach to the identification of three-dimensional patterns of amino acid side-chains in protein structure
prediction of transition metal-binding sites from apo protein structures
network biology: understanding the cell's functional organization
structural insights into protein-metal ion partnerships
nomenclature committee of the international union of biochemistry and molecular biology (nc-iubmb). enzyme nomenclature. recommendations . supplement : corrections and additions
kernel-based machine learning protocol for predicting dna-binding proteins
inferring functional relationships of proteins from local sequence and spatial surface patterns
castp: computed atlas of surface topography of proteins
pvsoar: detecting similar surface patterns of pocket and void surfaces of amino acid residues on proteins
protein surface analysis for function annotation in high-throughput structural genomics pipeline
statistical analysis and prediction of protein-protein interfaces
flu virus proton channel analyzed: structures of key surface protein suggest different drug mechanisms
insights into protein-protein interfaces using a bayesian network prediction method
a tour of structural genomics
predicting protein interaction sites: binding hot-spots in protein-protein and protein-ligand interfaces
identify catalytic triads of serine hydrolases by support vector machines
the gene ontology annotation (goa) database: sharing knowledge in uniprot with gene ontology
ligand binding: functional site location, similarity and docking
predicting functionally important residues from sequence conservation
analysis and prediction of functionally important sites in proteins
prediction of interface residues in protein-protein complexes by a consensus neural network method: test against nmr data
prediction of linear b-cell epitopes using amino acid pair antigenicity scale
revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison
improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design
prediction of protein cellular attributes using pseudo amino acid composition (erratum: ibid
structural bioinformatics and its impact to biomedical science
a novel approach to predict active sites of enzyme molecules
predicting protein-protein interactions from sequences in a hybridization space
memtype- l: a web server for predicting membrane proteins and their types by incorporating evolution information through pse-pssm
recent progresses in protein subcellular location prediction
cell-ploc: a package of web-servers for predicting subcellular localization of proteins in various organisms
prediction of protein structural classes
binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against sars (erratum: ibid
high-throughput identification of interacting protein-protein binding sites
residue centrality, functionally important residues, and active site shape: analysis of enzyme and non-enzyme families
predicting calcium-binding sites in proteins-a graph theory and geometry approach
practical limits of function prediction
using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and lempel-ziv complexity
analogue inhibitors by modifying oseltamivir based on the crystal neuraminidase structure for treating drug-resistant h n virus
robust recognition of zinc binding sites in proteins
protein function in the post-genomic era
prediction of functionally important residues based solely on the computed energetics of protein structure
predicting dna-binding proteins: approached from chou's pseudo amino acid composition and other specific sequence features
prediction of protein-protein interaction sites in heterocomplexes with neural networks
surface: a database of protein surface regions for functional annotation
functional annotation by identification of local surface similarities: a novel tool for structural genomics
hthquery: a method for detecting dna-binding proteins with a helix-turn-helix structural motif
three-dimensional, sequence order-independent structural comparison of a serine protease against the crystallographic database reveals active site similarities: potential implications to evolution and to protein folding
the binding interface database (bid): a compilation of amino acid hot spots in protein interfaces
using pseudo amino acid composition to predict protein subcellular location: approached with lyapunov index, bessel function, and chebyshev filter
effective function annotation through catalytic residue conservation
comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins
surprising similarities in structure comparison
a method for localizing ligand binding pockets in protein structures
sitebase: a database for structure-based protein-ligand binding site comparison
fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships
structural genomics: computational methods for structure analysis
exploiting d structural templates for detection of metal-binding sites in protein structures
uncovering network systems within protein structures
using a neural network and spatial clustering to predict the location of active sites in enzymes
distance-based identification of spatial motifs in proteins using constrained frequent subgraph mining
ligsite csc : predicting ligand binding sites using the connolly surface and degree of conservation
ligsite: automatic and efficient detection of potential small molecule-binding sites in proteins
protein structure comparison
by alignment of distance matrices mapping the protein universe an algorithm for predicting protein-protein interaction sites: abnormally exposed amino acid residues and secondary structure elements global mapping of the protein structure space and application in structure-based inference of protein function prediction of functional sites in proteins using conserved functional group analysis pdbsitescan: a program for searching for active, binding and posttranslational modification sites in the d structures of proteins pdbsite: a database of the d structure of protein functional sites a new bioinformatic approach to detect common d sites in protein structures principles of protein-protein interactions searching for functional sites in protein structures protein-rna interactions: a structural analysis using electrostatic potentials to predict dna-binding sites on dnabinding proteins quantitative assessment of relationship between sequence similarity and function similarity shape variation in protein binding pockets and their ligands kegg: kyoto encyclopedia of genes and genomes detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites pattern recognition strategies for molecular surfaces: iii. 
binding site prediction with a neural network identification of protein biochemical functions by similarity search using the molecular surface database ef-site recognition of spatial motifs in protein structures prediction of protein-protein interaction sites using support vector machines comprehensive evaluation of protein structure alignment methods: scoring by geometric measures secondary-structure matching (ssm), a new tool for fast protein structure alignment in three dimensions using evolutionary and structural information to predict dna-binding sites on dnabinding proteins prosup: a refined tool for protein structure alignment three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins surfnet: a program for visualizing molecular surfaces, cavities and intermolecular interactions protein clefts in molecular recognition and function from protein structure to biochemical function? protein function prediction using local d templates q-sitefinder: an energy-based method for the prediction of protein-ligand binding sites automatic multiple structure alignment and detection of a common substructural motif using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking webfeature: an interactive web tool for identifying and visualizing functional sites on macromolecular structures automatic construction of structural motifs for predicting functional sites on protein structures protein binding site prediction using an empirical scoring function evolutionary predictions of binding surfaces and interactions predicting gene ontology functions from protein's regional surface structures an approach for clustering protein pockets into similar groups analysis of protein surface patterns by pocket similarity network 
protein-dna interactions: amino acid conservation and the effects of mutations on binding specificity an overview of the structures of protein-dna complexes amino acidbase interactions: a three-dimensional analysis of protein-dna interactions at an atomic level protein-protein interaction: structurally conserved residues distinguish between binding sites and exposed protein surfaces a combined algorithm for genome-wide prediction of protein function statistical models for discerning protein structures containing the dna-binding helix-turn helix motif real spherical harmonic expansion coefficients as d shape descriptors for protein binding pocket and ligand comparisons scop: a structural classification of proteins database for the investigation of sequences and structures on the nature of cavities on protein surfaces: application to the identification of drug-binding sites promate: a structure based prediction program to identify the location of protein-protein binding sites predicted protein-protein interaction sites from local sequence information cath-a hierarchic classification of protein domain structures ssap: sequential structure alignment program for protein structure comparison from protein structure to function inference of protein function from protein structure prediction of functional sites by analysis of sequence and structure conservation identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks automatic prediction of protein function and detection of functional sites from structure the catalytic site atlas: a resource of catalytic sites and residues identified in enzymes using structural data molecular shape comparisons in searches for active sites and functional similarity twilight zone of protein sequence alignments the funcat, a functional annotation scheme for systematic classification of proteins from whole genomes detection of protein three-dimensional side-chain patterns: new 
examples of convergent evolution a structural perspective on protein-protein interactions computational methods of analysis of protein-protein interactions integrating structure, bioinformatics, and enzymology to discover function-bioh, a new carboxylesterase from escherichia coli a new method to detect related function among proteins independent of sequence and fold homology structure and mechanism of the m proton channel of influenza a virus predicting enzyme function from sequence: a systematic appraisal network-based prediction of protein function ezypred: a top-down approach for predicting enzyme functional classes and subclasses nuc-ploc: a new web-server for predicting protein subnuclear localization by fusing pseaa composition and psepssm pseaac: a flexible web-server for generating various kinds of protein pseudo amino acid composition using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution protein structure alignment by incremental combinatorial extension (ce) of the optimal path siteengines: recognition and comparison of binding sites and protein-protein interfaces structural alignment of protein-dna interfaces: insights into the determinants of binding specificity hierarchical protein structure alignment using both secondary structure and atomic representations identifying structural motifs in proteins predicting metal-binding site residues in low-resolution structural models annotation in three dimensions. 
pints: patterns in non-homologous tertiary structures finding functional sites in structural genomics proteins a model for statistical significance of local similarities in structure annotating nucleic acid-binding function based on protein structure analysis and prediction of carbohydrate binding sites gene ontology: tool for the unification of biology displar: an accurate method for predicting dna-binding sites on protein surfaces using a library of structural templates to recognise catalytic sites and explore their evolution in homologous families estimation of amino acid residue substitution rates at local spatial regions and application in protein function inference: a bayesian monte carlo approach structure-based prediction of dna-binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces global protein function prediction from protein-protein interaction networks tess: a geometric hashing algorithm for deriving d coordinate templates for searching structural database. 
application to enzyme active sites virtual screening for finding natural inhibitor against cathepsin-l for sars therapy insite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale functional sites in protein families uncovered via an objective and automatic graph theoretic approach predicting protein function from sequence and structural data prediction of protein function from protein sequence and structure assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function throng traditional and probabilistic scores prediction of protein-protein interactions: the capri experiment, its evaluation and implications using complexity measure factor to predict protein subcellular location a two-stage classifier for identification of protein-protein interface residues predicting dna-binding sites of proteins from amino acid sequence an accurate, sensitive, and scalable method to identify functional sites in protein structures fatcat: a web server for flexible structure comparison and structure similarity searching lga-a method for finding d similarities in protein structures neural networks in optimization similarity networks of protein binding sites tm-align: a protein structure alignment algorithm based on tm-score discovering functions and revealing mechanisms at molecular level from biological networks prediction protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive bayes feature fusion gene function prediction using labeled and unlabeled data protein domain annotation with integration of heterogeneous information sources an intriguing controversy over protein structural class prediction some insights into protein structural class prediction predicting protease types by hybridizing gene ontology and pseudo amino acid composition subcellular location prediction of apoptosis proteins interaction-site 
acknowledgments this work was supported by the national natural science foundation of china (nsfc) under grant no. and no. . lyw and xsz are also supported by the grant no. cb from the ministry of science and technology, china. the research was also supported by nsfc-jsps collaborative project no. . the authors are grateful to the anonymous referees as well as editors for comments and for helping to improve the earlier version. we recognize that this review is far from comprehensive, and we apologize for any papers related to the subject that were not mentioned.

key: cord- -yf fvx k authors: bekaert, michaël; rousset, jean-pierre title: an extended signal involved in eukaryotic −1 frameshifting operates through modification of the e site trna date: - - journal: mol cell doi: . /j.molcel. . . sha: doc_id: cord_uid: yf fvx k

by using a sensitive search program based on hidden markov models (hmm), we identified viruses carrying frameshift sites among fully sequenced virus genomes. these viruses are clustered in specific families or genera. sequence analysis of the frameshift sites identified here, along with previously characterized sites, identified a strong bias toward the two nucleotides 5′ of the shifty heptamer signal. functional analysis in the yeast saccharomyces cerevisiae demonstrated that high frameshifting efficiency is correlated with the presence of a Ψ modification in the trna present in the e site of the ribosome at the time of frameshifting.
these results demonstrate that an extended signal is involved in eukaryotic frameshifting and suggest additional interactions between trnas and the ribosome during decoding.

lated by the modification status of the e site trna. overall, our results propose an extended model for −1 frameshift sites.

results and discussion

characterization of viral −1 frameshift sites

the recode database resource (http://recode.genetics.utah.edu, baranov et al., ) describes viruses suspected or demonstrated to carry a frameshift site. few −1 frameshift signals are fully documented. the genomes of these viruses are entirely sequenced, and only five sites are precisely characterized, including the structure of the stimulatory pseudoknot (bwyv, hiv- , mmtv, pemv- , and srv- ). for the remaining sites, have been analyzed by extensive directed mutagenesis coupled with quantification of frameshifting efficiency, but the others are only partially characterized. most of the sites are therefore putative; i.e., they carry the typical heptamer and secondary structure but have never been proven to be functional.

initially, we functionally characterized a larger number of viruses containing a putative frameshifting site. to explore the widest viral diversity possible (order, family, and genus), we deduced a neighbor-joining tree from the viruses, based on the multiple alignment of the polymerase protein sequence (figure ). from this tree, we selected a subset of viruses representative of the global viral diversity. to assay the frameshift competence of each putative site, we cloned the entire viral −1 frameshift region of the representative viruses in a dual-reporter vector and estimated in vivo frameshifting efficiency in yeast (see experimental procedures). frameshift sites from different eukaryotic species have been shown to function in yeast (bekaert et al., ; stahl et al., ). the existence of a functional frameshift signal was demonstrated for all candidates (table ). it is unlikely that the low level of expression of scylv is due to the use of a heterologous host cell, because ten frameshift sites from other plant viruses are functional in our assay. this site might be nonfunctional or carry polymorphic variations. the −1 frameshifting frequencies varied between % and %, compatible with those previously obtained with in vitro or in vivo assays (e.g.,

we then aligned the newly characterized sites with sites already identified. strikingly, we observed an important bias not only at the slippery heptamer but also in the spacer region and just upstream of the heptamer. the upstream bias was never before observed, and its detailed analysis is presented below.

sensitive search of viral frameshift sites

we established an hmm profile of efficient viral −1 frameshift signals with the alignment of the slippery regions from the viruses that we had functionally characterized (see experimental procedures) and used it to search the genbank viral genome database (release / / ). motifs were identified and subsequently manually inspected to eliminate false positives. we checked (1) that the first nucleotide of the heptamer is in frame with the aug of the upstream coding region, (2) for a protein motif associated with the upstream and downstream coding regions, and (3) for the presence of a potential secondary structure downstream of the heptamer. by this procedure, we identified frameshift sites in viral genomes. most false positives exhibited no secondary structure after the shifty site and were found in the large and highly complex herpesvirus, papillomavirus, and nucleopolyhedrovirus genomes. we consider this assessment to be accurate, because it depends not only on in silico methods but also on the biological assay of the hmm learning set. it is noteworthy that this method is very efficient, even though we did not take into account the stimulatory secondary structures when we defined the profile. rna folding algorithms are time consuming and cannot be restrained to a defined window in the vicinity of the heptamer. moreover, the theoretical evaluation of thermodynamic stability of secondary structures is not accurate for pseudoknots (walter et al., ).

with the hmm profile based on only sites, we were able to find all known frameshifting viruses and that are new or uncharacterized (table ). the list of the viruses with the position of their frameshift signals is in supplemental table s available online at http://www.molecule.org/cgi/content/full/ / / /dc /. ten putative frameshift sites were never previously annotated and are associated with an upstream and a downstream coding region. frameshifting structures were already annotated as such in the recode database, and eight were only annotated in the sequence field of genbank or in relation to a publication that did not mention any evidence of frameshift. for the remaining sequences, a site was suspected, but it was not precisely localized between two coding regions. for those, we were able to propose a precise position for the frameshift event and in some cases, a more accurate annotation. for example, for the ovine astrovirus (ssrna +, astroviridae family), a putative −1

luteoviridae, retroviridae, tombusviridae, and totiviridae. within each family, only a few subfamilies/genera were capable of −1 frameshifting (see supplemental table s for details). however, in this latter case, all members of the genus submitted to hmm analyses appear capable of −1 frameshifting: they carry not only the hmm profile but also secondary structures as a canonical frameshift signal (supplemental table s ). for example, manual checking of the poleroviruses found by using hmm successfully identifies a pseudoknot three to nine nucleotides downstream from the heptamer site.

pression concluded that translation of the second orf is initiated on its own internal aug codon (huang and ghabrial, ). however, this does not exclude the possibility that both mechanisms are at play to express

an order hmm search where the probability of a given nucleotide is dependent on the identity of the previous nucleotide. accordingly, figure shows the bias of dinucleotide distribution. the score for the last dinucleotide position before the heptamer is with degrees

possible sequences within the context of the wildtype (wt) frameshift signal of the avian infectious bronchitis virus (ibv), because it has been extensively used as a model virus for −1 frameshifting studies (brierley et al., , ). table shows that a . -fold variation was found between the frameshifting efficiencies directed by these ibv variant sites. compared to the wt sequence, the frameshifting level is significantly reduced (p value < − ) in ten of the mutants. the dinucleotide situated 5′ of the heptamer corresponds to the first two nucleotides of the preceding codon; its impact can thus be interpreted as an effect either of the amino acid, the codon, or the decoding trna. because it was previously shown that trna modi-

(table ): all constructs that exhibited a high-frameshifting level use a cognate (or near-cognate) trna carrying the Ψ modification. conversely, the sequences that do not involve a codon decoded by a trna with the Ψ modification direct low-frameshifting efficiency. this observation prompted us to investigate the effect of the mutation of

two low- and two high-frameshift rate constructs were tested in modification mutants (table ). with the low-frameshifting rate subset (frameshift efficiency lower than %), pus Δ mutants show no significant effect (table ). in contrast, with the high-frameshifting rate subset (frameshift efficiency higher than %), which involves decoding by a Ψ modified trna, a reduced frameshifting frequency was observed in pus Δ mutants. this frequency was similar to that directed by the low-frameshifting rate subset, indicating that most of the effect was reversed in the mutant. we verified that the effect is actually due to the modifying activity of pus p and not to a possible chaperone-like activity by using the pus [d a] mutant, which harbors a mutation in the active site of the pus protein. in this mutant, the high-frameshifting constructs yield lower frameshifting efficiency, as in a pus Δ mutant context (table ).

the effect of the dinucleotide upstream of the heptamer suggests that the three ribosomal site trnas are involved in the mechanism of −1 frameshifting. however, although the mechanism of frameshifting in eukaryotes is thought to involve mostly tandem slippage of the trnas occupying the a and p sites, single slippage at the p site has been reported to occur (jacks et al., brierley et al., ; licznar et al., ),

which the uuuaaac heptamer was mutated to uuu auac. in this case, tandem slippage should be inefficient due to the presence of two mismatches after re-pairing of the a site trna in the −1 frame, but single slippage would not be affected. the frameshifting efficiency obtained with this construct was < . %, similar to the background level. this result demonstrates that in these experiments, frameshifting actually occurred through a tandem trna slippage mechanism. this implies that the three sites are involved in ribosomal frameshifting (see below).

the Ψ modification is conserved over the tree of life; its role on −1 frameshifting could thus be similar in a broad spectrum of organisms. this is consistent with the fact that the bias at the two positions upstream of the heptamer was deduced from a wide variety of viruses of different origins. however, each host cell, like the yeast strains used here, carries a specific trna pool that differs from one organism to another. this could explain the different dinucleotide usage observed between viruses; however, not enough sequence data are available to assess this point. in any case, the existence of a bias indicates an important role of trna modification on −1 frameshifting in eukaryotes. a role of trna

but not in eukaryotes. in these cases, the trnas involved were acting at the a or p site.

overall, these results demonstrate that the effect of the upstream context of the heptamer is directed by the modification status of the trna decoding the −1 codon.

sequences give rise to patterns inconsistent with accepted trees (data not shown). inconsistency of frameshifting patterns with accepted phylogenetic trees is not surprising taking into account the recombinant nature of many viruses; functional requirements probably account for both this complete conservation and the variability of the frameshifting site sequences. indeed, in the retroviridae family, the alpharetrovirus genus is exceptional because some members exhibit frameshift signals but others do not. in fact, this genus is subdivided in two categories: replication-competent viruses, which possess the pol gene, and defective viruses, which do not. logically, frameshift signals are found only in the latter category. it is even more interesting that despite their position among the totiviridae, the leishmaniavirus genus members do not carry −1 frameshift sites but, rather, use +1 frameshifting to express their polymerase domain. this suggests that strong biological constraints are at play in the selection of a recoding event in the life cycle of these viruses, possibly related to the incorporation of the polymerase as a fusion protein in the viral particle.

an interesting feature of the results presented here is the involvement of an extended nonanucleotide signal in ribosomal frameshifting. as demonstrated above, no single slippage is observed in the experimental system used here; this nonanucleotide-directed frameshifting thus involves classical tandem slippage where both a and p site trnas slip by one nucleotide upstream. this implies that the three ribosomal sites are involved in −1 frameshifting. two hypotheses can be proposed to account for the role of the e site trna in frameshifting. firstly, frameshifting might be enhanced by the absence of a trna in the e site. in this case, the Ψ modification would destabilize the trna:e site interaction. secondly, Ψ might interfere directly or indirectly with the interaction of the p site trna with the mrna, decreasing pairing stability.

the first hypothesis is supported by recent results in which premature release of the e site trna from the ribosome has been shown to be coupled with high-level +1 frameshifting at the prfb gene, encoding the

predict that the Ψ modification induces a higher fre-

during the accommodation step of the a-trna and not during the preceding decoding reaction (nierhaus, ; noller et al., ). however, in the case of a −1 frameshift event, e-trna release at the decoding step would facilitate the slippery event of a and p site trnas, and this precisely might be the effect of Ψ . biochemical experiments will be required to clarify this point.

with several partners. the closest distance between the anticodon stem backbones of the p- and e-trnas is about Å , which is closer than the distance separating the a- and p-trnas. the two trnas are not in direct contact but are linked by the s rrna helices h , h , and h , and loops and , both of which they directly interact with through their anticodon loops (yusupov et al., ). another link between e and p sites is through the mrna. a single possible contact was noted between the mrna and e-trna in the crystal structure, but the latter was noncognate. even this noncognate e site anticodon was close enough to the codon, such that cognate interaction would be structurally plausible; moreover, there is biochemical evidence for codon-anticodon specificity in the e site (lill and wintermeyer, ; rheinberger et al., ). e site trna is thus sufficiently connected to the p site to suggest that it very likely plays a role in promoting the stability of p site codon-anticodon pairing. Ψ modification can be expected to improperly fill the e site during the slippage-prone state, probably resulting in an unstable p site codon-anticodon interaction and enhanced −1 frameshifting. this is reminiscent of the role played by a particular context of a bacterial tmrna resume codon. in this case, an unusual e site conformation destabilizes the p site codon-anticodon interaction and induces frameshifting (trimble et al., ).

the results presented here demonstrate that the slippery component of −1 frameshift signals, at least in yeast, is more complex than previously anticipated. compared to the initial model of jacks et al. ( ), sequence elements of both the 5′ and 3′ heptamer elements are now shown to participate in frameshifting efficiency through interactions between trna, mrna, and the ribosome. similarly, downstream secondary structures can directly or indirectly influence frameshifting. a combinatorial use of upstream codons, heptamer sequences, downstream codons, and stimulatory secondary structures permits a given frameshifting efficiency for a given virus in a given host. whether or not these different sequence elements act independently remains to be established.

(thompson et al., ) alignment of viral polymerase amino acid sequences retrieved from genbank was used. it was employed to deduce a neighbor-joining tree with bootstrap replications (saitou and nei, )

his Δ , leu - , , and ura - ), and its derivative pus Δ (mat a, ade - , trp - , his Δ , leu - , , ura - , and pus Δ::kan).

ratory and the "frameshift team" for numerous stimulating discussions. we are especially grateful to anne-lise haenni for critically

of the "slippery-sequence" component of a coronavirus ribosomal
identification and characterisation of signals for ribosomal frameshifting in the rous sarcoma virus gag-pol region
structure of the s ribosome from saccharomyces cerevisiae-trna-ribosome and subunit-subunit in-
kim viral rna pseudoknot drastically change ribosomal frameshifting efficiency. proc. natl. acad. sci. usa
at a frameshifter rna pseudoknot is sensitive to reading phase but shows little correlation with frameshift efficiency
clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting and weight matrix choice
etude d'enzymes de modification de nucléotides des arnt et leurs fonctions dans le métabolisme cellulaire chez saccharomyces cerevisiae
trna slippage at the tmrna resume codon
translational frameshifting generates the gamma subunit of dna polymerase iii holoenzyme
dine synthase (pus ) catalyzing the formation of psi and psi in trna anticodon loop
improvement of reading frame maintenance is a common function for several trna modifications
coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of rna
tory signal of the human immunodeficiency virus type
programmed translational −1 frameshifting on hexanucleotide motifs and the wobble properties of trnas
destabilization of codon-anticodon interaction in the ribosomal exit site
secondary structure and mutational analysis of the ribosomal frameshift signal of rous sarcoma virus
maintaining the ribosomal reading frame: the influence of the e site during translational regulation of release factor
ribosomal −1 frameshifting during decoding of bacillus subtilis cdd occurs at the sequence cga aag
reprogrammed genetic decoding in cellular gene expression
the allosteric three-site model for the ribosomal elongation cycle: features and future
translocation of trna during protein synthesis
atomic structures at last: the ribosome in
codon-anticodon interaction at the ribosomal e site
the neighbor-joining method: a new method for reconstructing phylogenetic trees
maintenance of the gag/gag-pol ratio is important for human immunodeficiency virus type rna dimerization and viral infectivity

key: cord- - dhc authors: laptev, i. g.; golovina, a. ya.; sergiev, p. v.; dontsova, o. a. title: posttranscriptional modification of messenger rnas in eukaryotes date: - - journal: mol biol doi: . /s x sha: doc_id: cord_uid: dhc

transcriptome-wide mapping of posttranscriptional modifications in eukaryotic rna revealed tens of thousands of modification sites. modified nucleotides include n6-methyladenosine, 5-methylcytidine, pseudouridine, inosine, etc. many modification sites are conserved, and many are regulated. the function is known for a minor subset of modified nucleotides, while the role of their majority is still obscure.
in view of the global character of mrna modification, rna epigenetics arose as a new field of molecular biology. the review considers posttranscriptional modification of eukaryotic mrna, focusing on the major modified nucleotides, the role they play in the cell, the methods to detect them, and the enzymes responsible for modification.

posttranscriptional modifications of rna were found rather long ago, but their biological functions remain obscure apart from a few exceptions. cap modifications are the best understood in eukaryotic mrna. regions distant from the mrna ends may contain n methyladenosine (m a), methylcytidine (m c), pseudouridine (Ψ), and inosine (i), which were believed to play only a minor role because their proportion in cell rna is extremely low as compared with the standard nucleotides. as analytical methods developed, modified nucleotides proved to occur in certain positions, rather than being spread at random, and their occurrence in such positions sometimes reaches % [ ] [ ] [ ] [ ] [ ] [ ]. the review focuses on these four modifications, their occurrence, detection methods, and, when possible, the roles they play in the cell.

higher eukaryotic mrnas and many virus rnas undergo ' o methylation at one or two first nucleotides (methylated nucleotides are collectively designated n m ) [ , ]. the most common are m a m and other a m , accounting for approximately % of all nucleotides methylated at the ribose moiety. the g m proportion is %, and c m and u m together account for % [ ]. specific mrna ' o methyltransferases were isolated from a hela cell extract and characterized in [ ], but it was not until that their genes were identified and cloned [ ]. n m are involved in self versus nonself rna recognition [ ]. human and mouse coronaviruses mutated to lack ' o methyltransferases induce high level production of type i interferon via the mda cytoplasmic protein, which is sensitive to dsrna [ ].
n methyladenosine (m a) is the most common mrna modification. the first m a detection in mrna dates back to the s [ ]. because mrna accounts for only a minor proportion of total cell rna, m a detection in mrna is rather problematic. by , specific adenosine methylation sites were found only in two rnas, the bovine prolactin (bprl) mrna [ , ] and rous sarcoma virus (rsv) rna [ ]. in the bprl mrna, methylation sites cluster in the ' untranslated region ( ' utr) and in the vicinity of the polyadenylation site. the m a occurrence in the methylation sites is only ~ %. more than ten methylation sites were identified in the rsv rna. as with the bprl mrna, methylation of the rsv rna is incomplete, varying from to %. a mutation analysis of the adenosine methylation sites in vitro and in vivo established the specific site sequence, rrm ach (where r is adenosine or guanosine and h is adenosine, uridine, or cytidine) [ ]. statistically, the sequence can be found in every nt, so that approximately adenosine methylation sites may occur in mrna on average. because m a was not detected in all of the potential sites in the bprl mrna and rsv rna, their adenosine methylation was not assumed to proceed quantitatively.

it is rather difficult to detect m a in rna because methylation does not affect the watson-crick base pairing (fig. a), and reverse transcriptase does not distinguish between modified and unmodified nucleotides. in , m a seq [ ] and merip seq [ ] were developed to detect m a in rna with a high sensitivity. the gist of the method is as follows (fig. b). polyadenylated rnas are chemically cleaved into fragments of approximately - nt. fragmented rna is divided into two portions. one is used for immunoprecipitation with anti m a antibodies to obtain a modification enriched fraction, and the other serves as a reference. both of the portions are subject to deep sequencing. the m a positions are identified by comparing the sequencing results for the two portions.
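the comparison of the immunoprecipitated and reference portions can be pictured as a per-window enrichment calculation. the sketch below is only an illustration of that idea, not the published m a seq or merip seq pipeline: the function name, the window identifiers, the pseudocount, and the fold threshold are all assumptions made for the example.

```python
from typing import Dict, List


def enriched_windows(
    ip_counts: Dict[str, int],
    input_counts: Dict[str, int],
    min_fold: float = 4.0,     # assumed threshold, not taken from the review
    pseudocount: float = 1.0,  # assumed, avoids division by zero
) -> List[str]:
    """Return windows whose normalized IP/reference read ratio is >= min_fold.

    ip_counts and input_counts map a window id to the number of reads
    from the immunoprecipitated and the reference portion, respectively.
    """
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    hits = []
    for window in sorted(set(ip_counts) | set(input_counts)):
        # Normalize each count by its library size before comparing.
        ip = (ip_counts.get(window, 0) + pseudocount) / (ip_total + pseudocount)
        ref = (input_counts.get(window, 0) + pseudocount) / (input_total + pseudocount)
        if ip / ref >= min_fold:
            hits.append(window)
    return hits
```

real analyses add statistical testing and replicate handling; the ratio above only conveys why a reference portion is needed.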
many rnas proved to contain m a in the well known context [ ]. generally, m a is found mostly around the stop codon of the mrna coding region. the significance of this arrangement remains unclear. the m a distribution differs among different tissues and in tissues exposed to stress. a virtually unique adenosine methylation profile is therefore characteristic of each cell type. the method reports the methylated adenosine positions with a resolution of approximately nt [ , ].

sequencing with greater overlaps and more stringent bioinformatic data processing were used to identify the adenosine methylation sites in the total yeast transcriptome at a better resolution [ ]. rna was fragmented into shorter segments, and a more rigorous approach was used to eliminate false positive results. rna from cells with the inactivated gene for the methyltransferase responsible for adenosine modification (which is impossible with mammalian cells because a mettl knockdown leads to apoptosis) and several rnas synthesized in vitro were used as negative controls. about half of the identified m a sites were considered to be false positives with these negative controls, indicating that the results reported in [ , ] most likely need verification.

recent single molecule real time sequencing showed that hiv reverse transcriptase is m a sensitive [ ]. thermus thermophilus dna polymerase i was found to act as reverse transcriptase in the presence of mn +, being sensitive to the m a presence in these conditions [ ]. the enzymes might be suitable for sequencing the transcriptome with the identification of m a sites.

a method to verify the m a presence in a particular rna site was developed in our lab [ ]. the method is based on analyzing the melting curves for dna-rna duplexes. to detect the modification in a given rna site, two primers with end to end annealing are selected (fig. c).
one primer contains a fluorescein (fam) at the ' end, and the other contains the black hole quencher bhq at the ' end. the bhq containing primer is designed to hybridize with an m a containing rna region. a comparison of the differential melting curves for the control modification free duplex and a test sample reports whether m a occurs in the given rna site. the method can be used, for example, to identify the genes for methyltransferase that modifies a certain nucleotide. in the case of eukaryotic mrnas, the method is suitable for probing the adenosine methylation status in a particular site of a particular rna in various cell growth conditions.

[figure caption fragment: [ , ]. (c) primer annealing to check for the m a presence [ ].]

a method known as site specific cleavage and radioactive labeling followed by ligation assisted extraction and thin layer chromatography (scarlet) [ ] makes it possible to establish whether adenosine is methylated in a given position of a given molecule and to estimate the proportion of modified and unmodified nucleotides (fig. ). in this method, a specific chimeric ' o me/ ' h oligonucleotide is constructed and hybridized to the polyadenylated rna fraction. the target rna site is cleaved with rnase h to produce two rna fragments so that the target adenosine is at the ' end of one of the fragments. the remaining rna is phosphorylated to add [ p] to the ' ends of rna fragments, and the fragment of interest is ligated to a long single stranded oligodeoxyribonucleotide. the mixture is digested with rnases a and t , which together cleave ssrna after c, u, and g. the oligodeoxyribonucleotide with p a/m a, which remains intact, is purified by denaturing electrophoresis and digested with nuclease p , which cleaves ssdna and rna to ' monophosphates. the resulting sample is assayed for [ p]a and [ p] m a by thin layer chromatography. the method is rather laborious and requires a radioactive label.
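the final scarlet readout, the proportion of modified adenosine at the probed site, reduces to a ratio of the two spot signals on the chromatogram. the helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes the two intensities have already been background subtracted and are directly comparable.

```python
def modified_fraction(a_signal: float, m6a_signal: float) -> float:
    """Fraction of the probed adenosine that is methylated.

    a_signal and m6a_signal are the (assumed background-subtracted)
    radioactive signals of the unmodified and modified nucleotide spots.
    """
    if a_signal < 0 or m6a_signal < 0:
        raise ValueError("signals must be non-negative")
    total = a_signal + m6a_signal
    if total == 0:
        raise ValueError("no signal detected for either nucleotide")
    return m6a_signal / total
```

for instance, spot signals of 30 and 10 arbitrary units for the unmodified and modified nucleotide give a modified fraction of 0.25.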
several methods were developed to detect m a in rna at various resolutions. each method has its drawbacks and advantages. high throughput methods

the protein product of mettl was the first to be identified as eukaryotic (adenine n ) methyltransferase [ ] and is conserved among many organisms from yeast (ime ) to mammals. a knockdown or deletion of its gene exerts various phenotypic effects, causing apoptosis in human cell lines, a lower survival in plants and drosophila melanogaster, and sporulation defects in yeasts [ ]. recent studies identified mettl as another protein that catalyzes adenosine methylation in rna and forms a heterodimeric complex with mettl [ ]. the two proteins belong to one methyltransferase superfamily and have % amino acid sequence similarity. both of them are catalytically active and methylate oligonucleotides in vitro. the findings indicate that the two proteins act as catalytic subunits of the complex. the mettl -mettl complex was shown to interact with wtap (wilms' tumor associated protein) [ , ], which is involved in splicing [ ]. a wtap gene knockdown considerably reduces the m a content, although wtap does not display methyltransferase activity in vitro [ ]. it seems that wtap acts as a regulatory protein to facilitate methyltransferase activity or nuclear localization of methyltransferases [ , ].

in addition to rna methylation, demethylation occurs as an opposite process, and the role it plays in the cell is probably no less important. fto (fat mass and obesity associated protein) belongs to a family of proteins homologous to fe(ii)/α ketoglutarate dependent dioxygenase alkb. fto was shown to catalyze oxidative demethylation of m t and m u in ssdna and ssrna [ , ], although its activity is lower than in alkb family proteins [ ]. its arg is essential for α ketoglutarate binding, and the r q substitution abolishes fto catalytic activity in vitro [ ].
severe growth retardation is observed in organisms carrying this mutation. the fto preference for ssdna and ssrna is possibly explained by the presence of an additional loop, which covers a side of a conserved jelly roll motif and competes with the complementary strand of the duplex [ ].

surprisingly, the capability of m a demethylation was observed for fto [ ]. because physiological substrates of alkb family proteins are not limited to n or n modified purines and pyrimidines, attention was consequently drawn to m a as the most common mrna modification [ ]. fto proved to convert m a to adenosine in synthetic ssdna and ssrna with an efficiency comparable with that of m u demethylation [ ]. the m a modification was not detected in higher eukaryotic dna [ ], indicating that rna acts as a physiological substrate of fto. to check this assumption, fto was silenced via rna interference or overexpressed in hela and hek ft cells. the results confirmed that m a is a physiological substrate of fto in these cell lines. fto was observed to occur exclusively in the nucleoplasm together with the splicing factors sart (u /u .u tri snrnp associated protein) and sc (serine/arginine rich splicing factor ), implicating fto in mrna maturation [ ].

fto demethylates m a via the formation of two intermediates, n hydroxymethyladenosine (hm a) and formyladenosine (f a) (fig. ) [ ]. the two intermediates are detectable in vitro, indicating that their dissociation and rebinding are involved in demethylation. fto binds m a and hm a with comparable affinity, and the generation rate of hm a is higher than that of f a. the intermediates (hm a and f a) are found in rna in vivo. their content was estimated at . - % of the total m a content, but the intermediates might degrade during rna isolation and fragmentation.

in , alkbh (alkylation repair homolog ) was identified as another m a rna demethylase of the alkb family [ ].
like fto, alkbh is localized in the nucleus together with the splicing factors cs , sm, and asf/sf (alternative splicing factor/splicing factor ). it is most likely that alkbh directly interacts with rna because a granular pattern of alkbh distribution in the nucleus is almost completely eliminated by treating preparations with rnase a. an alkbh knockdown increases the mrna content in the cytoplasm, implicating the demethylase in mrna export [ ].

several proteins were found to selectively bind with synthetic oligoribonucleotides that mimic the adenosine methylation site of the rsv rna and contain m a as bait [ ], elavl , ythdf , and ythdf being the best binders. elavl , which is also known as hur (human antigen r), belongs to the elavl family of rna binding proteins and selectively binds au rich regions in the ' utr of mrna [ ]. elavl stabilizes mrnas that contain au rich elements [ , ]. published data on the functional relationship of elavl with mrna methylation are discrepant. after elavl was initially reported to selectively bind m a containing rna [ ], a more efficient binding was demonstrated for total rna from cells lacking m a in mrna [ ].

the two other proteins, ythdf and ythdf , belong to the yth domain superfamily of rna binding proteins [ ]. the yth domain is conserved among eukaryotes and is widespread in plants. ythdf recognizes m a both in vitro and in vivo [ ]. ythdf binding with m a containing oligoribonucleotides is ~ times more efficient than with nonmodified oligonucleotides in vitro. ythdf and ythdf similarly bind the modified nucleotide in vitro with a to fold higher efficiency. more than ythdf targets were found in human cell lines, and the majority of the targets occur in mrna. ythdf competes with ribosomes for mrna binding in the cytoplasm. ythdf bound mrna is committed to degradation. thus, ythdf acts as a sorter of m a containing mrnas.
when free ribosomes are available, mrna binds with them and is translated; otherwise, ythdf binds with mrna and relocates it to a degradation site.

predisposition to certain disorders was associated with polymorphic variants of the genes whose products are involved in mrna modification (table).

the effect of mrna methylation and m a production on stem cell proliferation and differentiation has been discussed intensely in the past years. published data are discrepant. mettl and mettl knockdowns reduce the m a content in embryonic stem cells, thus decreasing their proliferative activity and inducing a loss of pluripotency markers; i.e., the effect promotes stem cell differentiation [ ]. it is possible that methylation of the mrnas coding for differentiation regulators facilitates their degradation in control stem cells. in contrast, a more recent study showed that complete mettl silencing in mouse stem cells increased their self renewal potential and blocked their differentiation into cardiomyocytes and neurons [ ]. similar results were obtained when mettl was inactivated via a mettl knockout rather than rna interference [ ]. mettl knockout stem cells were viable and proliferated normally, but their differentiation was distorted. when differentiation was induced, the cells still expressed genes characteristic of pluripotent stem cells. the effect was associated with stabilization of the mrnas coding for pluripotency markers. thus, it is commonly accepted that methylation destabilizes mrna, but it remains unclear what mrnas are more prone to methylation and destabilization in early development.

in the s, m c was found in mrnas isolated from bhk hamster cells [ ] and certain virus rnas [ , ], but not in hela cell mrna [ ] and sv rna [ ]. fortunately, m c proved far easier to map in rna as compared with m a.
bisulfite treat

table. disorders associated with m a related genes [ ] (column headings: gene; function of the protein product; disorder)

(cytosine c ) methyltransferases

several rna (cytosine c ) methyltransferases were found in eukaryotic cells: the nsun protein family (nsun - and nsun homologs), dnmt , and dnmt homologs. methylating activity towards mrna has so far been observed only for nsun [ , ]. the human nsun family includes nine proteins, most of which are highly conserved among mammals [ , ] and possess a methyltransferase domain. activity of the domain was studied only in nsun [ ]. mouse nsun is a component of the chromatoid body and is necessary for testicular differentiation [ , ] and a balance between self renewal and differentiation of skin stem cells [ ]. nsun is involved in trna methylation and modifies mrna and noncoding rnas as well [ , , , ].

studies of m c were initially limited to trna and rrna. as for mrna, the modification was poorly understood until bisulfite sequencing was applied to total cell rna [ ] and m c was identified as a modification common in mrnas and noncoding rnas of various cells, including human cells. the method was based on conventional dna bisulfite sequencing [ , ] and revealed m c sites in mrnas and noncoding rnas with m c accounting for . % of all cytidine residues. noncoding rnas had an even higher m c proportion, . %. methylated cytosine occurred mostly in untranslated regions and in the vicinity of binding sites for argonaute family proteins. it is possible that m c plays a role in the mirna mediated rna degradation pathway [ ].

two new methods were recently developed to identify the cytidine methylation targets in rna, taking advantage of the methylation mechanism [ ]. many rna methyltransferases are known to possess two highly conserved cysteine residues, which are essential for catalysis. one forms a covalent intermediate with the target cytidine, and the other is necessary for the covalent intermediate to be resolved after methylation [ ].
the aza ip method is based on a covalent bonding of a cytosine analog ( azacytosine) incorporated in nascent rna with m c methyltransferase (fig. a). azacytidine is not methylated, and the methyltransferase cysteine residue remains linked to the heterocycle. the crosslinking is followed by immunoprecipitation and high throughput sequencing [ ]. with this method, many trnas and noncoding rnas were identified as nsun substrates. a high frequency of the c → g transversion was observed for presumably methylated c residues, helping to recognize specifically methylated cytidines in rna targets. it is thought that c is incorporated in place of g during replication or reverse transcription because of ring opening [ ] (figs. b, c).

the other method is known as miclip (methylation iclip) and is based on the mechanism of nsun mediated methylation. cytidine methylation at c starts with a covalent bonding of cys of nsun and the pyrimidine ring of cytidine. after methylation, nsun cys plays a role in cleaving the enzyme-rna covalent bond. the c a mutation of nsun stabilizes the covalent rna-protein intermediate, and a method taking advantage of this circumstance was used in place of iclip to identify the nsun targets at a single nucleotide resolution. with this method, nsun was found to methylate trnas, mrnas, and noncoding rnas [ , ]. the two methods will find application in studying m c methyltransferases.

the functional role m c plays in trna and rrna was the subject of many studies. occurring in the variable and anticodon loops of trna, m c stabilizes its spatial structure and the codon-anticodon duplex [ ]. a double knockout in dnmt and nsun totally eliminates m c from trna, thus destabilizing the trna structure and suppressing protein synthesis in mice [ ]. the m c residues found in rrna are involved in translation and trna recognition [ ].
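for comparison with the two mechanism based methods, the bisulfite sequencing readout used in the transcriptome wide m c studies can be sketched in a few lines. bisulfite converts unmethylated cytosine to uracil, which is read as t after reverse transcription, whereas methylated cytosine is protected and still reads as c. the function below is a simplified illustration, not a published pipeline: its name is hypothetical, and it assumes reads that are already aligned to the reference without gaps and have the same length as the reference.

```python
from typing import Dict, List


def retained_c_fraction(reference: str, reads: List[str]) -> Dict[int, float]:
    """For each reference C, return the fraction of reads that still show C.

    A high fraction suggests protection from bisulfite conversion
    (candidate methylated site); a fraction near zero suggests an
    unmethylated cytosine. Characters other than C or T at a position
    (for example N or -) are treated as no coverage.
    """
    fractions = {}
    for pos, base in enumerate(reference.upper()):
        if base != "C":
            continue
        calls = [r[pos].upper() for r in reads if r[pos].upper() in ("C", "T")]
        if calls:
            fractions[pos] = calls.count("C") / len(calls)
    return fractions
```

incomplete conversion also leaves c in the reads, so real analyses estimate the conversion rate from unmethylated controls before calling sites.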
as already mentioned, m c is one of the most common modified nucleotides in mrnas and noncoding rnas, but its functions in these molecules are poorly understood. bisulfite sequencing of the hela cell transcriptome showed that m c accumulates in untranslated regions and that mrna cytidine methylation sites occur in the vicinity of binding sites for argonaute, a major component of the mirna/risc complex [ ]. the finding implicates m c in mirna mediated rna degradation. the hypothesis is at variance with the fact that mrnas identified as nsun targets by miclip do not change in expression in the absence of nsun [ , ]; i.e., an effect of m c on mrna stability is still unproven. it is of interest that noncoding vault rna (vtrna) is methylated by nsun according to miclip data. when its m c is lost, vtrna is abnormally processed to small vtrnas (svrnas), which act as mirnas to affect expression of several genes [ , ]. two long noncoding rnas, hotair and xist, were found to contain m c in the functional regions responsible for interactions with a complex of chromatin associated proteins or in the vicinity of these regions. there is evidence that m c is capable of disrupting certain protein-protein interactions in vitro [ ].

pseudouridine

pseudouridine (Ψ), which is also termed the "fifth nucleotide," was the first modified nucleotide discovered in rna almost years ago [ ]. pseudouridine is found in many cell rnas, from trnas and rrnas to various small nuclear rnas [ ]. such a broad distribution in rnas indicates that pseudouridine is important for the cell function.

pseudouridine is a uridine isomer ( ribosyluracil) and forms via isomerization. first, the n -c ' glycoside bond between uracil and ribose breaks. the base thus released rotates about the n -c axis and forms a new, c glycoside bond between c and c ' [ ].
as a result, Ψ is capable of forming a hydrogen bond, which, together with the c glycoside, rather than n glycoside, bond, differentiates Ψ from all other bases.

isomerization is catalyzed by enzymes of two types. the substrate is recognized by yeast cbf and mammalian dkc /dyskerin with the aid of small nucleolar rnas (snornas) having a small region complementary to the target rna [ ] [ ] [ ] [ ]. pseudouridine synthase (pus) family proteins directly recognize target rnas [ , ]. as snornas complementary to mrnas rather than to noncoding rnas were discovered, uridine isomerization was assumed to occur in mrna as well [ , ]. a total of human genes were predicted to code for proteins similar to known pseudouridine synthases, but their functions were not verified experimentally [ ].

the rna bases whose modification is subject to regulation are of particular interest. isomerization of two uridines in the yeast u snrna is regulated separately in a stress dependent manner, for instance, in heat shock or nutrient deficiency [ ]. isomerization of two uridines in mammalian rrna is regulated by the kinase mtor [ ].

a transcriptome wide search for pseudouridine was recently reported for yeast and human cells [ ] [ ] [ ]. isolated rna was treated with cyclohexyl ( morpholinoethyl)carbodiimide n toluenesulfonate (cmc). the cmc-uridine bond is cleaved in an alkaline milieu, while modified n cmc Ψ remains intact. the resulting rnas were examined by reverse transcription (Ψ cmc is known to terminate reverse transcription [ ]) followed by deep sequencing. the method, which is known as psi seq, pseudo seq, or Ψ seq, reported ~ - Ψ sites for mrnas of yeast cells cultured in optimal conditions and ~ - sites in human cell lines. noncoding rnas were also found to contain Ψ. the number of Ψ sites mapped in different studies depended on the read depth and the criteria employed in site selection in a computer analysis.
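the dependence on read depth and selection criteria is easy to see in a toy version of the site calling step. the sketch below flags positions where reverse transcription stops more often in the cmc treated sample than in a mock treated control; it is an illustration only, not any of the published pipelines. the mock control, the function name, and both thresholds are assumptions made for the example, and the offset between the stop position and the modified nucleotide is not handled.

```python
from typing import List, Sequence


def candidate_psi_sites(
    stops_cmc: Sequence[int],
    coverage_cmc: Sequence[int],
    stops_mock: Sequence[int],
    coverage_mock: Sequence[int],
    min_coverage: int = 10,       # assumed read depth cutoff
    min_difference: float = 0.1,  # assumed stop ratio difference cutoff
) -> List[int]:
    """Return positions with an excess of RT stops in the CMC sample.

    Each sequence holds one value per transcript position: the number of
    reads terminating there (stops) or covering it (coverage).
    """
    sites = []
    for pos in range(len(stops_cmc)):
        # Positions with too few reads in either sample are not evaluated.
        if coverage_cmc[pos] < min_coverage or coverage_mock[pos] < min_coverage:
            continue
        ratio_cmc = stops_cmc[pos] / coverage_cmc[pos]
        ratio_mock = stops_mock[pos] / coverage_mock[pos]
        if ratio_cmc - ratio_mock >= min_difference:
            sites.append(pos)
    return sites
```

raising `min_coverage` or `min_difference` shrinks the list of reported sites, which is one way differences in site counts between studies can arise.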
it is noteworthy that Ψ sites are regularly distributed throughout coding and noncoding regions, rather than clustering in particular regions of transcripts [ , ].

stress dependence of the mrna pseudouridylation level is among the most interesting findings of transcriptome wide Ψ mapping. the Ψ proportion in yeast mrna considerably increases in heat shock [ , ]. a total of new pseudouridylation sites were identified, and the majority of them proved to be modified in heat shock by pus pseudouridine synthase, like Ψ residues in the u snrna [ ]. the pseudouridylation level is approximately doubled in nutrient deficiency [ ]. the majority of nucleotides subject to regulated isomerization are isomerized only in the presence of active pus and pus . apart from pus family proteins, cbf is responsible for pseudouridylation of certain uridine residues, and its activity is stress independent [ ]. pseudouridine synthases were found not only in yeast, but also in human cells, and transcriptome wide mapping of pseudouridine residues in rna was carried out for human cells growing in normal conditions [ , ]. a total of pseudouridylation sites were detected in mrna, and the majority of them were dkc dependent. as for mrna, sites were observed in normal conditions and sites, after h serum starvation [ ].

the biological role of isomerization to pseudouridine is unclear for the majority of uridine residues. because Ψ is capable of hydrogen bonding with a, pseudouridine containing transcripts are translated to produce a functionally active protein without changes in amino acid composition [ ]. uridine to pseudouridine conversion in stop codons might play a regulatory role. it was shown that a stop codon is misread as a sense codon when artificially modified with pseudouridine [ ]. however, only one endogenous transcript was found to undergo pseudouridylation of the stop codon [ ]; i.e., this function is hardly a main one for Ψ.
the total number of pseudouridine residues in mrna is lower than that of other modified nucleotides. however, stress dependent pseudouridylation suggests a regulatory function for this modification.

rna editing is a type of posttranscriptional modification and involves adenosine deamination to inosine (i). adenosine deamination is a major type of editing in the case of mammalian rna, in contrast to trypanosoma mitochondrial rna [ ]. inosine is recognized as g, rather than a, during splicing and translation, interacting mostly with cytidine to form a complementary pair [ ]. adenosine deamination occurs only in double stranded rna regions and involves proteins of the adar (adenosine deaminase acting on rna) family [ , , ].

three adar family proteins are encoded in the human genome: two adar isoforms (adar p , or adar l, and adar p , or adar s) [ ], adar [ ], and adar [ ]. their structures are schematically shown in fig. . the proteins are highly conserved among vertebrates [ ]. adar and adar occur in many tissues, while adar is found exclusively in brain tissues and is thought to be catalytically inactive [ ]. all of the adar proteins have a dsrna binding domain at the n end and a conserved catalytic domain at the c end [ ]. the proteins are catalytically active only as homodimers, as was shown both in vitro and in vivo [ ], and are capable of specific and nonspecific editing of both noncoding and coding dsrnas [ ].

adar activity defects due to mutations or changes in expression are associated with various disorders, including cancer, neurological diseases, metabolic disorders, virus infections, and autoimmune diseases [ ].

before the advent of new generation sequencing, comparing the nucleotide sequences for cdna and reference genomes was a basis of the majority of methods used to identify the deamination sites in rna [ ]. only several tens of rna editing sites were identified by this means [ ].
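because inosine is read as g, the cdna versus genome comparison amounts to looking for positions where the genome has a and the cdna has g. the function below is a minimal sketch of that comparison; its name is hypothetical, and it assumes the two sequences are already aligned, gap free, on the same strand, and of equal length.

```python
from typing import List


def candidate_editing_sites(genome: str, cdna: str) -> List[int]:
    """Return 0-based positions where the genome has A and the cDNA has G."""
    if len(genome) != len(cdna):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        pos
        for pos, (g, c) in enumerate(zip(genome.upper(), cdna.upper()))
        if g == "A" and c == "G"
    ]
```

an a to g mismatch found this way is only a candidate, since the same pattern can result from a sequencing error or a single nucleotide polymorphism.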
high throughput sequencing was first used to find the adenosine deamination sites in [ ]. several drawbacks are characteristic of the methods based exclusively on sequencing. inosine cannot be distinguished from g appearing in cdna as a result of a sequencing error or a single nucleotide polymorphism. to overcome this drawback, inosine chemical erasing (ice) was developed taking advantage of the fact that cyanoethylated inosine terminates reverse transcription [ ]. combined with high throughput sequencing (ice seq), the method was used to identify the adenosine deamination sites in the human brain transcriptome [ ].

adenosine deamination in the pre mrnas of the glutamate receptor subunit b (glur b) and serotonin c receptor ( ht cr) are the best understood cases of rna editing in the coding region.

two sites where a → i deamination changes the codon were found in the glur b mrna. the r/g site affects the receptor desensitization kinetics [ ], and the q/r site reduces the ca + channel permeability [ , ]. high level editing at the q/r site is of immense importance in mammals. a decrease in editing level is associated with malignant glioma and amyotrophic lateral sclerosis in humans and causes death almost immediately after birth in mice [ ] [ ] [ ] [ ] [ ]. a low q/r substitution rate increases the channel permeability to ca + and zn +, thus dramatically changing the membrane potential and affecting the cell signaling pathways [ ]. changes in editing efficiency at the q/r site of glur b were observed in forebrain ischemia [ ] and were presumably due to a decrease in adar expression.

five adenosine residues are subject to deamination in the pre mrna for g protein coupled ht cr, and their deamination changes three amino acid residues. combinatorial editing yields different isoforms [ , ].
mice that express only the unedited ini isoform of the receptor are normal, while mice that express the fully edited vgv isoform have a substantially reduced fat mass in spite of hyperphagia [ ]. changes in ht cr editing level are associated with anxiety, depression, and suicidal behavior [ ].

adenosine residues in pre mrna introns are also subject to deamination, which thereby affects splicing. editing may generate a new ' gu splicing site and generate or eliminate a ' ag splicing site [ ]. as an example of this editing, adar performs adenosine deamination in its own pre mrna, leading to a frameshift. this is an example of regulation via negative feedback [ ].

modification of internal mrna regions has been known for a long time, but its function remains unknown in the majority of cases. n methyladenosine is one of the most abundant and best studied of all modified nucleotides. ample data are available for this modification, but the biological roles are still unclear for both the modification itself and modification related proteins. the inosine function is well established in certain cases. inosine generation changes the amino acid sequence of the mrna encoded protein or regulates splicing. less is known about the other modifications, and further studies are necessary to better understand their functions.

considering rna modifications, we focused mostly on mrna because its modifications are the least understood and the most interesting in terms of new mechanisms that regulate eukaryotic gene expression. modification of eukaryotic rrna was intentionally left beyond the scope of this review as a separate problem studied in great detail.

it is clear from the above that data on mrna modification are fragmentary and are difficult to summarize, especially with the purpose to focus on the roles of individual modified nucleotides. studies of the relevant processes will certainly bring many unexpected interesting discoveries.
topology of the human and mouse m a rna methylomes revealed by m a seq com prehensive analysis of mrna methylation reveals enrichment in ' utrs and near stop codons widespread occurrence of methylcytosine in human coding and non coding rna pseudouridine profiling reveals regulated mrna pseudouridylation in yeast and human cells transcriptome wide mapping reveals widespread dynamic regulated pseudouridylation of ncrna and mrna tran scriptome wide mapping of pseudouridines: pseudou ridine synthases modify specific mrnas in s. cerevi siae terminal methylguanosine in eukaryotic mrna is required for translation methylated nucleotides block ' terminus of hela cell messenger rna blocked and methylated ' termi nal cap structures of rat brain messenger ribonucleic acids post transcriptional modifications of mrna. purification and character ization of cap i and cap ii rna (nucleoside ' ) methyltransferases from hela cells ' o ribose methylation of cap in human: function and evolution in a horizontally mobile family ' o methylation of the viral mrna cap evades host restriction by ifit family members ribose ' o methylation provides a molecular signature for the distinction of self and non self mrna dependent on the rna sensor mda identifi cation of methylated nucleosides in messenger rna from novikoff hepatoma cells mapping of n methyladenosine residues in bovine prolactin mrna context effects on n adenosine methylation sites in prolactin mrna precise localization of m a in rous sarcoma virus rna reveals clustering of methylation sites: implications for rna processing comparison of methylated sequences in messenger rna and het erogeneous nuclear rna from mouse l cells high resolution mapping reveals a conserved, widespread, dynamic mrna methylation program in yeast meiosis analysis of rna base modification and structural rearrange ment by single molecule real time detection of reverse transcription identification of a selective polymerase enables detection of n methyladenosine 
in rna method for site specific detection of m a nucleoside presence in rna based on high resolution melting (hrm) analysis probing n methyladenosine rna modification status at single nucleotide resolution in mrna and long noncoding rna purification and cdna cloning of the adomet bind ing subunit of the human mrna (n adenisine) methyltransferase a mettl -mettl complex mediates mammalian nuclear rna n adenosine methylation mamma lian wtap is a regulatory subunit of the rna n methyladenosine methyltransferase wilms' tumor associating protein regulates g /m transition through stabilization of cyclin a mrna the obesity associated fto gene encodes a oxoglut arate dependent nucleic acid demethylase oxidative demethylation of methylthymine and methylu racil in single stranded dna and rna by mouse and human fto n methyladenos ine in nuclear rna is a major substrate of the obesity associated fto loss of function mutation in the dioxygenase encoding fto gene causes severe growth retardation and multiple malformations crystal structure of the fto protein reveals basis for its substrate speci ficity undetectable levels of n methyl adenine in mouse dna: cloning and analysis of pred , a gene coding for a putative mammalian dna adenine methyltrans ferase fto mediated for mation of n hydroxymethyladenosine and n formyladenosine in mammalian rna alkbh is a mammalian rna demethylase that impacts rna metabolism and mouse fertility cloning and characterization of hur, a ubiquitously expressed elav like protein interplay between micrornas and rna binding proteins determines developmental processes hur protein attenuates mirna mediated repression by promoting mirisc dissociation from the target rna n methylade nosine modification destabilizes developmental regu lators in embryonic stem cells yth: a new domain in nuclear proteins n methylad enosine dependent regulation of messenger rna sta bility m a rna modification controls cell fate transition in mammalian embryonic stem cells m a 
mrna methylation facilitates resolu tion of naive pluripotency toward differentiation the methylation state of poly a containing messenger rna from cultured hamster cells methylation of sindbis virus " s" messenger rna the methylation of adenovirus specific nuclear and cytoplasmic rna methylated, blocked termini in hela cell mrna nsun mediated cytosine methylation of vault noncoding rna determines its processing into regulatory small rnas the rna methyltransferase misu (nsun ) mediates myc induced proliferation and is upregulated in tumors identification of human trna:m c methyltrans ferase catalysing intron dependent m c formation in the first position of the anticodon of the formula the mouse cytosine rna methyltransferase nsun is a component of the chromatoid body and required for testis differentiation the rna methyltransferase misu (nsun ) poises epider mal stem cells to differentiate identification of direct targets and modified bases of rna cytosine methyltransferases deple tion of saccharomyces cerevisiae trnahis guanylyl transferase thg p leads to uncharged trnahis with additional m c dna methylation profiling of human chromosomes , , and rna methyltrans ferases utilize two cysteine residues in the formation of methylcytosine mutagenicity of aza ' deoxycytidine is medi ated by the mammalian dna methyltransferase the nucleolar rna methyltransferase misu (nsun ) is required for mitotic spindle stability bringing order to translation: the con tributions of transfer rna anticodon domain modifi cations design, biological activity and nmr solution structure of a dna analogue of yeast trnaphe anticodon domain expanding the nucleotide repertoire of the ribosome with post transcriptional modifications long non coding rnas as targets for cytosine methylation ribonucleic acids from yeast which contain a fifth nucleotide rna pseudouridylation: new insights into an old modification cocrystal struc ture of a trna Ψ pseudouridine synthase: nucle otide flipping by an rna 
modifying enzyme small nucleolar rnas direct site specific synthesis of pseudouridine in ribosomal rna rna guided rna modification: functional organization of the archaeal h/aca rnp pseudou ridylation of yeast u snrna is catalyzed by either an rna guided or rna independent mechanism function ality and substrate specificity of human box h/aca guide rnas pseudouridine mapping in the saccharomyces cerevi siae spliceosomal u small nuclear rnas (snrnas) reveals that pseudouridine synthase pus p exhibits a dual substrate specificity for u snrna and trna pseudouridylation (Ψ) of u snrna in s.cerevisiae is catalyzed by an rna independent mechanism identification of brain specific and imprinted small nucleolar rna genes exhibiting an unusual genomic organization rnomics: identification and function of small, non messenger rnas interpro in : new developments in the family and domain prediction database u snrna is inducibly pseudouridylated at novel sites by pus p and snr rnp s rrna is inducibly pseudouridylated by the mtor pathway translational control in cho cell cultures four newly located pseudouridylate residues in escherichia coli s ribo somal rna are all at the peptidyltransferase center: analysis by the application of a new sequencing tech nique incorporation of pseudouridine into mrna yields superior nonimmunogenic vector with increased translational capacity and biological stability converting nonsense codons into sense codons by targeted pseudouridyla tion functions and regulation of rna editing by adar deaminases regu lation of alternative splicing by rna editing rna editing by adenosine deaminases that act on rna rna editing in reg ulating gene expression in the brain molecular cloning of cdna for double stranded rna adenosine deaminase, a candidate enzyme for nuclear rna edit ing editing of glutamate receptor b subunit ion channel rnas by four alternatively spliced drada double stranded rna adenosine deaminases a third member of the rna specific adenosine 
deaminase gene family, adar , contains both single and double stranded rna binding domains comparative analysis of the drada a to i rna editing gene from mammals, pufferfish and zebrafish require ment of dimerization for rna editing activity of ade nosine deaminases acting on rna adenosine to inosine rna editing and human disease a to i rna editing: current knowledge sources and computa tional approaches with special emphasis on non cod ing rna molecules genome wide identification of human rna editing sites by parallel dna capturing and sequencing. sci ence inosine cyanoethylation identifies a to i rna editing sites in the human transcriptome a biochem ical landscape of a to i rna editing in the human brain transcriptome control of kinetic properties of ampa receptor chan nels by nuclear rna editing structural determinants of ion flow through recombi nant glutamate receptor channels identification of a site in glutamate receptor subunits that controls calcium permeability point mutation in an ampa receptor gene rescues lethality in mice deficient in the rna editing enzyme adar deficient rna editing of glur and neuronal death in amyotropic lateral scle rosis underediting of glutamate receptor glur b mrna in malignant gliomas reduc tion of glur rna editing, a molecular change that increases calcium influx through ampa receptors, selective in the spinal ventral gray of patients with amy otrophic lateral sclerosis glutamate receptors: rna editing and death of motor neurons adar dependent rna editing of ampa receptor subunit glur determines vulnerability of neurons in fore brain ischemia regu lation of serotonin c receptor g protein coupling by rna editing rna editing induces variation in desensitization and traf ficking of hydroxytryptamine c receptor isoforms dysregulated editing of serotonin c receptor mrnas results in energy dissipation and loss of fat mass serotonin receptor c and mental disorders: genetic, expres sion, and rna editing studies this work was 
supported by the russian science foundation (project no. ).

key: cord- -uyswj ow authors: melin, amanda d.; janiak, mareike c.; marrone, frank; arora, paramjit s.; higham, james p. title: comparative ace variation and primate covid- risk date: - - journal: commun biol doi: . /s - - -w sha: doc_id: cord_uid: uyswj ow the emergence of sars-cov- has caused over a million human deaths and massive global disruption. the viral infection may also represent a threat to our closest living relatives, nonhuman primates. the contact surface of the host cell receptor, ace , displays amino acid residues that are critical for virus recognition, and variations at these critical residues modulate infection susceptibility. infection studies have shown that some primate species develop covid- -like symptoms; however, the susceptibility of most primates is unknown. here, we show that all apes and african and asian monkeys (catarrhines), exhibit the same set of twelve key amino acid residues as human ace . monkeys in the americas, and some tarsiers, lemurs and lorisoids, differ at critical contact residues, and protein modeling predicts that these differences should greatly reduce sars-cov- binding affinity. other lemurs are predicted to be closer to catarrhines in their susceptibility. our study suggests that apes and african and asian monkeys, and some lemurs, are likely to be highly susceptible to sars-cov- . urgent actions have been undertaken to limit the exposure of great apes to humans, and similar efforts may be necessary for many other primate species. in late a novel coronavirus, sars-cov- , emerged in china. in humans, this virus can lead to the respiratory disease covid- , which can be fatal , . since then, sars-cov- has spread around the world, causing widespread mortality, and with major impacts on societies and economies. 
while the virus and its resulting disease represent a major humanitarian disaster, they also represent a potentially existential risk to our closest living relatives, the nonhuman primates. transmission incidences of bacteria and viruses-including another coronavirus (h-cov-oc )-from humans to wild populations of nonhuman primates have previously been linked to outbreaks of ebola, yellow fever, and fatal respiratory diseases, leading in some cases to mass mortality [ ] [ ] [ ] [ ] [ ] [ ] [ ] . such past events raise considerable concerns among the global conservation community with respect to the impact of the current pandemic . infection studies of rhesus monkeys, long-tailed macaques, and vervets as biomedical models have made it clear that at least some nonhuman primate species are permissive to sars-cov- infection and develop symptoms in response to infection that resemble those of humans following the development of covid- , including similar age-related effects [ ] [ ] [ ] [ ] [ ] [ ] . recognizing the potential danger of covid- to nonhuman primates, the international union for the conservation of nature (iucn), together with the great apes section of the primate specialist group, released a joint statement on precautions that should be taken for researchers and caretakers when interacting with great apes . however, the risk for many primate taxa remains unknown. here we begin to assess the potential likelihood that our closest living relatives are susceptible to sars-cov- infection. while the biology underlying susceptibility to sars-cov- infection remains to be fully elucidated, the viral target is well established. the sars-cov- virus binds to the cellular receptor protein angiotensin-converting enzyme- (ace ), which is expressed on the extracellular surface of endothelial cells of diverse bodily tissues, including the lungs, kidneys, small intestine, and renal tubes . 
ace is a carboxypeptidase whose activities include regulation of blood pressure and inflammatory response through its role in cleaving the vasoconstrictor angiotensin ii to produce angiotensin - and triggering varied downstream responses [ ] [ ] [ ] [ ] . ace is made up of a signal sequence at the n terminus (residues - ), a transmembrane sequence at the c terminus (residues - ), and an extracellular region, which contains a zinc metallopeptidase domain (residues - ) and a collectrin homolog (residues - ) , . characterizations of the infection dynamics of sars-cov- have demonstrated that the binding affinity for the human ace receptor is high, which is a key factor in determining the susceptibility and transmission dynamics. when compared to sars-cov, which caused a serious global outbreak of the disease in - , , the binding affinity between sars-cov- and ace is estimated to be between fourfold - and -to -fold greater . recent reports describing structural characterization of ace in complex with the sars-cov- spike protein receptor-binding domain (rbd) [ ] [ ] [ ] [ ] allow identification of the key binding residues that enable the host-pathogen protein-protein recognition. following the initial binding of the virus to the ace receptor, humans experience a great deal of variation in response to infection, with some individuals experiencing relatively mild symptoms, while others experience major breathing problems and organ failures, which can lead to death. some of this response is known to be linked to variation in how the immune system responds to infection, with some individuals experiencing a hyperinflammatory 'cytokine storm', which in turn aggravates respiratory failures and increases mortality risk , . there may also be some variation among humans in initial susceptibility to infection, such that approaches examining variation in ace tissue expression and gene sequences can offer insight into variation in human susceptibility to covid- [ ] [ ] [ ] [ ] . 
similarly, we can use such an approach to compare sequence variation across species, and hence try to predict the likely interspecific variation in susceptibility to initial infection. previous analysis of comparative variation at these sites enabled estimates of the affinity of the ace receptor for sars-cov in nonhuman species (bats) . here, we undertake such an analysis for sars-cov- across the primate radiation. our aim is to investigate the likelihood of initial susceptibility to infection for different major radiations and species while recognizing that downstream processes such as immune responses are likely to determine the extent to which species and individuals develop symptoms and pathologies in response to infection. we compiled ace gene sequence data from primate species for which genomes are publicly available, covering primate taxonomic breadth. for comparison, we assessed species of other mammals that have been tested directly for sars-cov- susceptibility in laboratory infection studies . we also included in our analysis the amino acid sequence variation at these sites for horseshoe bats, thought to be the original vector of the virus, and pangolins, a potential intermediate host, where viral recombination may have led to the novel viral form sars-cov- . we assessed the variation at amino acid residues identified as critical for ace recognition by the sars-cov- rbd and undertook an analysis of positive selection and protein modeling to gauge the potential for adaptive differences and the likely effects of protein variation. our aim was to develop predictions about the susceptibility of our closest living relatives to sars-cov- as a resource for stakeholders, including researchers, caretakers, practitioners, conservationists, and governmental and non-governmental agencies. variation in ace sequences. the ace gene ( bp) and translated protein ( amino acids) sequences are strongly conserved across primates. 
the average pairwise identity across primate species is . % for the ace nucleotide sequence and . % for the protein sequence, with a pairwise similarity (blosum ≥ ) of . % (supplementary data - ). out of bp, bp ( . %) are identical, while bp ( . %) are phylogenetically-informative sites for primates, and gene trees we generated ( supplementary fig. s a , b) closely recapitulate the currently accepted phylogeny of primates ( fig. ). in particular, the twelve sites in the ace protein that are critical for binding of the sars-cov- virus are invariant across the catarrhini, which includes great apes, gibbons, and monkeys of africa and asia (fig. ) . furthermore, catarrhines do not vary at any of the sites identified by alanine scanning (supplementary table s and supplementary fig. s ). the other major radiation of monkeys, those found in the americas (platyrrhini), have ace sequences that are less similar to humans across the length of the protein ( . - . % identical to h. sapiens, supplementary data ) but conserved within their clade (average pairwise identity . %, supplementary data ). they share nine of twelve critical amino acid residues with catarrhine primates; the three sites that vary from catarrhines, h , e , and t , are conserved within the platyrrhines. strepsirrhine primates and tarsiers, were more variable in the binding sites and less similar to the human protein across the length of the sequence ( . - . % pairwise identity, supplementary data ). like platyrrhines, the tarsier (carlito syrichta), mouse lemur (microcebus murinus), and galago (otolemur garnettii) have an h residue, while the sifaka (propithecus coquereli), aye-aye (daubentonia madagascariensis), and the blue-eyed black lemur (eulemur flavifrons) have the same allele as humans and other catarrhines, y . in non-primate mammals, a higher number of amino acid substitutions are evident ( . - . % pairwise identity to h. sapiens, supplementary data ), including at critical binding sites. 
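
The pairwise identity figures quoted above can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and that columns with a gap in either sequence are skipped; the alignment software used by the authors may define identity differently. The function names and the toy sequences are hypothetical and are not taken from the study data.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap-containing columns are ignored; other tools may count them.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average percent identity over all unordered pairs in a name -> sequence dict."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "species_a": "MSSSSWLLLS",
    "species_b": "MSSSSWLLLS",
    "species_c": "MSGSSWLFLS",
}
print(round(mean_pairwise_identity(toy), 2))
```

Averaging over unordered pairs keeps the summary independent of the order in which species are listed.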
all species possess a different residue to primates at site . bats are exceptionally variable within the binding sites, with the genus rhinolophus alone encompassing all of the variation seen in the rest of the non-primate mammals. where primates have glutamine (q ), bats have glutamate (e ), lysine (k ), leucine (l ), or arginine (r ) (fig. ). all fasta alignments of ace gene and protein sequences are available in supplementary data - , a full-length protein alignment is also shown in supplementary fig. s , and distance matrices are provided in supplementary data - . analysis of species-specific residues on ace -rbd interactions. the ace receptors of all catarrhines have identical residues to humans at the rbd/ace binding interface across all critical sites, and are predicted to have a similar binding affinity for sars-cov- . platyrrhines diverge from catarrhines at three of the twelve critical amino acid residues. compared to catarrhine ace , the platyrrhines' ace is predicted to bind sars-cov- rbd with a roughly -fold reduced affinity (ΔΔg bind = . kcal/mol) (table ). in particular, the change at site from y to h found in monkeys in the americas has the largest impact of any residue change examined (table ), which alone is predicted to lead to a -fold decrease in the binding affinity to sars-cov- (fig. ). this single mutation combined with additional substitutions, especially q e, found in platyrrhines is predicted to substantially reduce the likelihood of successful viral binding (table ). of the other primates modeled, two of the three strepsirrhines, and tarsiers, also have the h residue and furthermore have additional protein sequence differences leading to further decreases in predicted binding affinity. the predicted binding affinity of tarsier ace is the most dissimilar to humans and this primate might be the least susceptible of the species we examine. in contrast, coquerel's sifaka (propithecus coquereli), the aye-aye (daubentonia madagascariensis), and a blue-eyed black lemur (eulemur flavifrons) share the same residue as humans and other catarrhines at site and have projected affinities that are near to humans (table ). other mammals included in our study -ferrets, cats, dogs, pigs, pangolin, and two of the seven bat species (r. pusillus and r. macrotis) -show the same residue as humans (y) at site , with accompanying strong affinities for sars-cov- . the remaining five sister species of bats possess h and lower binding affinities (table ).

fig. ace protein sequence alignment and evolutionary relationships of study species. branch lengths represent the evolutionary distance (time, in millions of years) estimated from timetree . we outline amino acid residues at critical binding sites for the sars-cov- spike receptor-binding domain. solid outlines highlight sites predicted to have the most substantial impact on viral binding affinity. notably, protein sequences of catarrhine primates are highly conserved, including uniformity among amino acids at all binding sites. primate species that are able to be successfully infected with covid- are indicated in red. predicted susceptibility to covid- for other primates is additionally coded by terminal branch colors. we use the nomenclature cebus capucinus to be consistent with the species name used in the genome annotation but note the recent adoption of cebus imitator for this species. silhouettes are from phylopic.org and available under the public domain dedication . license, with the exception of cebus (sarah werning; creative commons attribution . unported).

adaptive evolution of ace sequences. we find evidence that the selective pressures acting on ace are not equivalent across the major clades in our analysis. the codeml clade model c provided a better fit than the null model (lrt = . , p < . ; table , supplementary table s ) (table ). 
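
The comparison of clade model c against its null model rests on a likelihood-ratio test. The sketch below shows how such a statistic and its p-value are commonly computed from two log-likelihoods. The log-likelihood values and the single degree of freedom are assumptions chosen for illustration; the actual values from this analysis are not recoverable from the text, and the function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Hypothetical log-likelihoods, not taken from the study.
lnl_null_model = -12345.6
lnl_clade_model = -12340.1

stat = lrt_statistic(lnl_null_model, lnl_clade_model)
p_value = chi2_sf(stat, df=1)  # df = 1 is an assumption for this example
print(f"LRT = {stat:.2f}, p = {p_value:.4g}")
```

The closed forms avoid a dependency on a statistics library; for other degrees of freedom a general chi-square survival function would be needed.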
in catarrhines, the three positively selected sites identified by beb calculations are not near the binding sites for sars-cov- (residues , , and ; table ). our results strongly suggest that catarrhines -all apes, and all monkeys of africa and asia, are likely to be susceptible to infection by sars-cov- . there is high conservancy in the protein sequence of the target receptor, ace , including uniformity at all identified and tested major binding sites. indeed, even among the residues identified in our full list of potential binding points, catarrhines are invariant (supplementary table residues between platyrrhines and catarrhines, and two of these, h y and e q show strong evidence of being impactful changes. these amino acid changes are modeled to reduce the binding affinity between sars-cov- and ace by ca. -fold. recent clinical analysis of viral shedding, viremia, and histopathology in catarrhine (macaque) versus platyrrhine (marmoset, callithrix jacchus) responses to inoculation with sars-cov- , show much more severe presentation of disease symptoms in the former, strongly supporting our results . similar reduced susceptibility is predicted for tarsiers, and two of the five lemurs and lorisoids (strepsirrhines). what is concerning is that three of the analyzed lemurs spanning divergent lineages-the coquerel's sifaka, the aye-aye, and the blue-eyed black lemur-are more similar to catarrhines at important binding sites, including possessing the high-risk residue variant at site , and as such are also predicted to be susceptible. nonetheless, these are only predicted results based on amino acid residues and protein-protein interaction models. we urge extreme caution in using our analyses as the basis for relaxing policies regarding the protection of platyrrhines, tarsiers or any strepsirrhines. experimental assessment of synthetic protein interactions can now occur in the laboratory, e.g. 
, and confirmation of our model predictions should be sought before any firm conclusions are reached. emerging evidence in experimental mammalian models appears to support our results; dogs, ferrets, pigs, and cats have all shown some susceptibility to sars-cov- but have demonstrated variation in disease severity and presentation, including across studies , . substitutions at binding sites might be at least partially protective against covid- in these mammals. for example, the limited experimental evidence to date suggests that while cats -which have the same residue as humans at site -are not strongly symptomatic, they present lung lesions, while dogs-which have a substitution at this site-do not . the amino acid residue at site differs from primates in all other mammalian species examined. however, our models suggest that the variant residues may confer relatively minor reductions in binding affinity. other sources of variation may affect ace protein stability . our results are also consistent with previous reports that ace genetic diversity is greater among bats than that observed among mammals susceptible to sars-cov-type viruses. this variation has been suggested to indicate that bat species may act as a reservoir of sars-cov viruses or their progenitors . intriguingly, all but bat species we examined have the putatively protective variant, h . additionally, results of our codeml branch-site analysis support previous findings of ace in bats being under positive selection, including sites within the binding domain of sars-cov and sars-cov- , which may be evidence of host-virus coevolution. sites showing evidence of positive selection within catarrhine ace sequences were not in or near known cov binding sites (table and fig. ). two (residues , ) fall within the cleavage site (residues - ) utilized by the sheddase adam , known to interact with ace . 
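
Reductions in binding affinity of the kind discussed here are often reported either as a fold change or as a change in binding free energy (ΔΔG). The two are related through the standard thermodynamic identity ΔΔG = RT ln(fold change). The sketch below applies that identity. The temperature of 298.15 K is an assumption, the function names are hypothetical, and the numbers are illustrative rather than the values modeled in this study.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_from_ddg(ddg_kcal_mol, temperature_k=298.15):
    """Fold reduction in affinity implied by a binding free energy change."""
    return math.exp(ddg_kcal_mol / (R_KCAL_PER_MOL_K * temperature_k))


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Binding free energy change (kcal/mol) implied by a fold reduction."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# Illustrative only: a tenfold weaker binding at the assumed temperature.
print(round(ddg_from_fold_change(10.0), 3))
```

At the assumed temperature each tenfold change in affinity corresponds to a little under 1.4 kcal/mol, which gives a quick way to sanity-check a reported pair of values.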
however, neither of the residues under selection are the amino acids targeted by adam leaving the functional significance of evolution at these sites uncertain. further clinical and laboratory study is needed to fully understand infection dynamics.

table results of codeml analyses of adaptive evolution across ace gene sequences.

there are a number of important caveats to our study. firstly, all of our predictions are based on interpretations of gene and resultant amino acid sequences, rather than based on direct assessment of individual responses to induced infection. nonetheless, the overall pattern of our results is being borne out by infection studies on a few species that are used as biomedical models. so far, all catarrhine species tested by infection studies, including rhesus macaques, long-tailed macaques, and vervet monkeys , , have exhibited covid- -like symptoms in response to infection, including large lung and other organ lesions and cytokine storms . in contrast, marmosets did not exhibit major symptoms in response to infection . while these results support and validate our findings based on ace sequence interpretation, the number of primate species that can and will be tested directly by infection studies will be restricted to just a handful. our study enhances this picture, by allowing inferences to be made across the primate radiation, backed up by the published infection studies on a few target model species. some of our results, such as the uniform conservation of ace binding sites among catarrhines, backed up by the demonstrated high susceptibility of humans and other catarrhines to sars-cov- , should give a good degree of confidence of high levels of risk. given the identical residues of humans to other apes and monkeys in asia and africa at the target sites, it seems unlikely that the ace receptor and the sars-cov- proteins would not readily bind. our results for other taxa are dependent on modeling, hence should be treated more cautiously. 
this includes all interpretations of the susceptibility of platyrrhines and strepsirrhines, where the effects of residue differences on binding affinities have been estimated based on protein-protein interaction modeling. another caveat is that we have modeled only interactions at binding sites, and not predictions based on full residue sequence variation. residues that are not in direct contact may still affect binding allosterically. other factors, including proteases necessary for viral entry, and other viral targets, may also impact disease susceptibility and responses . more generally, if adhering to the precautionary principle, then our results highlighting higher risks to some species should be taken with greater gravity than our results that predict potential lower risks to others. another limitation of our study is that we have looked at only primate species, albeit with broad taxonomic scope. analysis of additional species is important, especially among strepsirrhine species, where our coverage is relatively scant. in particular, the residue overlap at important binding sites in the sequences of coquerel's sifaka, the aye-aye, and blue-eyed black lemur with those of catarrhines suggests many lemurs may be highly vulnerable and we underscore the need to assess a wider diversity of lemur species. furthermore, we examine only one individual per species, and intraspecific variation across populations should be considered; however, studies on intraspecific ace variation with humans and vervet monkeys suggest ace variants are low in frequency [ ] [ ] [ ] . finally, it is also important to remember that our study assesses only the potential for the initial binding of the virus to the target site. downstream consequences of infection may differ drastically based on species-specific proteases, genomic variants, metabolism, and immune system responses , . 
in humans, the development of covid- can lead to a pro-inflammatory cytokine storm of hyperinflammation, which may lead to some of the more severe impacts of infection , . nonetheless, it is evident from the hundreds of thousands of deaths and global lockdown that humans are highly susceptible to sars-cov- infection, and our results suggest that all apes and monkeys in africa and asia are similarly susceptible. many endangered primate species are now only found in very small population sizes . for example, there are believed to be only around mountain gorillas left in their entire range . with such small populations, the introduction of a new highly infectious disease is of serious concern. re-opening access to habituated great ape groups for tourism purposes, which may be critical to local economies , may be fraught with issues. iucn best practices recommend that tourists stay at least meters away from great apes , but in practice, almost all tourists get far closer than this -for example, the average distance that tourists get from mountain gorillas at the bwindi impenetrable national park in uganda is just . m . a concerted effort may be required by all stakeholders to try to avoid the introduction of sars-cov- into wild primate populations . recent measures suggested by the iucn for researchers and caretakers of great ape populations include: ensuring that all individuals wear clean clothing and disinfected footwear; providing hand-washing facilities; requiring that a surgical face mask be worn by anyone coming within m of great apes; ensuring that individuals needing to cough or sneeze ideally leave the area, or at least cough/sneeze into the crux of their elbows; imposing a -day quarantine for all people arriving into great ape areas who will come into frequent close proximity with them . the iucn's 'best practice guidelines for health monitoring and disease control in great ape populations' should also be followed . 
our results suggest that dozens of nonhuman primate species, including all of our closest relatives, are likely to be highly susceptible to sars-cov-2 infection and vulnerable to its effects. major actions may be needed to limit the exposure of many wild primate populations to humans. this is likely to require coordinated input from all stakeholders, including local communities, international and national governmental agencies, nongovernmental conservation and development organizations, and academics and researchers. while the focus of many at this time is rightly on mitigating the humanitarian devastation of covid-19, we also have a duty to ensure that our closest living relatives do not suffer from devastating infections and further population declines in response to yet another human-induced catastrophe.
variation in ace2 sequences. we compiled ace2 gene sequences for catarrhine primates: species from all genera of great ape (gorilla, pan, pongo), genera of gibbons (hylobates, nomascus), and species of african and asian monkeys in genera (cercocebus, chlorocebus, macaca, mandrillus, papio, rhinopithecus, piliocolobus, theropithecus); genera of platyrrhines (monkeys from the americas: alouatta, aotus, callithrix, cebus, saimiri, sapajus); species of tarsier (carlito syrichta); and genera of strepsirrhines (lemurs and lorisoids: eulemur, daubentonia, microcebus, propithecus, otolemur) (supplementary table s ). we also included four species of mammals that have been tested clinically for susceptibility to sars-cov-2 infection: the domestic cat (felis catus), dog (canis lupus familiaris), pig (sus scrofa), and ferret (mustela putorius furo). finally, we included the pangolin (manis javanica) and several bat species, including horseshoe bats (rhinolophus spp., hipposideros pratti, myotis daubentonii). sequences were retrieved from ncbi, either from annotations of published genomes or from genbank entries.
we manually checked annotations by performing tblastn searches of the human ace2 protein sequence against each genome. we identified one misannotation for exon in microcebus murinus, which we manually corrected. the ace2 nucleotide sequence for alouatta palliata was obtained from an unpublished draft genome, via tblastn searches using the cebus ace2 protein sequence as a query and default search settings. accession numbers for sequences retrieved from ncbi and genbank are provided in supplementary table s , and the alouatta palliata sequence is available in supplementary data . coding sequences were translated using geneious version . . , and we aligned both nucleotide and amino acid sequences with mafft. amino acids were aligned with the blosum scoring matrix, while the pam scoring matrix was used for nucleotides. a . gap open penalty and an offset value of . were used for both. we manually inspected and corrected any misalignments, and verified the absence of indels and premature stop codons. to visualize patterns of gene conservation across taxa and to assess the congruence of the ace2 gene tree with currently accepted phylogenetic relationships among species, we reconstructed trees using both bayesian (mrbayes . . ) and maximum likelihood (raxml . . ) methods, with , mcmc cycles and bootstrap replicates, respectively (code available on github). gene trees were compared to a current species phylogeny assembled using timetree, which is also used to illustrate the evolutionary relationships between study species in fig. . phylogenetically informative sites along the ace2 sequence were identified with the pis function in the r package ips v. . . .
identification of critical binding residues and species-specific ace2-rbd interactions. critical ace2 protein contact sites for the viral spike protein receptor-binding domain (rbd) have been identified using cryo-em and x-ray crystallography structural analysis methods.
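the check for indels and premature stop codons described above was done by manual inspection in the original study; the following is only a minimal sketch of how such a check could be automated. the function names `translate` and `check_cds` are hypothetical, the standard genetic code is assumed, and the example sequences in the usage note are made up.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence; unknown or gapped codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of three (possible indel)")
    protein = translate(cds)
    # A single terminal stop codon is expected; anything earlier is premature.
    body = protein[:-1] if protein.endswith("*") else protein
    if "*" in body:
        problems.append(f"premature stop codon at codon {body.index('*') + 1}")
    return problems
```

for example, `check_cds("ATGGCCTAA")` returns an empty list, whereas `check_cds("ATGTAAGCCTAA")` reports a premature stop at codon 2. a frame check of this kind only flags length changes that are not multiples of three, so it complements rather than replaces inspection of the alignment.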
the ace2-rbd complex is characteristic of protein-protein interactions (ppis) that feature extended interfaces spanning a multitude of binding residues. experimental and computational analyses of ppis have shown that a handful of contact residues can dominate the binding energy landscape. alanine scanning mutagenesis provides an assessment of the contribution of each residue to complex formation. critical binding residues can be computationally identified by assessing the change in binding free energy of complex formation upon mutation of the particular residue to alanine, which is the smallest residue that may be incorporated without significantly impacting the protein backbone conformation. our computational modeling utilizes the human sars rbd/ace2 high-resolution structures, and we make the implicit assumption that the overall conformation of ace2 is conserved among different species. this assumption, which is rooted in the high sequence similarity between ace2 sequences, allows us to use the structure of the complex to predict the impact of mutations at the protein-protein interface. we defined critical residues as those that, upon mutation to alanine, decrease the binding energy by a threshold value of ΔΔG_bind ≥ . kcal/mol. nine of the residues identified by alanine scanning as involved in the ace2-rbd complex met this criterion (supplementary table s ). the sites we identified were largely congruent with those highlighted by other methods. each of the eight sites implicated by cryo-em was also detected by alanine scanning; five residues met the ≥ . kcal/mol threshold and were below it. to be cautious, in addition to the critical ace2 sites we identified through alanine scanning, we also examined residue variation at the sites that fell below the ≥ . kcal/mol threshold but that were identified as important by structural analyses, for a total of critical sites.
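the selection rule described above (keep residues whose alanine substitution meets the ΔΔG_bind threshold, then add sites flagged by structural analyses) can be sketched as follows. this is an illustration only: the function names, the site labels, the ΔΔG values and the threshold passed in the usage note are all hypothetical placeholders, not values from the study.

```python
def critical_by_scan(ddg_by_site, threshold):
    """Sites whose alanine substitution weakens binding by at least
    `threshold` (kcal/mol), ordered from largest to smallest effect."""
    hits = [site for site, ddg in ddg_by_site.items() if ddg >= threshold]
    return sorted(hits, key=lambda site: ddg_by_site[site], reverse=True)


def combined_critical_sites(ddg_by_site, threshold, structural_sites):
    """Alanine-scanning hits plus any structurally implicated sites
    that fell below the threshold, without duplicates."""
    scan_hits = critical_by_scan(ddg_by_site, threshold)
    extra = [site for site in structural_sites if site not in scan_hits]
    return scan_hits + extra
```

with made-up inputs such as `{"site_a": 2.1, "site_b": 0.4, "site_c": 1.3, "site_d": 0.9}`, a placeholder threshold of 1.0 and structural sites `["site_b", "site_c"]`, the combined list is `["site_a", "site_c", "site_b"]`. taking the union errs on the side of inclusion, which matches the cautious approach described in the text.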
all computational alanine scanning mutagenesis analyses were performed using rosetta software. the alanine mutagenesis approach has been extensively evaluated and used to analyze ppis and design their inhibitors, including by members of the present authorship. we utilized the ssipe program to predict how ace2 amino acid differences in each species would affect the relative binding energy of the ace2/sars-cov-2 interaction. using human ace2 bound to the sars-cov-2 rbd as a benchmark (pdb m j), the program mutates selected residues and compares the binding energy to that of the original. using this algorithm, we studied interactions of all primates across the full suite of amino acid changes occurring at critical binding sites for each species. to assess the impact of each amino acid substitution more thoroughly, we also examined the predicted effect of individual amino acid changes (in isolation) on protein-binding affinity.
adaptive evolution of ace2 sequences. we further investigated how selective pressures in different clades might be shaping variation at the ace2 binding sites, using codeml clade model c and branch-site models in paml. we first tested whether selection acting on ace2 is divergent between the major clades in our sample (platyrrhine, catarrhine, and strepsirrhine primates, and non-primate mammals) with the codeml clade model c, which was compared to the null model (m2a_rel) with a likelihood ratio test. this test shows whether selection (dn/ds ratio = ω) is divergent across clades, but not which clades are experiencing positive selection. we therefore followed the clade model with a series of branch-site models, which allow one clade at a time to be designated as a set of "foreground" branches and test whether this clade has experienced episodes of positive selection compared to the remaining "background" branches (ω foreground > ω background). branch-site models are compared, with a likelihood ratio test, to a null model that fixes ω at 1.
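the likelihood ratio test used for these model comparisons takes twice the difference in log-likelihood between the alternative and null models and compares it to a chi-squared distribution. a minimal sketch follows, assuming the log-likelihoods have already been read from the codeml output; the function names are hypothetical, and only the closed forms for one and two degrees of freedom are implemented so that the example needs nothing beyond the standard library.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the alternative model over the null."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_right_tail(x, df):
    """Right-tail probability of a chi-squared variate (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented for df = 1 or 2 only")


def lrt_p_value(lnl_null, lnl_alt, df):
    """P-value of the likelihood ratio test; df is the difference in the
    number of free parameters between the two nested models."""
    return chi2_right_tail(lrt_statistic(lnl_null, lnl_alt), df)
```

for a branch-site comparison the null model differs from the alternative by the single fixed ω, so one degree of freedom is the usual assumption; for clade model c against m2a_rel the degrees of freedom depend on how many clade partitions are defined, which is left here as an input.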
where the alternative model had a significantly better fit than the null model, indicating positive selection, potential sites under positive selection were identified with a bayes empirical bayes (beb) approach. we completed branch-site models for each primate clade (platyrrhine, strepsirrhine, and catarrhine), as well as for bats, because previous research has identified ace2 to be under positive selection in this clade, potentially in response to coronaviruses. we had to exclude hipposideros pratti and myotis daubentonii from paml analyses because only a partial ace2 sequence was available for these two species. input files and control files for paml codeml analyses are available in the github repository.
statistics and reproducibility. models in paml were compared with likelihood ratio tests and evaluated for significance with a right-tailed chi-squared distribution. as this was a comparative study of gene sequences across species, we had one representative individual for each species (n = ) and no replicates.
reporting summary. further information on research design is available in the nature research life sciences reporting summary linked to this article.
nucleotide and protein sequences used in this study are available from ncbi and are also available as fasta files (supplementary data and ) and alignments (supplementary data and ) in the supplemental material. accession numbers are provided in supplementary table s . all code used in this project is available via a github repository (https://github.com/ mareikejaniak/ace ). the version of the repository used for this project has been archived in zenodo (doi: . /zenodo. ).
received: august ; accepted: october
acknowledgements. m.c.j. was funded by a natural sciences and engineering council of canada discovery accelerator supplement to a.d.m. and by a postdoctoral fellowship from the alberta children's hospital research institute. p.s.a. thanks the national institutes of health (r gm ) for financial support. we thank four reviewers for constructive comments, which improved the manuscript considerably. the authors declare no competing interests.
supplementary information is available for this paper at https://doi.org/ . /s - - -w. correspondence and requests for materials should be addressed to a.d.m. or j.p.h. open access: this article is licensed under a creative commons attribution . international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons license, and indicate if changes were made.
key: cord- - rkpco authors: wise, e. j. title: studies on the ephemeroptera of a northumbrian river system: i. serial distribution and relative abundance date: - - journal: freshw biol doi: . /j. - . .tb .x sha: doc_id: cord_uid: rkpco the river coquet is a clean, fast flowing, moderately calcareous river. it is young to mature in development and supports a typical torrential fauna. marked trends in successional replacement along the river course are confined to the scarce species of ephemeroptera and the absence of longitudinal zonation in the distribution of some common species is related to the topographical characteristics of the system.
a distinct successional trend by one species is attributed to its intolerance of the lower temperatures at high altitudes. major discontinuities in distribution are found between the ephemeroptera of the main river and certain tributaries. the paucity of certain otherwise abundant species in one region is related to silt deposition resulting from sand and gravel excavation.
studies on the distribution of the ephemeroptera have been described (these include harker, ; macan, ; collins, ; langford, ). few of these studies have attempted to explain, except in very general terms, the distributions they describe, and many have been restricted either to small hill streams or to localized sections of large rivers. this report describes a preliminary survey of the distribution and relative abundance of ephemeroptera nymphs in the river coquet, northumberland, england. it is intended as a contribution to the existing knowledge of stream ecology and in particular the biology of the ephemeroptera. it is also intended to increase the data available on the freshwater fauna of an area in north-eastern england which, with the exception of the trichoptera (philipson, ; edington, ; bray, ), the amphipod and isopod crustacea (sutcliffe, ), and the ephemeroptera (burleigh, wise and gray, ), has received little attention in the past.
the river coquet rises in the cheviot hills ft ( - m) above sea level, close to the scottish border (fig. 1). it flows eastward through a varied geological succession for a distance of some miles ( km) and enters the north sea at amble on the northumberland coast. the coquet has a catchment area of km² and is the largest of several parallel rivers which drain the region of uplift between the major systems of tyne and tweed. it is a rain-fed spate river, young to mature in development. the valley sides are steep, tree cover is sparse and land drainage is good.
consequently run-off is rapid, causing violent floods of short duration. the river can be divided into three main topographical regions (fig. ).
(i) the upper 'hill' region extends for miles ( - km). the valley is unfenced hill grazing land.
(ii) the 'middle' region extends for just over miles ( - km) and has a more level gradient (fall - m in - km or - %). the river is broad and mainly shallow, following a meandering course through an alluvial flat. the substrate consists mainly of bare stones with deposits of sand and silt in places. bryophytes are sparse, ranunculus fluitans lam. occurs where the substrate is suitably shallow and, where silting exists, beds of elodea canadensis michx. are established. marginal vegetation is sparse and tree cover is restricted to a few willows (salix spp.) and alders (alnus glutinosa (l.)). the open valley bottom provides some rich arable land.
(iii) the lower 'coastal plain' region. the river falls steadily (fall m in km or - %) for the last miles ( km) of its course to the upper limit of tidal influence at warkworth. it is broad and shallow with a fast flowing current. the substrate consists of bare stones except where artificial conditions are induced by a series of man-made weirs. ranunculus, fontinalis and cladophora glomerata (l.) kütz. are the dominant aquatic plants. marginal vegetation is dominated by deciduous trees. the land is used principally for arable farming.
the 'hill' region is dominated by a deeply dissected volcano of old red sandstone age. a great mass of granite lying centrally is surrounded by pyroxene-andesites which cover the majority of the volcanic area. the whole igneous complex (outcrop % of catchment area) has been exposed to considerable weathering and is covered by a blanket of glacial drift. the 'middle' coquet, bounded by coarse and massive fell sandstones to the south, flows through a region noted for the cementstone group (outcrop % of catchment).
a second escarpment of fell sandstone (combined outcropping % of catchment) forms a barrier between the cheviots and the 'coastal plain'. this is interrupted by faulting, allowing the coquet to cut a gorge through to the 'coastal plain'. from here the dip continues to fall eastwards away from the central cheviots and the river crosses a series of outcrops. the most important of these are a belt of middle carboniferous limestone and an outcrop of millstone grit (combined outcropping % of catchment). the whole area has been affected by glaciation and the solid geology of the 'coastal plain' has been largely obscured by thick deposits of drift and morainic material belonging to the period of ice retreat.
the climate of the coquet valley is typical of north-eastern england and is, in general, rather cold and comparatively dry. observations on climatic conditions are based on recordings of the meteorological station at acklington ( ° 'n, ° 'w) between and . during this period the average air temperature varied from °c in january and february to °c in july. the average annual rainfall was - cm.
water temperatures were measured by a series of maximum-minimum thermometers situated at stations - and at subsidiary sites a, b, c and e (fig. ). recordings were taken at monthly intervals between october and april . additional information was provided by two cambridge mercury-in-steel thermographs sited at stations and . continuous temperature recordings were made immediately above the stream bed between march and august . the highest temperature ( - °c) was recorded at station during june . the lowest temperature (- °c) was recorded on several occasions between january and march in both and at all the main river stations. in summer the range in temperature was greatest in the 'hill' region (maximum range - °c, in june ). summer temperatures were more conservative in the 'coastal plain' (maximum range - - - °c, in june ).
winter temperatures were less variable throughout the length of the river (maximum range - - - °c, in january ). the conservative temperatures (maximum range - - - °c, in november ) recorded at site a are characteristic of a spring source. the buffering effect of marginal tree cover on the temperatures of tributary streams is demonstrated by a comparison between the barrow burn, at site b (maximum range - - - °c, recorded in june ), and the river alwin, at site c (maximum range - °c, recorded in june ). detailed information on the temperature regime of the coquet will be published separately.
hydrogen-ion concentration. ph determinations were made using a portable field meter. measurements were taken on a seasonal basis (i.e. during march, july and november of ) from all tributaries at a point immediately prior to their confluences with the main river. measurements were also taken from the main river immediately above and below each confluence. under low water conditions ph values were always high and remarkably constant throughout the river system (range - - - ). however, under flood conditions ph values were reduced (range -0- - ).
dissolved oxygen. during , dissolved oxygen determinations were made on a seasonal basis, using a portable field meter. oxygen saturation never fell below the range - % and there was no indication of oxygen depletion, at any time, throughout the entire length of the river.
chemical characteristics. water samples were taken from stations - on a seasonal basis (table ). all chemical analyses were carried out in accordance with the methods of mackereth ( ). an e.e.l. flame photometer was used for sodium (na+) and potassium (k+) determinations.
table . determinands: calcium (mg/l); magnesium (mg/l); sodium (mg/l); potassium (mg/l); alkalinity (mg/l caco3); alkalinity (mequiv./l hco3-); chloride (mequiv./l); nitrate + sulphate (mequiv./l).
the general chemical characteristics (e.g. calcium hardness, range - - - mg/l ca++, and alkalinity, range - - - mg/l caco3) are closely related to the geology of the watershed. high values in the lower reaches (stns. and ) are also attributed to the richness of the land in the coastal plain. lower values at station are a result of dilution by the barrow burn, which is relatively poor in dissolved salts. more detail on the effects of geology on the chemistry of water bodies in northumberland may be found in sutcliffe ( ). low nitrate (range - - mg/l no3-n) and ammonia (range - - - mg/l nh3-n) values, supplied by the former northumbrian river authority, reflect the pure quality of the water throughout the length of the river.
long-term discharges have been recorded (in cumecs) by the northumbrian river authority in the upper 'hill' region (nt ) between - (lowest daily mean , average - , peak - ) and in the lower 'coastal plain' (nu ) between - (lowest daily mean - , average - , peak - ). mean current velocities, measured using an edington flow meter cm above the surface of the substrate, were variable in space and time at all stations (range cm/s to cm/s).
the non-tidal river is clean throughout its length. the only appreciable centre of population ( ) is at rothbury, where a sewage treatment plant is operative. the river coquet is an important source of water for potable supply, an average abstraction of some million gallons per day being made from the non-tidal river at warkworth dam. the coquet is one of the most important game fishing rivers in the north of england and has good spawning facilities for migratory fish. records from the northumbrian river authority counting station at warkworth between and show an average yearly run of sea trout (salmo trutta l.) and salmon (salmo salar l.).
the coquet is a moderately calcareous river, free from any significant sources of pollution or enrichment.
the benthic invertebrate communities are typically 'rhithron' in character and larval ephemeroptera, plecoptera, net-spinning trichoptera and simuliidae are the dominant forms. it provides a useful comparison with some of the less productive rivers of the lake district and other regions of britain.
five main sampling stations were established along the main river course, at approximately equal intervals (fig. ). stations (national grid reference nt , elevation ') and (nt , eln. ') represent the 'hill' region, station (nu , eln. ') the 'middle' region and stations (nz , eln. ') and (nu , eln. ') the 'coastal plain' region. subsidiary collecting sites were established at the main river source, on certain tributaries and at other places of special interest (fig. ).
site a (nt , eln. ') at coquet head. the river rises at an altitude of ft ( - m) from a raised sphagnum bog. the watercourse is a small channel which flows rapidly through a flattish area at the head of the watershed. the substrate consists of small stones and peat. marginal vegetation consists of juncus effusus and nardus stricta grassland.
site b (nt , eln. '). the barrow burn rises ft ( - m) above sea level and drains the rough moorland fell sandstone hills on the south side of the watershed. it is about miles ( - km) long and joins the river coquet at alwinton. at site b, the stream is fast flowing, with boulder-strewn cascades and peat-stained pools. vegetation is mostly absent and the watercourse is heavily shaded by alders.
site c (nt , eln. '). the river alwin is miles ( - km) long, rises ft ( - m) above sea level and drains the andesite hills on the north side of the coquet valley. it joins the coquet (in close proximity to the barrow burn confluence) above alwinton bridge. at site c, the alwin leaves its steep-sided valley and flows rapidly across a flattish area to its confluence with the coquet.
the substrate is composed entirely of bare stones and large marginal banks of shingle are deposited by winter spates.
site d (nt , eln. '). the hosedon burn rises ft ( - m) above sea level. it drains the south-facing andesite hills of the watershed and joins the river alwin immediately above its confluence with the main river. at site d it is a narrow channel flowing swiftly through a flat juncus bog. fontinalis grows on the more stable areas of a predominantly stony substrate.
site e (ny , eln. '). the grasslees burn drains the moorland fell sandstone area of the south side of the upper watershed and its highest tributaries rise at altitudes of over ft ( - m). at site e it is characterized by a series of peat-stained pools shaded by alders.
site f (ny , eln. ') lies on the main river immediately upstream of caistron, a broad shallow area where sand and gravel is removed from the river bed.
site g (nu , eln. ') lies on the main river below caistron. conditions at both sites are similar, with eroded stony substrates usually associated with rapid current velocities.
at site h (nz , eln. '), situated on the 'coastal plain' region of the main river, conditions are similar to those described for stations and .
methods of sampling the bottom fauna of stony streams are reviewed by macan ( ), cummins ( ) and ulfstrand ( ). some of these techniques are also reviewed by frost, huni & kershaw ( ) in their critical appraisal of a kicking method, similar to that employed by hynes ( ). the method employed for the present investigation was a modified combination of the techniques used by macan ( ) and hynes ( ). samples were obtained by means of a square-framed net ( meshes per cm) with cm sides. the net was positioned between the operator's feet and an area of substrate ( cm x cm), immediately upstream of the net, was thoroughly excavated to a depth of about cm using a rake. organisms disturbed by this procedure were swept into the net by the current.
large stones were removed by hand and washed into the net. a 10-min period was taken to complete a series of twenty such sub-samples. these together constituted one collection which sampled an approximate area of m². although restricted to shallow water, this method can be operated under adverse and variable conditions and provides comparable results on a relative basis. it is considered that the use of the rake and the denudation of measured plots improve both efficiency and sample accuracy. twenty-three species of ephemeroptera are recorded in the present survey. in contrast to the findings of other workers (e.g. ide, ; macan, ; kamler, ) the largest number of species was found in the upper reaches of the coquet. however, there was no marked successional replacement of abundant species along the main river course and discontinuous distributions were confined to the scarce species. the most marked discontinuities were found in certain tributaries. the results are expressed as the annual total number of individuals of each species, taken from each station at monthly intervals, between october and september (table ). the species can be conveniently grouped according to their distribution and abundance. group i. abundant species with continuous distributions. of the three most abundant species baetis rhodani was dominant. it occurred in large numbers at all stations except station . although present in smaller numbers, the nymphs of b. muticus followed a similar pattern. the nymphs of b. scambus (etn.) and b. fuscatus are taxonomically indistinguishable but, as the latter prefers calcareous conditions, macan ( ) considers them to be ecologically distinct. in support of this contention all adult material examined from the coquet proved to be b. fuscatus. it achieved its greatest numbers in the lower coquet at station and there was a significant reduction in numbers at station .
rhithrogena spp., the second most abundant taxon, was well represented at all stations. the nymphs of r. semicolorata are indistinguishable from those of r. haarupi (macan, ). although all final instar nymphs collected during april and may, which emerged in the laboratory, proved to be r. semicolorata, circumstantial evidence suggested that r. haarupi was also present. thus a group of large robust nymphs (max. length mm), with an emergence period in march, were thought to be r. haarupi (see kimmins, ). these were distinct from a group of smaller nymphs (max. length mm), with an emergence period during may and june, which proved to be r. semicolorata (details of seasonal distributions and life histories will be published separately). from the numbers of nymphs remaining in the benthos after the emergence of r. haarupi, it was concluded that r. semicolorata was most abundant in the upper reaches whereas r. haarupi was most abundant further downstream. these observations are in agreement with those of macan ( ). ephemerella ignita, the remaining dominant species, was present in large numbers at all stations. it did, however, show a sequential increase in numbers proceeding in a downstream direction, being most abundant at low altitudes. the nymphs of centroptilum luteolum occurred in their largest numbers at station . they were present in comparatively small numbers at the lower stations where they may have been confined to the less accessible areas of deposition. similarly caenis rivulorum was apparently most abundant at station and occurred in relatively small numbers in the lower reaches. three species of the genus ecdyonurus were present in the coquet system and observations on their distributions are based on the numbers of larger nymphs which could be identified with confidence. e. torrentis, the dominant species, was most abundant at station but, contrary to the findings of other workers (e.g.
macan, ; maitland, ), it was not confined to the small stream channels of the upper valley. similarly e. venosus was common and relatively abundant in the 'hill' region, especially at station . although not excluded from the lowland river it was only present in very small numbers. e. dispar was relatively common and abundant throughout the main river, but achieved its greatest expression in the lower coquet, tending to replace e. venosus at stations and . a marked reduction in the numbers of ecdyonurids at station was similar to that already described for the baetids. group ii. scarce species with discontinuous distributions. paraleptophlebia submarginata and habrophlebia fusca only occurred at stations 1 and and were most abundant at station . there is, therefore, strong evidence that these species were confined to the 'hill' region. similarly centroptilum pennulatum, procloeon pseudorufulum and siphlonurus lacustris were restricted to the upper 'hill' region. baetis niger was restricted to the upper and middle reaches and achieved its greatest abundance at station . since the barrow burn enters the coquet immediately above station it is possible that the abundance of b. niger at this point was a consequence of stream drift (see site b). in contrast ephemerella notata was limited to the 'middle' and 'coastal plain' regions. it was most abundant at station and followed the same general trend in distribution as e. ignita. the most interesting discontinuity in longitudinal distribution and a classic example of successional replacement was exhibited by the two members of the genus heptagenia. h. lateralis was the most abundant species and occurred in greatest numbers at station 1. it appeared to phase out at station and was excluded from the lower 'coastal plain' region. conversely h. sulphurea was confined to the 'coastal plain' region and attained its greatest numbers at station .
as the overlap between these species coincided precisely with the outcrop of carboniferous limestone which characterizes the 'coastal plain', there is evidence to support the view held by macan ( ) that h. lateralis replaces h. sulphurea in non-calcareous areas. one individual of caenis horaria was obtained from station . this is typically a still-water form (macan, ). sites f and g. the influence of sand and gravel extraction at caistron. the apparent paucity of ephemeroptera nymphs at station , in particular those belonging to the genus baetis, has already been described. topographical factors do not seem sufficient to explain this phenomenon. there is, however, a sand- and gravel-extracting company operating at caistron, several miles above station . the adverse effects of the deposition of suspended inorganic sediments on stream faunas, caused by sand and gravel washings etc., have already been noted (e.g. stuart, ; cordone & kelly, ; hamilton, ; chutter, ). consequently samples were taken at site f (above) and site g (below) the caistron sand and gravel works (table ). ranges in levels of suspended solids for sites f and g were - mg/l and - mg/l respectively. a χ² test revealed a very highly significant difference (p < ) between samples, thus providing strong evidence that gravel extraction had at least a localized influence on the ephemeroptera of this region. this influence did not affect all species to the same extent. thus b. rhodani, b. muticus, rhithrogena and centroptilum luteolum were found to be intolerant (p < ) but no reduction (p > ) could be shown for the remaining species. the profile of the main river suggests that settlement was probably minimal at g owing to the steep gradient but solids carried in suspension were more likely to accumulate at station . hence this may be one factor limiting the ephemeroptera populations in the 'middle' region. sites a, b, c, d and e.
samples, taken from each subsidiary site on a seasonal basis, revealed certain departures from the trends reported for the main stations (table ). the collection taken at site a indicates quite clearly the scarcity of all ephemeroptera nymphs at coquet head. similar results obtained by maitland ( ) show that the majority of species abundant in the river endrick were absent from the extreme upper reaches. the barrow burn and the river alwin (sites b & c), two important tributaries which flow into the same region of the coquet, are markedly different in the species composition of their mayfly faunas (similarity %). the barrow burn was characterized by a large diversity of species but only small numbers of individuals. the species composition of the river alwin was less diverse but some species were represented by very large numbers of individuals. baetis rhodani, dominant in the main river, was also the most abundant species in the river alwin. the near exclusion of this ubiquitous species from the barrow burn is therefore of interest. as it has already been established that b. rhodani exhibits a positive phototactic response (hughes, ; thorup, ) it might be expected to avoid the heavily shaded areas of the barrow burn. similarly b. muticus was comparatively abundant in the river alwin but was replaced by b. niger in the barrow burn. b. fuscatus/scambus was apparently scarce in both tributaries. species present in both tributaries, but most abundant in the river alwin, include rhithrogena semicolorata, ecdyonurus torrentis, ephemerella ignita and caenis rivulorum. in contrast centroptilum luteolum was absent from the alwin but comparatively abundant in the barrow burn. the remaining species, paraleptophlebia submarginata, habrophlebia fusca, leptophlebia vespertina (l.) and procloeon pseudorufulum, were all scarce and were only taken in samples from the barrow burn.
samples taken from the hosedon burn (site d) revealed similarities ( %) with those from the river alwin. the sparsity of the summer population, however, was undoubtedly a reflection of the tendency for this stream to dry up during this period. grasslees burn (site e), in addition to being similar in character to the barrow burn, was similar ( %) in the composition of its mayfly populations. the spatial distributions of nymphs at site h were the subject of a special investigation which will be described in a separate publication. the following is a summary of information on additional species with sporadic distributions which were recorded during the present survey but not included in the routine sampling programme. ephemera danica müll. a few individuals of e. danica were obtained from two tributaries, the blackburn and the debden burn. since this species was formerly common and abundant it may still occur elsewhere in the watershed. ameletus inopinatus (etn.). macan ( ) states that a. inopinatus, the only arctic-alpine ephemeropteran known to occur in britain, is confined to high altitudes and is common and abundant in becks above ft. the absence of this species from the upper coquet was, therefore, somewhat unexpected. however, its presence in the upper reaches of the river alwin (eln. ft) was revealed in june and it had already been reported in the kidlandlee dean, a tributary of the alwin, by sutcliffe ( ). longitudinal zonation of benthic invertebrates has been described in earlier studies on running water systems (e.g. dodds & hisaw, ; ide, ; macan, ; maitland, ). the majority of attempts to classify river systems according to the type of zonation they display are based largely on the species of fish present (e.g. huet, , ). however, illies & botosaneanu ( ), in their combined scheme for the classification of running water bodies, were able to distinguish well-defined topographical zones based on the composition of the bottom fauna communities.
of these 'rhithron' and 'potamon' were considered of major importance. on the other hand thorup ( ), in a critical review of studies of stream zone systems, concluded that it was not possible to construct a satisfactory classification on the basis of bottom fauna communities. owing to the large number of variable factors, it would appear that precisely defined zones are of limited value. however, the existence of a transition in faunal types from source to mouth is an ecological phenomenon of some importance in most river systems. the absence of any well-defined zonation of the abundant species of ephemeroptera in the coquet is, therefore, of great interest. topographically the coquet is 'rhithron' in character throughout its length and at its present stage in development it is a 'young' to 'mature' system. thus there is no extensive flood plain with the associated fauna diagnostic of an 'old' river system (see macan, ; huet, ). apart from the availability of dissolved substances, the most marked changes that occur with decreasing altitude are increases in both water volume and marginal tree cover. it is suggested that these differences may be responsible for certain discontinuities described for the scarce species of both the upland and lowland communities. as a function of temperature, these species are thus confined to regions of tolerance, whereas the ecological requirements of the abundant species are fully satisfied by the range of variable parameters experienced along the river course. in considering seasonal distribution a distinct trend becomes apparent. those species (i.e. e. ignita, e. notata, e. dispar and b. fuscatus) only present as nymphs in the benthos during the summer months were more abundant in the lower reaches. however, certain of the 'winter' species (i.e. heptagenia lateralis, ecdyonurus venosus, e. torrentis and possibly rhithrogena semicolorata) were most abundant in the upper reaches.
there is thus a possibility that these limitations are also temperature-dependent. temperature as a factor limiting the distribution of the ephemeroptera has already been observed by other workers (e.g. macan, , ). an experimental investigation of the effects of temperature on the distributions of certain ephemeroptera nymphs in the coquet will be published separately. finally it was found that local distribution was subject to spatial discontinuities. this will also be the subject of a separate publication. in conclusion it would appear that the absence of change in the topographical features of the river coquet is of primary importance in determining the uniformity of its faunal characteristics. this is due principally to the influence of topographical factors on the flow regime and in consequence the nature of the substrate. the effects of thermal and chemical relationships are superimposed on this general scheme.
studies on the fauna of a shropshire hill stream
studies in stream life in tributaries of the welsh dee
records of the phryganeidae (trichoptera) in northern england - , with a summary of the distribution of british species
records of mayflies (ephemeroptera) in northumberland
survey of the river tees. part . the non-tidal reaches, chemical and biological
the effects of silt and sand on the invertebrate fauna of streams and rivers
the ephemeroptera of the river bela
the influences of inorganic sediment on the aquatic life of streams
an evaluation of some techniques for the collection and analysis of benthic samples with special emphasis on lotic waters
ecological studies on aquatic insects. iv.
altitudinal range and zonation of mayflies, stoneflies and caddisflies
the effect of water flow on populations of net-spinning
habitat preferences in net-spinning caddis larvae with special reference to the influence of water velocity
a biological survey of the river taff
the distributional relationship between the bottom fauna and plant detritus in streams
the distribution of benthic invertebrates on substrata in fast-flowing streams
a survey of the bottom fauna of streams in the scottish highlands. . the relationship of the fauna to chemical and geological conditions
evaluation of a kicking technique for sampling stream bottom fauna
the effect of sand-pit washings on a stream fauna
an investigation of the distribution of the mayfly fauna of a lancashire stream
profiles and biology of western european streams as related to fish management
influence du courant sur la distribution des poissons dans les eaux courantes
the role of responses to light in the selection and maintenance of microhabitat by the nymphs of two species of mayfly
the invertebrate fauna of a welsh mountain stream
further studies on the invertebrate fauna of a welsh mountain stream
the effect of temperature on the distribution of the mayfly fauna of a stream
problèmes et méthodes de la classification et de la zonation écologique des eaux courantes, considérées surtout du point de vue faunistique
the fauna of the river melindwr, a lead-polluted tributary of the river rheidol in north cardiganshire
distribution of plecoptera and ephemeroptera in relation to altitude above mean sea level and current speed in mountain waters
a revised key to the adults of the british species of ephemeroptera
the distribution of plecoptera and ephemeroptera in a lowland region of
the distribution, abundance and life-histories of stoneflies (plecoptera) and mayflies (ephemeroptera) in a british river, warmed by cooling-water from a power station
a survey of the macro-fauna of the river cynon.
a polluted tributary of the river taff (south wales)
the ephemeroptera of a stony stream
methods of sampling the bottom fauna in stony streams
the effect of temperature on rhithrogena
the occurrence of heptagenia lateralis (ephem.) in streams in the english lake district
key to the nymphs of the british species of ephemeroptera
some methods of water analyses for limnologists
quantitative studies on the invertebrate fauna of sandy and stony substrates in the river endrick
studies on loch lomond. . the fauna of the river endrick
a survey of aquatic fly-life on the river tweed
an ecological study of the invertebrates of the duddon, an english mountain stream
a survey of the bottom fauna of streams in the scottish highlands. part . composition of the fauna
a qualitative study of some types of stream-bed
a biological survey of the river wharfe. . report on the invertebrate fauna
records of caddisflies (trichoptera) in northumberland with notes on their seasonal distribution in plessey woods
further records of trichoptera in northumberland
the influence of drainage works, levees, dykes, dredging, etc., on the aquatic environment and stocks
notes on the chemistry and fauna of water-bodies in northumberland
substrate type and its value as a basis for the delimitation of bottom fauna communities in running water
benthic animal communities in lapland streams
an ecological study of the invertebrate fauna of a chalk stream near great driffield, yorkshire
i would like to express sincere appreciation to dr g. n. philipson for valuable advice and discussion during the course of this research and to professor c. f. humphries, who kindly read the manuscript. the work was supported by a n.e.r.c. research studentship. my thanks are also due to dr t. t. macan and mr p. h.
ward for confirming the identifications of certain nymphal and adult ephemeroptera respectively. i am also grateful to the former northumbrian river authority for additional information and to the weather centre, newcastle upon tyne, for meteorological data.
key: cord- - bx ao authors: wu, andong; wang, yi; zeng, cong; huang, xingyu; xu, shan; su, ceyang; wang, min; chen, yu; guo, deyin title: prediction and biochemical analysis of putative cleavage sites of the c-like protease of middle east respiratory syndrome coronavirus date: - - journal: virus res doi: . /j.virusres. . . sha: doc_id: cord_uid: bx ao coronavirus c-like protease ( clpro) is responsible for the cleavage of coronaviral polyprotein a/ ab (pp a/ ab) to produce the mature non-structural proteins (nsps) of nsp – . the nsp of the newly emerging middle east respiratory syndrome coronavirus (mers-cov) was identified as clpro and its canonical cleavage sites (between nsps) were predicted based on sequence alignment, but the cleavability of these cleavage sites remains to be experimentally confirmed and putative non-canonical cleavage sites (inside one nsp) within the pp a/ ab await further analysis. here, we propose a method for predicting coronaviral clpro cleavage sites which balances prediction accuracy against false positive outcomes. by applying this method to mers-cov, the canonical cleavage sites were readily identified and verified by biochemical assays. the michaelis constants of the canonical cleavage sites of mers-cov showed that the substrate specificity of mers-cov clpro is relatively conserved. interestingly, nine putative non-canonical cleavage sites were predicted and three of them could be cleaved by mers-cov nsp . these results pave the way for identification and functional characterization of new nsp products of coronaviruses. middle east respiratory syndrome coronavirus (mers-cov) is an enveloped virus carrying a genome of positive-sense rna (+ssrna).
it was identified as the pathogen of a new viral respiratory disease outbreak in saudi arabia in june , named middle east respiratory syndrome (mers). mers-cov emerged ten years after severe acute respiratory syndrome coronavirus (sars-cov) and quickly spread to several countries in the middle east and europe (assiri et al., ; tashani et al., ). soon after the first report, the mers-cov genome was sequenced and its genomic organization was elucidated. this new coronavirus is classified in lineage c of betacoronavirus, and is close to bat coronaviruses hku and hku (de groot et al., ; lau et al., ). like other coronaviruses (hussain et al., ; zuniga et al., ), mers-cov contains a coterminal, nested set of seven subgenomic rnas (sgrnas), enabling translation of at least nine open reading frames (orfs). the -terminal two thirds of the mers-cov genome contains a large open reading frame, orf ab, which encodes polyprotein a (pp a, amino acids) and polyprotein ab (pp ab, amino acids), the latter being translated via a − ribosomal frameshift at the end of orf a. these two polyproteins were predicted to be subsequently processed into non-structural proteins (nsps) by nsp , a papain-like protease (plpro), and nsp , a c-like protease ( clpro) (kilianski et al., ; van boheemen et al., ). proteases play a key role in the viral life cycle. they are essential for viral replication because they mediate the maturation of viral replicases, and they are thus targets of potential antiviral drugs (thiel et al., ; ziebuhr et al., ). investigating the cleavage sites of coronavirus proteases and the processing of polyproteins pp a/ ab will help to identify the viral proteins and their potential functions in viral replication.
some cleavage sites have been identified and confirmed by previous studies, including three cleavage sites of the plpros of human coronavirus e (hcov e), mouse hepatitis virus (mhv), sars-cov, mers-cov and infectious bronchitis virus (ibv), whose cleavage releases the first non-structural proteins (bonilla et al., ; kilianski et al., ; lim and liu, ; ziebuhr et al., ). the canonical cleavage sites of clpros, the sites between the recognized nsps, have also been characterized, including all sites of mhv, ibv and sars-cov and a fraction of the sites of hcov e, which release the non-structural proteins from nsp to nsp (deming et al., ; grotzinger et al., ; liu et al., , ; lu et al., ). for the clpro of mers-cov, two cleavage sites releasing nsp to nsp have been identified (kilianski et al., ). however, the other cleavage sites remain to be characterized. furthermore, efforts have been made to predict these cleavage sites by sequence comparison. gorbalenya et al. ( ) made the first systematic prediction on ibv pp a/ ab according to the substrate specificity of the c protease of picornaviruses. however, two of their predicted cleavage sites within nsp of ibv proved uncleavable (liu et al., ; ng and liu, ). gao et al. ( ) developed a software tool (zcurve cov) to predict the nsps as well as gene-encoded orfs of coronaviruses more accurately, based on previous studies of the clpro cleavage sites of ibv, mhv and hcov e. later on, non-orthogonal decision trees were used to mine the coronavirus protease cleavage data and to improve the sensitivity and accuracy of prediction (yang, ). however, because these methods focus on the prediction of canonical cleavage sites and increasingly emphasize prediction accuracy to avoid false positives, potential non-canonical cleavage sites might be neglected.
for example, a cleavage site between nsp and of mhv strain a is not predicted by the above methods, but has proved to be physiologically important since it produces a shorter nsp that can support the growth of mhv carrying a mutation at the nsp - cleavage site (deming et al., ). therefore, the substrate specificities of coronavirus clpros are complicated. a clpro substrate library of four coronaviruses (hcov-nl , hcov-oc , sars-cov and ibv) containing amino acids × positions variants was constructed by making single amino acid (aa) substitutions at each position from p to p , and their cleavage efficiencies were measured and analyzed to find the most preferred residues at each position (chuck et al., ). however, a non-canonical cleavage site with less preferred residues is nevertheless used by coronaviruses (deming et al., ). thus we speculate that other potential clpro cleavage sites may still exist in coronaviruses. in order to set up a more moderate and balanced set of criteria for protease cleavage site identification, we compared six scanning conditions of different stringency to systematically predict the clpro cleavage sites on pp a/ ab of five coronaviruses including mers-cov. as a representative, the cleavability of the predicted cleavage sites of mers-cov clpro was analyzed by the recombinant luciferase cleavage assay and the fluorescence resonance energy transfer (fret) assay. the results showed that all canonical cleavage sites of mers-cov pp a/ ab were cleavable in our experiments and three of nine predicted non-canonical cleavage sites appeared to be cleavable. our study points out a new direction for the prediction and identification of protease cleavage sites and contributes to understanding the mechanism of coronaviral polyprotein processing. the genome sequences of coronaviruses were downloaded from the genbank database and the sequences of the clpro cleavage sites were collected from p to p (tables s -s ).
the substrate profiles of each coronavirus group and of the whole coronavirinae were summarized (table s ). the coding sequence of mers-cov nsp (nc ) was synthesized chemically by genscript and cloned into vectors pet a and pgex- p- , respectively. the catalytic residue mutation c a was generated by overlapping pcr with mutagenic primers (table s ). all the clones and mutations were confirmed by dna sequencing. the expression vectors were transformed into escherichia coli strain bl (de ). the cells were grown at °c in lysogeny broth (lb) medium with antibiotics and induced with . mm isopropyl β-d-thiogalactopyranoside (iptg) at °c for h. the cells were harvested and resuspended in lysis buffer ( mm tris-hcl, ph . , mm nacl, mm edta, . % np , . mg/ml lysozyme and mm pmsf) at °c. after incubation for min on ice, mm mgcl and g/ml dnase i (sigma) were added to digest the genomic dna. after centrifugation, the supernatant of the cell lysate was applied to an affinity chromatography column. the recombinant protein with his-tag was bound to nickel-nitrilotriacetic acid (ni-nta) resin (genscript) and washed with buffer a ( mm tris-hcl, ph . , mm nacl), buffer b ( mm tris-hcl, ph . , mm nacl, mm imidazole) and buffer c ( mm tris, ph . , mm nacl, mm imidazole). proteins were eluted with buffer d ( mm tris, ph . , mm nacl, mm imidazole). gst-tagged protein was bound to gst resin (genscript), washed with buffer a and eluted with buffer a supplemented with mm reduced glutathione (gsh). the purified proteins were desalted and concentrated by ultrafiltration using a kda amicon ultra . -ml centrifugal filter (millipore). all the cleavage sites (eight residues, ranging from p to p ) were inserted into the glo-sensor f linear vector. compared with the wild-type firefly luciferase ( aa), glo-sensor luciferase has short truncations at both termini with the c- and n-parts reversed, resulting in new -aa n- and -aa c-terminal regions, respectively.
the inserted sequence and the reversed arrangement of the n- and c-terminal regions reduce the luciferase activity dramatically. after the recognition sequence is cut by nsp , the luciferase recovers its activity and luminesces in the presence of luciferase substrate. a back-to-front recombinant firefly luciferase carrying the different cleavage sites was expressed when the recombinant plasmids were co-incubated with a cell-free protein expression system extracted from wheat germ (promega). after incubation for h at °c, nsp was added to the system and the whole system was incubated at °c for h. then, the reaction system was diluted times and mixed thoroughly with an equal volume of luciferase substrate. luciferase luminescence was measured with a luminometer (promega) after incubation for min at room temperature. all the conserved putative recognition sites were designed from p to p , synthesized and modified with a typical shorter-wavelength fret pair, n-terminal dabcyl and c-terminal glu-edans, by gl biochem (shanghai). the peptides were completely dissolved in dmso and the final concentration of dmso in the reaction system was %. m substrate peptide and . m tagged nsp were mixed in a solution of mm tris, ph . , mm edta, m dtt and incubated at °c for h. to calculate kcat/km, different amounts ( . - m) of substrate peptide were co-incubated with . m nsp . the reaction system was placed in a greiner black plate and the fluorescence was detected by a microplate reader (molecular devices) with ex/em (nm/nm) = / . relative fluorescence units (rfu) were collected every s for h. the initial slope (slope a = rfu/min) was generated from the linear interval of the rising stage. then, a linear equation was generated using the rfu at plateau (rfu_max) vs. the concentration of substrate. its slope (slope b = rfu/[s]) indicates the rfu change per unit change of [s]. the initial reaction velocity (v = [s]/min) was calculated by dividing slope a by slope b.
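the velocity calculation described above, combined with a double-reciprocal (lineweaver-burk) fit, can be sketched as follows. this is a minimal illustration rather than the authors' analysis script: the function names, the synthetic data and the enzyme concentration argument are assumptions introduced for the example, and concentrations are assumed to share one unit throughout.

```python
from typing import Sequence, Tuple


def initial_velocity(slope_a: float, slope_b: float) -> float:
    """Convert an initial fluorescence slope into a reaction velocity.

    slope_a: initial slope of the progress curve (rfu/min).
    slope_b: calibration slope of plateau rfu vs. substrate concentration (rfu/[s]).
    Returns the velocity in [s]/min.
    """
    return slope_a / slope_b


def lineweaver_burk(substrate: Sequence[float],
                    velocity: Sequence[float]) -> Tuple[float, float]:
    """Estimate (km, vmax) by ordinary least squares on 1/v = (km/vmax)(1/[s]) + 1/vmax."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equals km / vmax
    intercept = mean_y - slope * mean_x  # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


def kcat_over_km(km: float, vmax: float, enzyme_conc: float) -> float:
    """Catalytic efficiency, using kcat = vmax / [enzyme]."""
    return (vmax / enzyme_conc) / km


# synthetic (hypothetical) data generated from km = 20, vmax = 5
substrate = [5.0, 10.0, 20.0, 40.0, 80.0]
velocity = [5.0 * s / (20.0 + s) for s in substrate]
km_fit, vmax_fit = lineweaver_burk(substrate, velocity)
```

the double-reciprocal fit weights low-substrate points heavily, so in practice a direct non-linear fit of the michaelis-menten equation is often compared against it.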
the michaelis-menten kinetic constants were generated from a lineweaver-burk plot. the coronavirus clpros and their cleavage sites are evolutionarily conserved among different genera. to study the genetic diversity and evolution of the clpro cleavage sites of coronavirus pp a/ ab, primary sequences of clpro cleavage sites (ranging from p to p ) of species of coronaviruses were collected and listed in tables s -s , including the predicted and verified cleavage sites. canonical cleavage sites of each coronavirus were joined end to end to produce a spliced sequence, which was then used to produce a phylogenetic tree (fig. a). in addition, the sequences of all coronavirus clpros were used to generate another phylogenetic tree (fig. b). the analyses showed that the phylogenetic distances and taxonomic positions of each virus, in both phylogenetic trees, were mostly consistent with the classification of the international committee on taxonomy of viruses (ictv) (http://www.ictvonline.org/virustaxonomy.asp). these results imply that the cleavage sites of coronaviral clpros might co-evolve with the clpros, and that both clpro and its cleavage sites are relatively conserved between different genera of coronaviruses. however, on the phylogenetic tree generated with clpro cleavage sites (fig. a), the members of the genus gammacoronavirus, although clustered closely, are split into alphacoronaviruses and deltacoronaviruses, suggesting that the cleavage sites of gammacoronaviruses may have undergone recombination events during evolution. in order to develop an optimized method for cleavage site prediction that can cover all possible cleavage sites with fewer false positives, we set three levels of criteria (stringent, moderate and mild) for cleavage site prediction. under the stringent rules, clpro cleavage sites comprise only the most preferred residues at each position, based on a previous description (chuck et al., ).
under the moderate rules, clpro cleavage sites comprise residues that have ever appeared in the cleavage sequences of congeneric coronaviruses at each particular position. under the mild rules, the cleavage sites can comprise any residues ever found in the cleavage sequences of all coronaviruses at each particular position. because the substrate preference at p and p is not strong, we decided to adopt two different lengths of cleavage sequences for prediction, one containing six residues from position p to p , and the other containing four residues from position p to p . these two lengths of cleavage sequences, combined with the three different criteria, made up a total of six search conditions for cleavage site prediction with decreasing degree of stringency. the canonical cleavage sites of clpro for these seven groups of coronaviruses were summarized in tables s -s and used to set conditions iii to vi. possible residues at each particular position of clpro cleavage sites were predicted based on all six conditions to make the cleavage site profile of coronavirus clpro (table s ). in principle, condition i identifies the smallest number of possible cleavage sites in a scanned sequence, while condition vi predicts the largest number. to test the applicability, we applied all six conditions to five representative coronaviruses, including hcov e from alphacoronavirus, mhv from betacoronavirus lineage a, sars-cov from betacoronavirus lineage b, mers-cov from betacoronavirus lineage c and ibv from gammacoronavirus. all possible cleavage sites predicted based on each condition were scanned on pp a/ ab of the five representative coronaviruses and the results were summarized in table . as shown in table , increasing numbers of cleavage sites were found for each coronavirus as conditions i to vi were applied.
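The position-wise search described above can be sketched as a sliding-window scan against a per-position set of allowed residues. The sketch below is a minimal illustration, not the authors' implementation: the function names are hypothetical, the four-residue toy sites are invented for the example, and the real profiles are those in the supplementary tables.

```python
def build_profile(known_sites):
    """Collect, per position, every residue seen in a set of known sites.

    Feeding in sites from one genus mimics the 'moderate' rule; feeding in
    sites from all coronaviruses mimics the 'mild' rule.
    """
    width = len(known_sites[0])
    return [{site[i] for site in known_sites} for i in range(width)]


def scan_sites(sequence, profile):
    """Return (start_index, window) for every window matching the profile."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(res in allowed for res, allowed in zip(window, profile)):
            hits.append((start, window))
    return hits


# Toy example with invented four-residue sites and an invented sequence.
profile = build_profile(["VLQS", "ALQA"])
hits = scan_sites("MKVLQSGGALQSTT", profile)
# hits contains "VLQS" (a known site) and "ALQS", a combination that never
# occurred as a whole site but is allowed position by position.
```

The second hit shows why looser profiles return more candidates: each position is checked independently, so residue combinations that were never observed together still match.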
the results showed that conditions i and ii were too strict to cover all canonical cleavage sites; conditions v and vi were so loose that they produced two to three times more than cleavage sites; condition iii could only cover the canonical cleavage sites for sars-cov; only condition iv generated an appropriate number of cleavage sites for all five coronaviruses. therefore, search condition iv was chosen for further analysis of the cleavage sites of mers-cov. by applying search condition iv, putative cleavage sites (pss) as well as canonical cleavage sites (css) were predicted (table ). although the canonical cleavage sites of mers-cov clpro have been predicted by sequence alignment with other coronaviruses, our results suggested that additional cleavage might occur in the process of mers-cov pp a/ ab processing. to verify the activity of mers-cov clpro and the cleavability of the predicted cleavage sites, biochemical assay systems for mers-cov clpro were established. as shown in fig. a and b, we first expressed and purified mers-cov clpro (nsp ) with different tags and mutations: n-terminally gst-tagged nsp (gnsp , . kda), n-terminally his-tagged ( extra amino acids with × his tag and linker provided by vector pet- a) nsp (hnsp , . kda), hnsp with catalytic residue mutation c a (hnsp m, . kda) (kilianski et al., ) and gst tag-gvlq-nsp with c a mutation and × his tag (gnsp mh, . kda), in which the sequence motif gvlq represents the last four residues of mers-cov nsp , mimicking the cleavage site of mers-cov nsp /nsp . in the biochemical assays, gnsp mh with catalytic residue mutation c a could not undergo self-cleavage at the cleavage site to release gst during incubation for h (fig. c), indicating that the clpro activity of mers-cov nsp in gnsp mh was inactivated by the mutation c a. thus, gnsp mh was used as the protease substrate in the following biochemical assays.
to verify the clpro activity of recombinant nsp s, gnsp and hnsp were incubated with substrate gnsp mh for min to h and analyzed by sds-page (fig. d) and western blotting (fig. e), respectively. both gnsp and hnsp showed proteolytic activity, cleaving the substrate gnsp mh into two parts: gst ( . kda) and nsp mh ( . kda), which were confirmed by their molecular weights (fig. d and e). however, the clpro activity of gnsp was obviously weaker than that of hnsp , which could entirely cleave the substrate gnsp mh h post treatment (fig. d and e). a possible explanation is that the larger fusion tag at the n terminus of mers-cov clpro significantly reduced the proteolysis activity of clpro, which is consistent with a previous observation (xue et al., ). in the biochemical assays, a relatively lower proteolysis activity of clpro makes it easier to observe the influence of different substrates. therefore, both recombinant gnsp and hnsp were used as mers-cov clpro in the following studies.
figure legend (fragment): the tree was generated by the sequence of nsp and the method is the same as described above.
table: the number of cleavage sites in pp ab of representative coronaviruses predicted by using search conditions. column headings recovered: condition iii, condition iv, condition v, condition vi. footnotes: a canonical cleavage sites, which are located between recognized nsps. b putative cleavage sites, which are located inside various nsps. c six search conditions are designed: conditions i, iii and v cover six residues from p to p ; conditions ii, iv and vi cover four residues from p to p . conditions i and ii are set to comprise the most preferred residues at each position; conditions iii and iv comprise residues that appeared in the cleavage sites of congeneric coronaviruses; conditions v and vi comprise residues that appeared in the cleavage sequences of any coronaviruses.
to rapidly evaluate the proteolysis activity of mers-cov clpro toward the predicted cleavage sites of different substrates, a sensitive luciferase-based biosensor assay was adopted. as shown in fig. a , the canonical cleavage sites (cs) of mers-cov nsp /nsp (cs / ) and nsp /nsp (cs / ), which were experimentally confirmed in a previous study (kilianski et al., ) , were inserted into the inverted and circularly permuted luciferase construct pglo- f, in which the n-terminal and c-terminal halves of luciferase gene are separated. the resulting luciferase in translation system in vitro was inactive and could convert into an active luciferase when cleaved by recombinant viral protease at the engineered cleavage sites (such as cs / and cs / ). in this system, the luciferase signals were detected when incubated with both gnsp and hnsp , respectively (fig. b) . in contrast, the mutated nsp (hnsp m) could not convert the inactive luciferase into active form (fig. b ). this result indicated that the luciferase-based biosensor assay could be used to evaluate the proteolysis activity of mers-cov clpro. then, the other nine canonical cleavage sites and nine putative cleavage sites composed with aa from mers-cov pp a/ ab were inserted into the luciferase construct pglo- f, and the luciferase-based biosensor assays were performed using hnsp and hnsp m, respectively. as shown in fig. c , all the canonical cleavage sites of mers-cov clpro generated luciferase signal by hnsp at least . times higher than by the inactive hnsp m, indicating that all these canonical sites could be cleaved by mers-cov clpro. these results experimentally verified the existence of the predicted canonical cleavage sites. interestingly, among the nine putative cleavage sites, the luciferase signals of ps - , ps - and ps - remarkably increased more than folds when incubated with hnsp , indicating that the putative cleavage sites, located inside nsp and nsp of mers-cov respectively, might be cleavable (fig. 
d). the other predicted putative sites (ps - , ps - , ps - , ps - , ps - , and ps - ) showed less than a . fold increase of luciferase signal when treated with hnsp compared with those treated with hnsp m (fig. c and d). given the high sensitivity of the luciferase-based biosensor assay and the fact that the confirmed canonical cleavage sites generated at least a . fold increase of luciferase signal, the cleavage signal of these six sites may represent the background level, indicating that they are likely uncleavable per se. these results suggest that previously unrecognized clpro cleavage sites may exist inside the nsps, which were regarded as non-canonical cleavage sites.
figure legend (fragment): verification of the recombinant luciferase assays. inactive luciferase was synthesized in the cell-free translation system and the reaction mixture incubated at °c for h. after that, the protein mixture was divided into four parts and incubated with . m gnsp , hnsp , hnsp m or h o, respectively. after incubation for h at °c, the reaction product was diluted times and mixed with an equal amount of luciferase substrate. after incubation at room temperature for min, the luciferase luminescence was measured. luciferase activation fold was calculated by dividing the signal value of the reaction system treated with active hnsp by the one treated with the inactive nsp mutant hnsp m. (c) the luciferase cleavage assay of predicted canonical cleavage sites and (d) putative cleavage sites. the luciferase expression vectors inserted with cleavage sites were added to the wheat germ protein translation mix and incubated at °c for h, and the reaction mixture was divided and treated with hnsp and hnsp m, respectively. the dashed line indicates the lowest fold increase of luciferase signal by cleavage of previously confirmed clpro cleavage sites. the data presented here are the mean values ± sd derived from three independent experiments.
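The fold-activation readout and the threshold taken from the weakest confirmed site can be sketched as follows. The site labels and signal values are invented for illustration, and the function names are hypothetical; only the two rules (signal with active protease divided by signal with the inactive mutant, and a cutoff equal to the lowest fold among confirmed sites) come from the text.

```python
from statistics import mean


def activation_fold(active_signals, inactive_signals):
    """Mean luminescence with active protease over mean with inactive mutant."""
    return mean(active_signals) / mean(inactive_signals)


def classify_sites(folds, confirmed_sites):
    """Flag each site as cleavable if its fold reaches the lowest fold
    observed among previously confirmed cleavage sites."""
    threshold = min(folds[site] for site in confirmed_sites)
    return {site: fold >= threshold for site, fold in folds.items()}


# Hypothetical triplicate readings for one site.
fold = activation_fold([300, 330, 270], [100, 110, 90])

# Hypothetical folds: two confirmed canonical sites and two putative sites.
folds = {"CS-a": 4.0, "CS-b": 2.5, "PS-x": 6.0, "PS-y": 1.2}
calls = classify_sites(folds, confirmed_sites=["CS-a", "CS-b"])
```

Tying the cutoff to the weakest confirmed site keeps the call conservative: a putative site counts as cleavable only if it performs at least as well as a site already known to be processed.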
the substrate specificity of coronavirus clpro is determined by the residues from the p to p positions of cleavage sites, especially the p , p and p positions, knowledge of which would benefit the prediction of cleavage sites and the design of broad-spectrum inhibitors of coronavirus clpro (chuck et al., ; hegyi and ziebuhr, ). previous studies demonstrated that different canonical cleavage sites of some representative coronaviruses are not equally susceptible to proteolysis by recombinant clpro (fan et al., ; hegyi and ziebuhr, ). to define the susceptibility of the canonical cleavage sites and the substrate specificity of mers-cov clpro, -mer synthetic peptides representing the corresponding canonical cleavage sites of mers-cov clpro were synthesized and modified with n-terminal dabcyl and c-terminal glu-edans (fig. a). the fluorophore edans and quencher dabcyl are widely used in biochemical assays based on fluorescence resonance energy transfer (fret). as shown in fig. b, the peptides representing cleavage sites cs / and cs / were tested to optimize the fret assay, and the relative fluorescence unit (rfu) folds of both sites significantly increased when incubated with gnsp and hnsp . although the fret assay system is more costly and less sensitive than the luciferase-based biosensor assay (figs. b and b), it provides a continuous signal during the reaction, which makes it possible to measure the kinetic characteristics of the protease toward different substrates. the initial reaction rates (rfu/min) of all canonical cleavage sites of mers-cov were measured and are shown in fig. c. the michaelis constants including kcat, km, kcat/km and relative kcat/km were then calculated (table ). as shown in table , the substrate specificity of mers-cov clpro is relatively conserved with other coronaviruses as previously reported (fan et al., ; hegyi and ziebuhr, ; ziebuhr and siddell, ).
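For readers who want to retrace how kcat, km and kcat/km follow from initial velocities, the Lineweaver-Burk approach named in the methods can be sketched as below. It fits 1/v against 1/[S], where the slope equals km/vmax and the intercept equals 1/vmax. The function names, the enzyme concentration and the example data are assumptions for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate_conc, velocity):
    """Estimate (km, vmax) from the double-reciprocal plot.

    1/v = (km/vmax) * (1/[S]) + 1/vmax
    """
    inv_s = [1.0 / s for s in substrate_conc]
    inv_v = [1.0 / v for v in velocity]
    slope, intercept = fit_line(inv_s, inv_v)
    vmax = 1.0 / intercept
    km = slope / intercept
    return km, vmax


def catalytic_efficiency(km, vmax, enzyme_conc):
    """kcat/km, with kcat = vmax / [E]."""
    return (vmax / enzyme_conc) / km


# Synthetic, noise-free data generated with km = 2 and vmax = 10.
conc = [1.0, 2.0, 4.0, 8.0]
vel = [10.0 * s / (2.0 + s) for s in conc]
km, vmax = lineweaver_burk(conc, vel)
efficiency = catalytic_efficiency(km, vmax, enzyme_conc=0.5)
```

A relative kcat/km of the kind reported in the table would then be each site's efficiency divided by that of a chosen reference site. The double-reciprocal fit weights low-concentration points heavily, so with noisy data a direct nonlinear fit may give different estimates.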
the relative kcat/km values of cs / and cs / indicated that the cleavage sites flanking mers-cov clpro are converted significantly faster than other sites. the efficient proteolysis at the sites flanking nsp implies that nsp ( clpro) might be released from the polyprotein a/ ab at a very early stage of the maturation of viral nsps, which is similar to hcov, tgev, sars-cov and mhv (fan et al., ; hegyi and ziebuhr, ). however, the relative kcat/km value of cs / is lower than that of cs / (table ), which differs from other coronaviruses (fan et al., ; hegyi and ziebuhr, ). a possible explanation is that the residue gly (g) at the p position of the cleavage site between nsp and nsp of mers-cov reduces the protease activity of clpro compared with the residues ser (s), ala (a) and thr (t) of other coronaviruses (tables s -s ), as previously described (chuck et al., ). whether such disparity plays any role in the replication and pathogenesis of mers-cov is unknown. the processing of viral polyprotein by clpro is essential for the replication of coronaviruses. besides the canonical cleavage sites of coronaviruses, some additional cleavage sites inside nsps, so-called non-canonical cleavage sites, have also been identified (deming et al., ). therefore, more non-canonical clpro cleavage sites remain to be identified in different coronaviruses. in this study, we designed six search conditions for predicting clpro cleavage sites, among which search condition iv provides a feasible way to reveal the potential cleavage sites of clpro within coronaviruses. based on the genetic diversity of different coronavirus genera (fig. ), scanning condition iv adopted the residues of clpro cleavage sites that have ever appeared in the cleavage sequences of congeneric coronaviruses at positions p to p . in contrast, conditions i, ii, iii, v and vi were either too restrictive or generated too many false positive outcomes (table ).
in the suggested condition iv, residues from position p to p were applied to the prediction of clpro cleavage sites. measurement of the relative protease activities of clpro from different coronavirus genera against amino acids × positions of substrate variants has shown that the substrate specificity at positions p , p and p is significantly lower than at other positions (chuck et al., ). therefore, considering six or more residues is unnecessary and could cause potential cleavage sites to be missed (table ). compared with previous research on the prediction and identification of clpro cleavage sites, scanning condition iv showed its advantages. for example, the two nonexistent putative cleavage sites predicted within nsp of ibv (gorbalenya et al., ; liu et al., ; ng and liu, ) were avoided in our prediction method (data not shown). notably, the noncanonical cleavage site at the end of mhv nsp identified by deming et al. could be predicted using scanning condition iv. by using search condition iv, putative cleavage sites were predicted in mers-cov pp ab in addition to the canonical cleavage sites. the luciferase signal of cs / increased . fold when treated with nsp in the recombinant luciferase cleavage assays, which is the lowest among the canonical cleavage sites (fig. c). therefore, the . fold increase of luciferase signal was used arbitrarily as a threshold for judging positive and negative results. among the nine predicted putative cleavage sites, three sites (ps - , ps - and ps - ) showed obviously increased signals at least times above the background (fig. d) and were therefore regarded as cleavable sites. the signal increase of the other six predicted putative cleavage sites was less than . times (fig. d). therefore, they were regarded as non-cleavable sites and thus as false positives from the prediction.
interestingly, the homologous sequences of ps - and ps - are conserved in lineage c of betacoronavirus including mers-cov, batcov hku and batcov hku (fig. a and b). however, ps - is a sequence unique to mers-cov (fig. c). moreover, the cleavability of a cleavage site in biochemical assays is a necessary but not sufficient condition for its physiological existence in the viral infection. a predicted cleavage site may or may not be accessible to a protease. the d structure model of the mers-cov adp-ribose- -monophosphatase (adrp) domain built by comparative protein modeling and the papain-like protease (plpro) domain (bailey-elkin et al., ) showed that both ps - and ps - are located at the surface of the adrp and plpro domains, opposite to the enzymatic active centers (fig. d and e), suggesting that these two sites are likely accessible to the proteinase. most recently, the crystal structure of mers-cov clpro was determined (needle et al., ). although ps - is also located at the surface of mers-cov clpro, the self-cleavage of mers-cov nsp was not observed in this study (fig. ). therefore, the threshold we proposed in the luciferase-based biosensor system to exclude false positive predictions is reasonable (fig. d). however, further studies are needed to identify the predicted cleavage products in cells infected by mers-cov. currently, such work with live mers-cov is limited in our research facilities due to the biosafety rules, but it can be addressed in collaboration in the future. notably, the outcomes of the two cleavage assay systems were different. the signal fold change of the highly sensitive luciferase-based biosensor assay depends on the accumulation of active luciferase cleaved by nsp during h (section ), while the outcome of the fret assay is the instantaneous relative fluorescence unit (rfu) signal. the rfu/min is the initial speed of the reaction, which reflects, but does not equal, the efficiency of the cleavage.
these differences may be caused by the steric hindrance of the luciferase subunits, the distance between fluorophore and quencher of the substrates for the fret assay, and substrate solubility. therefore, the activity observed in the two different systems cannot be compared directly. based on the characteristics of the two cleavage assay systems, the highly sensitive luciferase-based biosensor assay might be more suitable for high-throughput screening of predicted putative protease cleavage sites, while the fret assay is better suited to cleavage kinetic analysis. according to the michaelis constants of mers-cov, the substrate specificity of mers-cov clpro is relatively conserved with other coronaviruses (fan et al., ; hegyi and ziebuhr, ). notably, pro (p) has been selected as a result of evolution at position p of the cleavage site between nsp and nsp (cs / ) of lineage c betacoronavirus, which is not preferred by clpro based on the previous study (chuck et al., ). however, the relative kcat/km value of mers-cov cs / is . , which is . fold higher than that of sars-cov (fan et al., ). this indicated that the substrate preferences of some cleavage sites can still vary among different genera of coronaviruses and that the proposed scanning condition iv, which uses the residues ever appearing in the cleavage sequences of congeneric coronaviruses, is reasonable. in summary, we proposed an optimized search condition for predicting cleavage sites of coronavirus clpro. we verified the canonical cleavage sites of pp ab in biochemical assays. we further identified three non-canonical cleavage sites in the nsps of mers-cov. the results provide clues for possible identification of novel cleavage products of coronavirus nsps and will benefit studies of the mechanisms of coronavirus replication. processing of polyprotein a/ ab by clpro is essential in the coronavirus life cycle.
the clpro cleavage site prediction methods established by previous studies focus on accuracy, while some noncanonical cleavage sites were missed. in this study, we built a moderate prediction method to balance accuracy and false positive outcomes. using this method, putative cleavage sites, in addition to the canonical sites, were predicted in mers-cov pp ab and the cleavability of of them was experimentally confirmed. interestingly, all these non-canonical cleavage sites are located upstream of nsp , which is in contrast with the previous understanding that the coronavirus cl protease only cleaves from nsp to nsp . this suggests a novel role of clpro in coronavirus pp a/ ab processing. however, the cleavability of these putative cleavage sites needs to be further verified in the viral proteins of mers-cov-infected cells. finally, the catalytic constants of the canonical cleavage sites of mers-cov clpro showed its conservation with its relatives in coronaviridae. hospital outbreak of middle east respiratory syndrome coronavirus crystal structure of the middle east respiratory syndrome coronavirus (mers-cov) papain-like protease bound to ubiquitin facilitates targeted disruption of deubiquitinating activity to demonstrate its role in innate immune suppression characterization of the leader papain-like proteinase of mhv-a : identification of a new in vitro cleavage site profiling of substrate specificities of c-like proteases from group , a, b, and coronaviruses middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group processing of open reading frame a replicase proteins nsp to nsp in murine hepatitis virus strain a replication biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus c-like proteinase prediction of proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing sars-cov genomes coronavirus genome: prediction of putative 
functional domains in the non-structural polyprotein by comparative amino acid sequence analysis characterization of a -kda polypeptide encoded in gene of the human coronavirus hcv e conservation of substrate specificities among coronavirus main proteases identification of novel subgenomic rnas and noncanonical transcription initiation signals of severe acute respiratory syndrome coronavirus assessing activity and inhibition of middle east respiratory syndrome coronavirus papain-like and c-like proteases using luciferase-based biosensors genetic characterization of betacoronavirus lineage c viruses in bats reveals marked sequence divergence in the spike protein of pipistrellus bat coronavirus hku in japanese pipistrelle: implications for the origin of the novel middle east respiratory syndrome coronavirus characterisation of a papain-like proteinase domain encoded by orf a of the coronavirus ibv and determination of the c-terminal cleavage site of an kda protein a -kilodalton polypeptide encoded by open reading frame (orf) b of the coronavirus infectious bronchitis virus is processed by orf a products proteolytic processing of the coronavirus infectious bronchitis virus a polyprotein: identification of a -kilodalton polypeptide and determination of its cleavage sites identification and characterization of a serine-like proteinase of the murine coronavirus mhv-a structures of the middle east respiratory syndrome coronavirus c-like protease reveal insights into substrate specificity identification of a -kda polypeptide processed from the coronavirus infectious bronchitis virus a polyprotein by the c-like proteinase and determination of its cleavage sites further characterization of the coronavirus infectious bronchitis virus c-like proteinase and determination of a new cleavage site australian hajj pilgrims' knowledge about mers-cov and other respiratory infections mechanisms and enzymes involved in sars coronavirus genome expression genomic characterization of a 
newly discovered coronavirus associated with acute respiratory distress syndrome in humans production of authentic sars-cov m(pro) with enhanced activity: application as a novel tag-cleavage endopeptidase for protein overproduction mining sars-cov protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection isolation of a novel coronavirus from a man with pneumonia in saudi arabia human coronavirus e papain-like proteases have overlapping specificities but distinct functions in viral replication processing of the human coronavirus e replicase polyproteins by the virus-encoded c-like proteinase: identification of proteolytic products and cleavage sites common to pp a and pp ab virus-encoded proteinases and proteolytic processing in the nidovirales sequence motifs involved in the regulation of discontinuous coronavirus subgenomic rna synthesis supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/ . /j.virusres. . . key: cord- -yfvu r authors: brat, gabriel a.; weber, griffin m.; gehlenborg, nils; avillach, paul; palmer, nathan p.; chiovato, luca; cimino, james; waitman, lemuel r.; omenn, gilbert s.; malovini, alberto; moore, jason h.; beaulieu-jones, brett k.; tibollo, valentina; murphy, shawn n.; yi, sehi l’; keller, mark s.; bellazzi, riccardo; hanauer, david a.; serret-larmande, arnaud; gutierrez-sacristan, alba; holmes, john j.; bell, douglas s.; mandl, kenneth d.; follett, robert w.; klann, jeffrey g.; murad, douglas a.; scudeller, luigia; bucalo, mauro; kirchoff, katie; craig, jean; obeid, jihad; jouhet, vianney; griffier, romain; cossin, sebastien; moal, bertrand; patel, lav p.; bellasi, antonio; prokosch, hans u.; kraska, detlef; sliz, piotr; tan, amelia l. 
m.; ngiam, kee yuan; zambelli, alberto; mowery, danielle l.; schiver, emily; devkota, batsal; bradford, robert l.; daniar, mohamad; daniel, christel; benoit, vincent; bey, romain; paris, nicolas; serre, patricia; orlova, nina; dubiel, julien; hilka, martin; jannot, anne sophie; breant, stephane; leblanc, judith; griffon, nicolas; burgun, anita; bernaux, melodie; sandrin, arnaud; salamanca, elisa; cormont, sylvie; ganslandt, thomas; gradinger, tobias; champ, julien; boeker, martin; martel, patricia; esteve, loic; gramfort, alexandre; grisel, olivier; leprovost, damien; moreau, thomas; varoquaux, gael; vie, jill-jênn; wassermann, demian; mensch, arthur; caucheteux, charlotte; haverkamp, christian; lemaitre, guillaume; bosari, silvano; krantz, ian d.; south, andrew; cai, tianxi; kohane, isaac s. title: international electronic health record-derived covid- clinical course profiles: the ce consortium date: - - journal: npj digit med doi: . /s - - - sha: doc_id: cord_uid: yfvu r we leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about coronavirus disease (covid- ). to do this, we formed an international consortium ( ce) of hospitals across five countries (www.covidclinical.net). contributors utilized the informatics for integrating biology and the bedside (i b ) or observational medical outcomes partnership (omop) platforms to map to a common data model. the group focused on temporal changes in key laboratory test values. harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. data covered , covid- cases with , laboratory tests. case counts and laboratory trajectories were concordant with existing literature. laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. 
despite the limitations of decentralized data generation, we established a framework to capture the trajectory of covid- disease in patients and their response to interventions. the coronavirus disease (covid- ) pandemic has caught the world off guard, reshaping ways of life, the economy, and healthcare delivery all over the globe. the virulence and transmissibility of responsible virus (sars-cov- ) is striking. crucially, there remains a paucity of relevant clinical information to drive response at the clinical and population levels. even in an information technology-dominated era, fundamental measurements to guide public health decision-making remain unclear. knowledge still lags on incidence, prevalence, case-fatality rates, and clinical predictors of disease severity and outcomes. while some of the knowledge gaps relate to the need for further laboratory testing, data that should be widely available in electronic health records (ehrs) have not yet been effectively shared across clinical sites, with public health agencies, or with policy makers. at the time of this writing, more than months after the earliest reports of the disease in china, only . % of us cases reported to the cdc included clinical details . even before therapeutic trials are implemented, frontline clinicians are not yet benefitting from knowledge as basic as understanding the differences in the clinical course between male and female patients . through case studies and series, we have learned that covid- can have multi-organ involvement. a growing literature has identified key markers of cardiac , immune , coagulation , muscle , , hepatic , and renal injury and dysfunction, including extensive evidence of myocarditis and cardiac injury associated with severe disease. laboratory perturbations in lactate dehydrogenase (ldh), c-reactive protein (crp), and procalcitonin have been described. however, data from larger cohorts, linked to outcomes, remain unavailable. 
because ehrs are not themselves agile analytic platforms, we have been successfully building upon the open source and free i b (for informatics for integrating biology and the bedside) toolkit [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] to manage, compute, and share data extracted from ehrs. in response to covid- , we have organized a global community of researchers, most of whom are or have been members of the i b academic users group, to rapidly set up an ad hoc network that can begin to answer some of the clinical and epidemiological questions around covid- through data harmonization, analytics, and visualizations. the consortium for clinical characterization of covid- by ehr ( ce)-pronounced "foresee"-comprises partner hospitals from five countries. our early efforts aim to consolidate, share, and interpret data about the clinical trajectories of the infection in patients with a first focus on laboratory values and comorbidities. this initial report seeks (a) to establish the accessibility and suitability of data from electronic medical record for covid- patients; (b) to learn about the clinical trajectories of patients; (c) to facilitate evaluation and communication of the utility of various laboratory tests and therapies; and (d) to contribute data, reproducible data mining and visualization workflows, and learnings to a global network and the broader public. here, we report on initial results and the structure of a new, rapidly formed network designed to be a highly scalable system, now implemented at sites. the international scope of our collaboration allows us to identify some of the similarities in clinical course and a few country-specific variations. we recognize that these early data are incomplete and are subject to many biases and limitations, which constrain the conclusions we can currently draw. however, we believe the sources of our data and the mechanism we have established for sharing them are sound, reproducible, and scalable. 
we also hope our results to-date will encourage other sites to share data and contribute to this important research effort. demographic and consortium-level data over a span of weeks, total hospitals in the us ( ), france ( ), italy ( ), germany ( ), and singapore ( ) contributed data to the consortium. this was represented by data collaboratives across these five countries. a total of , patients with covid- diagnosis were included in the data set, with data covering january , through april , . we collected , laboratory values and harmonized them across sites. thirteen percent of sites submitted complete data sets that included values for each laboratory ( . % for at least , and . % for at least of the laboratory measurements). breakdown of sites is shown in table . demographic breakdown by age and sex is shown in fig. . age distribution was different across countries and consistent with previously identified patterns. in particular, patients from italy were more commonly over the age of relative to other countries . us institutions, despite representing a large number of active infections, had the lowest percentage of elderly patients diagnosed with covid- . germany, with its three included hospitals and relatively small number of patients, was more similar to the us and had an increased number of male patients in the − age group. we were able to capture the total number of identified new cases by site and date. to normalize across sites and countries with varying sizes, we reported -day average new case rate per k over time for each country normalized by the ratio between the inpatient discharge rate for each country and inpatient discharge rate for the ce sites in that country. as shown in fig. , the adjusted -day average new case rates derived from ce consortium sites match reasonably well with those reported by jhu csse for germany, us, and singapore. 
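The case-rate normalization described above can be read as follows: average the daily new cases over a rolling window, scale site counts up by the ratio of country-wide to site-level inpatient discharges, and express the result per fixed population unit. The sketch below implements that reading only. The window length, the population unit, the exact order of operations, and every name and number in it are assumptions rather than details given in the text.

```python
def rolling_mean(values, window):
    """Trailing moving average; the first window-1 days yield no value."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def adjusted_case_rate(site_new_cases, window, site_discharges,
                       country_discharges, country_population, per):
    """Rolling-average new case rate per `per` people (assumed reading).

    Site-level counts are scaled by country_discharges / site_discharges to
    approximate country-level counts before dividing by the population.
    """
    scale = country_discharges / site_discharges
    return [
        avg * scale / country_population * per
        for avg in rolling_mean(site_new_cases, window)
    ]


# Hypothetical inputs: sites account for one tenth of national discharges.
rates = adjusted_case_rate(
    site_new_cases=[10, 10, 10, 20, 30],
    window=3,
    site_discharges=100,
    country_discharges=1000,
    country_population=1_000_000,
    per=100_000,
)
```

Under this reading, an estimate runs high when contributing sites carry a larger share of cases than of overall inpatient activity, which matches the explanation offered for sites concentrated in high-incidence urban areas.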
the ce estimates were substantially higher for france and italy, which could reflect the fact that ce sites in france and italy were mainly concentrated in urban areas with high infection rates. laboratory value trajectories our initial data extraction included laboratory markers of cardiac, renal, hepatic, and immune dysfunction that have been strongly associated with poor outcomes in covid- patients in previous publications. laboratory trajectories of each hospital at the population level are presented online at https://covidclinical. net. given limitations of data harmonization and space, we focused on five laboratory trajectories that represented inflammatory, immune, hepatic, coagulation, and renal function. as shown in fig. , trajectory data were remarkably consistent for most institutions at day (day when biological test was positive) with growing differences with continued hospitalization. extensive data harmonization was performed, but we must emphasize that data from each day represented a potentially different population as patients were discharged, died, or laboratory studies were no longer performed. data values from each hospital were an average of all studied patients a specified number of days after diagnosis. initial laboratory values were abnormal for all patients but were not indicative of organ failure. major abnormal elevations were noted in crp and d-dimer on the day of diagnosis. as the number of days from diagnosis progressed, remaining patients who were not discharged or died had, on average, worse values. for nearly all tests, trends toward progressively abnormal values were consistent with worsening disease as inpatient stays continued. most importantly, the initial values and trajectories were highly consistent with previous findings in studies from china , . creatinine, a measure of renal function and the most commonly performed laboratory test in our data set, was divergent over time across sites. 
rising creatinine would be consistent with an increased proportion of ill patients with significant acute kidney injury over time. hospitals in italy, in contrast, did not see a dramatic rise in creatinine in their hospitalized population, while the small percentage of french and german patients remaining in the hospital for weeks had clear signs of acute kidney injury. this may represent many underlying differences including a high mortality near the beginning of the hospitalization at italian hospitals, severe right time censoring of remaining patients, or a difference in practice. total bilirubin, a measure of conjugation and function by the liver, was initially normal across most sites and showed increases -consistent with other hepatic laboratory tests-among persistently hospitalized patients. the other hepatic laboratory measurements, alanine aminotransferase (alt) and aspartate aminotransferase (ast), were divergent across institutions and showed a more significant perturbation (see https://covidclinical. net). hepatic impairment was not present in most patients on presentation and total bilirubin was only mildly elevated with continued hospitalization. on average, white blood cell count (wbc), a measure of immune response, was within normal limits on presentation. patients who remained in the hospital and survived had increasing wbcs over time without severe leukocytosis . lymphocyte and neutrophil count trajectories can be seen on the website. procalcitonin and ldh were not commonly tested in the total patient population, but results are also online. c-reactive protein, a measure of systemic inflammation, was notably elevated on presentation for all patients in the cohort with a very narrow confidence interval, consistent with previous findings . although it is of unclear importance, populations of patients who remained in the hospital, survived, and had ongoing laboratory testing showed improvements over time. 
interestingly, despite a decreasing trajectory during the first week, a mild leukocytosis is observed in counterbalance during the second week. the implication may be that crp is not predictive of ongoing hospitalization or crp is being checked for patient populations where the laboratory is more commonly improving. d-dimer, an acute phase reactant and measure of coagulopathy, was elevated across institutions and countries at presentation. it rose consistently in all populations who continued to be hospitalized with the disease. this was consistent with multiple studies that showed a prothrombotic element to the disease. most importantly, changes were consistent across all sites and highly abnormal. there was a large drop in the number of laboratory tests performed after the first day (see fig. ). drop off in tests performed could be a result of death, length of stay, or change in frequency of data collection by the clinical team. from the maximum number of laboratory tests consistently checked on the first day after diagnosis, there was a rapid tapering in frequency of laboratory tests checked. these changes were particularly pronounced in italy and france. we identified the number of days until the number of tests checked were % of their initial maximum value. values for laboratory study for each day are presented on https://covidclinical.net. results varied for each laboratory value and site. there was no obvious country-level pattern. given that several of these tests, such as creatinine, were commonly checked nearly every day in ill patients, the implication was that patients were censored from the laboratory results because of discharge or death or changing practice pattern. thus, for the purposes of this paper, we focused on trends in creatinine. we normalized the number of tests performed by day to the total performed on day . we then looked at the day when the number of tests performed was % of the maximum number performed for each site. 
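the drop-off calculation described above can be sketched as follows. this is a minimal illustration under stated assumptions: `counts` is a list of the number of tests performed on each day, indexed from the day of diagnosis, and `threshold` is the fraction of the peak at which testing is considered to have dropped off. the function names and the threshold value are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence


def normalized_counts(counts: Sequence[int]) -> List[float]:
    """Express each day's test count as a fraction of the peak count."""
    peak = max(counts)
    if peak <= 0:
        raise ValueError("at least one day must have a positive test count")
    return [c / peak for c in counts]


def drop_off_day(counts: Sequence[int], threshold: float) -> Optional[int]:
    """First day after the peak whose normalized count is at or below threshold.

    Returns None when testing never falls to the threshold within the data.
    """
    fractions = normalized_counts(counts)
    peak_day = fractions.index(1.0)  # the peak always normalizes to exactly 1.0
    for day in range(peak_day + 1, len(fractions)):
        if fractions[day] <= threshold:
            return day
    return None
```

applied per site and per laboratory test, this gives one drop-off day for each site, which can then be compared across institutions.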
for creatinine, for example, a drop-off in testing occurred between day and across institutions. most patients who survived were likely discharged within this time frame or managed with much less monitoring. further results can be found online. there was greater between-hospital variation for laboratory test performance than between-country variation (see fig. ). at the time of diagnosis, there was significant variation between countries and between the hospitals in a specific country. there was no obvious signature presentation for a country for an individual laboratory value. for example, creatinine was a commonly performed laboratory study within a day of diagnosis. the overall standard deviation (sd) for creatinine values across countries was . while the sd within sites was . . standard deviation for countries was . , . , . , and . within france, germany, italy, and the us, respectively. france was a special case as hospitals were reported together by ap-hp and then compared with three hospitals in bordeaux. this was an important finding that could suggest that laboratory values, as individual results, would not be able to fully explain the mortality differences between countries. a rapid mobilization of a multi-national consortium was able to harmonize and integrate data across five countries and three continents in order to begin to answer questions about comparative care of covid- patients and opportunities for international learning. in just over weeks, the group was able to define a question and data model, perform data extraction and harmonization, evaluate the data, and create a site for public evaluation of site-level data. we aggregated ehr data from hospitals, covering a total of , patients seen in these hospitals for covid- . in doing so, we relied upon prior investments made by various governments and institutions in turning the byproducts of clinical documentation into data useful for a variety of operational and scientific tasks. 
most importantly, at each site there were biomedical informatics experts who understood both the technical characteristics of the data and their clinical relevance. using automated data extraction methods, we were able to show results consistent with country-level demographic and epidemiological differences identified in the literature. rates of total case rise in our study were consistent with international tracking sites. (fig. : percentages of patients, along with % confidence intervals, in each (a) country and (b) sex and age group.) age breakdown, with italian sites reporting a larger proportion of older patients, was also reflective of recent publicly available resources. we were able to show that laboratory trajectories across many hospitals could be collected and were concordant with findings from the literature. in truth, the findings generate more questions than they answer; the ability to see consistencies that spanned many countries indicated that the pathophysiology of this disease is shared across countries, and that demographics and care characteristics will have a significant effect on outcomes. as an example, the fall of crp among those who continued to be hospitalized with a continued rise in d-dimer could suggest that d-dimer may be more closely related to persistent illness than crp. the limits of our data collection method, where these results were not tied to the patient level and could not be associated across populations, highlight the need for caution with any conclusion related to changes in laboratory levels over time. perhaps most importantly, our study did not show a unique laboratory signature at the country level at the time of diagnosis. researchers around the world have been closely following the rapid spread of covid- and its high mortality rate in certain countries. one possible explanation would be that patients who presented to hospitals in italy did so at a much more advanced stage of disease.
our results did not support this idea. there was as much in-hospital and between-hospital variation as between countries. the average of laboratory values at presentation did not indicate major organ failure. this may be due to a larger proportion of healthier patients than those with advanced disease. of course, respiratory failure could not be tracked within the limits of our data set. there were both logistic and data interoperability lessons that were very important to the success of the project and will be critical for future efforts. logistically, to maximize the timeliness of this consortium's first collaboration around covid- , we deliberately aggregated the data to expedite the institutional review board (irb) process at each institution for such data sharing. this constrained our analyses to count, rather than patient-level, data. while the latter would be optimal for deep analysis and identification of subtle patterns and perturbations of clinical courses, we felt that aggregated count data could provide valuable information on the clinical course even as we sought irb permission for analyses at the patient level. interoperability was a significant barrier to overcome, where large variations in units and data presentation required extensive data harmonization. the use of loinc codes allowed for more rapid data extraction [ ] [ ] [ ] [ ] , but often institutions did not have internal mappings from their laboratory tests to loinc codes. manual interpretation of laboratory value descriptions was sometimes necessary. in future iterations, sites should perform unit conversion and ensure data consistency by presenting reference ranges and example data for a first-pass check of data at the site. variations in icd coding and inclusion made harmonization difficult. frequencies of presenting codes were useful to show similar patterns to previous literature, but the current set of codes was too sparse for any further meaningful analysis. 
future iterations of this project would encompass a much longer data capture timeline and would ensure comprehensive code collection across all sites. in addition, data alignment by a metric that indicates clinical status is necessary to better establish outcomes. using day of diagnosis as an alignment strategy did not allow for clear identification of causes for temporal patterns. this was, in part, because we could not differentiate between patients who underwent lab testing and were not admitted. although additional lab testing was performed almost exclusively for admitted patients, it is possible that some emergency department patients were triaged and sent home. this would explain the rapid drop-off (and subsequent leveling) seen after day in fig. . future studies will need to explicitly differentiate between categories of patients admitted and triaged to home. these care choices may not reflect similar patient physiology but will more readily track care provision. similarly, outcomes need to be selected that represent clinically meaningful endpoints secondary to this initial data alignment. one reason for this difficulty was that identification of level of care was not easily performed. accordingly, it was not easy to follow patients in and out of icus at the site-level and icu data were not reliable. our group, the consortium for clinical characterization of covid- by ehr ( ce), is one of hundreds of efforts (some of which are listed at healthit.gov) that are working to aggregate and curate data to inform clinicians, scientists, policy makers, and the general public. additionally, networks of healthcare organizations such as the act network and pcornet are working with federal authorities to obtain data-driven population-level insights. similar initiatives are active in the other countries participating in ce, including the german medical informatics initiative . 
disease-specific and organ-specific covid- research collectives are also assembling, including ones for cancers (https://ccc .org), inflammatory bowel disease (https://covidibd.org), and rheumatology , among many others. the world health organization maintains a directory of worldwide research efforts on covid- including clinical data collection . finally, there are dozens of patient self-reporting apps with hundreds of thousands of users worldwide that provide perspectives on the clinical course of the infection outside hospitals. it is clear that in the midst of a novel pathogen, uncertainty far outstrips knowledge. at this early stage, we are partially blind to the underlying physiology of the disease and its interactions with different health system processes. the rapid collation of laboratory-level data across nearly hospitals in five countries is novel in the questions it helps us ask. we are currently struggling to help public health agencies and hospitals better manage the epidemic. by identifying potential differences in care, with proxies of lab changes over time, numerous questions can be asked about whether certain clinical decisions may be affecting lab trajectories (and ultimately outcomes). as an example, differences in creatinine over time may be a signal of patientlevel physiology or hospital decisions about care. the regional clustering of the trajectories identified is striking and deserves further analysis. could there be choices in diuresis and fluid management that may explain differing trajectories? if so, best practice may need to change to the specific physiology of this disease. we have been treating covid- like previous infections despite its unique physiology; with the right information, our scientific and policy leaders can implement guidelines that improve care. there are a multitude of limitations to this study, not least of which is that it is observational and subject to a variety of biases. 
perhaps the most severe is that study data are limited to those patients who were seen at or admitted to hospitals, due to severity of illness or other possibly biasing characteristics. aggregate laboratory data have limited ability to identify general trends in the admitted population. changes in the cohort as a result of discharge or death may change the composition of the cohort over time. the time-varying average represents the labs of remaining patients in the hospital; survivors who require ongoing care. this leads to a survivor bias. because there is significant patient drop-out, the remaining population cannot be compared to the initial cohort. our study is only able to identify that patients had similar initial labs suggesting consistent initial physiology. it is not possible to use these values as drivers of outcomes such as death or severe disease. differences in health capacity may also lead to differences in admitted patients that ultimately manifest as worse outcomes across institutions or countries. limitations also include heavy right censoring where patient absence can be due to death or discharge, delays in updating codes or in uploading ehr data to the local analytic data repository. furthermore, potentially confounding interactions between comorbidities, chronic diseases and their treatments and lifestyle or exposures were not taken into consideration. again, because of these limitations we were careful to avoid making more than basic and descriptive conclusions. over the coming weeks, we will work to quantify these biases and adjust for them, if we can. this will include adding data types as well as disaggregating the data to the patient level if and when permitted by irbs. for the present, with the current limited knowledge of the clinical course of patients suffering from covid- , these results add to this small knowledge base. 
our paper strikingly shows the power of harmonized data extraction from ehrs to rapidly study pandemics like covid- . by example, we hope we can motivate an international discussion on what would be required to enable such international monitoring to simply and rapidly be turned on in future covid- "waves" or in future novel pandemics. we invite others to join the ce consortium by sending a note to ce@i b foundation.org. multiple studies have reported significant abnormalities in several laboratory tests in patients with covid- . studies have shown abnormalities in cardiac, hepatic, renal, immune, and coagulation physiology. those laboratory results are associated with both disease presentation and severity. for this initial study, we selected a subset of laboratory studies that are commonly performed, as identified by the logical observation identifiers names and codes (loinc) standard, and had been previously associated with worse outcomes in covid- patients. based on the meta-analysis of lippi and plebani, we focused on laboratory studies that are commonly performed: alt, ast, total bilirubin (tbili), albumin, cardiac troponin (high sensitivity), ldh, d-dimer, white blood cell count (wbc), lymphocyte count, neutrophil count, procalcitonin, and prothrombin time. loinc codes were identified for each laboratory study as well as the units and reference ranges. all patients who received a polymerase chain reaction (pcr)-confirmed diagnosis of covid- were included in the data collection. some hospitals only included patients who were admitted to the hospital while others included all patients for whom the test was positive. sites obtained the data for their files in several ways. most sites leveraged the open source i b software platform already installed at their institution, which supports query and analysis of clinical and genomics data.
more than organizations worldwide use i b for a variety of purposes, including identifying patients for clinical trials, drug safety monitoring, and epidemiology research. most ce sites with i b used database scripts to directly query their i b repository to calculate counts needed for data files. institutions without i b used their own clinical data warehouse solutions and querying tools to create the files. in some cases, a hybrid method was used that leveraged different data warehouse platforms to fill in i b gaps. for example, assistance publique-hôpitaux de paris (aphp), the largest hospital system in europe, aggregates all ehr data from hospitals in paris and its surroundings. aphp exported data from the observational medical outcomes partnership (omop) common data model for transformation to the shared format. each site generated four data tables, saved as comma-separated values (csv) files. to protect patient privacy, the files we report contain only aggregate counts (no data on individual patients). in order to further protect patient identity, small counts were obfuscated (see below), since an aggregate count of " " represents an individual patient. by computing these values locally and only sharing the aggregate data, sites were able to obtain institutional approval more rapidly. the first file, dailycounts.csv, contained one row per calendar date. each row included the date, the number of new covid- patients, the number of covid- patients in an intensive care unit (icu), and the number of new deaths from covid- . the third file, labs.csv, described the daily trajectories of select laboratory tests. each row corresponded to a laboratory test (identified using a loinc code) and the number of days since a patient had a positive covid- test, ranging from − ( week before the test result) to (the day of the test result) to n (the day the file was created). 
the values in each row were the number of patients who had a test result on that day and the mean and standard deviation of the test results. the fourth file, diagnoses.csv, listed all the diagnoses recorded in the ehr for covid- patients, starting from week before their positive covid- test to the present, with the count of the number of patients with the corresponding icd- or icd- code. sites optionally obfuscated the values in any of these files by replacing small counts with "− ." sites indicated missing data or data that they were unable to obtain (e.g. whether patients were in an icu) with "− ." sites uploaded their files to a private shared folder. these files were merged into four combined files that included totals from individual sites. each value in the combined file had four components: ( ) number of sites with unmasked values; ( ) sum of those values; ( ) number of sites with obfuscated values; and ( ) sum of the obfuscation thresholds for those sites. for example, if five sites reported values , , − (between and patients), − (between and patients), − (between and patients), then the combined file listed two unmasked sites with a total of patients and three masked sites with up to + + = patients. from this, it was inferred that there were between and patients. given the large geographic distance between our sites, we assumed that each covid- patient was only represented in one ehr. the combined labs.csv file contained a weighted average (rather than the sum) of the unmasked mean test results from each site. diagnosis codes were submitted from the sites as either international classification of diseases (icd)- or icd- billing codes. icd- diagnosis codes were mapped to icd- by first attempting to match the icd- codes to child concepts of icd- codes in the accrual to clinical trials (act) icd- → icd- ontology.
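the merging rule for site files described earlier in this section (four components per combined value, plus a weighted average for laboratory means) can be sketched in python as follows. this is an illustrative sketch, not the consortium's merge script. the `SiteCount` type and the function names are hypothetical; a masked count is represented here as `None` rather than by a sentinel value; the lower bound assumes that every masked site has at least one patient; and the weights for the laboratory mean are assumed to be the per-site patient counts.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SiteCount:
    value: Optional[int]  # reported count, or None when the site obfuscated it
    threshold: int = 0    # largest count the site masks (used only when value is None)


def combine_counts(reports: Sequence[SiteCount]) -> Tuple[int, int, int, int]:
    """Return (unmasked sites, sum of unmasked values,
    masked sites, sum of obfuscation thresholds)."""
    unmasked = [r.value for r in reports if r.value is not None]
    masked = [r for r in reports if r.value is None]
    return (len(unmasked), sum(unmasked),
            len(masked), sum(r.threshold for r in masked))


def inferred_range(components: Tuple[int, int, int, int],
                   min_masked: int = 1) -> Tuple[int, int]:
    """Bounds on the true total, assuming each masked site hides
    between min_masked and its threshold patients."""
    _, total, n_masked, threshold_sum = components
    return total + min_masked * n_masked, total + threshold_sum


def weighted_mean(means: Sequence[float], counts: Sequence[int]) -> float:
    """Average of per-site means weighted by per-site patient counts (assumed weights)."""
    return sum(m * c for m, c in zip(means, counts)) / sum(counts)
```

with two unmasked sites reporting 12 and 30 patients and three masked sites with thresholds 9, 4 and 4 (made-up numbers), `combine_counts` yields `(2, 42, 3, 17)` and `inferred_range` yields `(45, 59)`.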
in the cases where no match was found in the act ontology, icd- codes were matched to the icd- codes that shared a common concept unique identifier (cui) in the build of the us national library of medicine's (nlm's) unified medical language system (umls) . we created a website hosted at https://covidclinical.net to provide interactive visualizations of our data sets as well as direct access to all shareable data collected for this publication. data aggregation and publication processes are shown in fig. . visualizations were implemented using python and altair (http://altair-viz.github.io/) in jupyter notebooks (https://jupyter.org), all of which are freely available on the website. the vega visualizations (http://vega.github.io) generated by altair were embedded into a jekyll-based site (http://jekyllrb.com/) that was hosted on amazon web services. this study was determined to be exempt as secondary research by the partner's healthcare, boston children's hospital and beth israel deaconess medical center. the committee collected certifications of proper institutional review board prior to data sharing for each additional member of the consortium. as data were transmitted in aggregate, no patient-level data were available from any site. further information on experimental design is available in the nature research reporting summary linked to this article. preliminary estimates of the prevalence of selected underlying health conditions among patients with coronavirus disease -united states does covid- hit women and men differently? u.s. isn't keeping track. 
the new york times association of coronavirus disease (covid- ) with myocardial injury and mortality why the immune system fails to mount an adaptive immune response to a covid- infection the versatile heparin in covid- rhabdomyolysis as potential late complication associated with covid- liver injury in covid- : management and challenges identification of a potential mechanism of acute kidney injury during the covid- outbreak: a study based on single-cell transcriptome analysis procalcitonin in patients with severe coronavirus disease (covid- ): a meta-analysis serving the enterprise and beyond with informatics for integrating biology and the bedside (i b ) rcupcake: an r package for querying and analyzing biomedical data through the bd k pic-sure restful api the genomics research and innovation network: creating an interoperable, federated, genomics learning system scalable collaborative infrastructure for a learning healthcare system (scilhs): architecture shrine: enabling nationally scalable multi-site disease studies overview of data collection and analysis the shared health research information network (shrine): a prototype federated query tool for clinical data repositories accrual to clinical trials (act): a clinical and translational science award consortium network a translational engine at the national scale: informatics for integrating biology and the bedside case-fatality rate and characteristics of patients dying in relation to covid- in italy an interactive web-based dashboard to track covid- in real time prediction models for diagnosis and prognosis of covid- infection: systematic review and critical appraisal laboratory abnormalities in patients with covid- infection impact of selective mapping strategies on automated laboratory result notification to public health authorities learning from the crowd in terminology mapping: the loinc experience standardizing laboratory data by mapping to loinc evaluating congruence between laboratory loinc value 
sets for quality measures, public health reporting, and mapping common tests early vision for the ctsa program trial innovation network: a perspective from the national center for advancing translational sciences launching pcornet, a national patient-centered clinical research network german medical informatics initiative the covid- global rheumatology alliance: collecting data in a pandemic recent developments in clinical terminologies-snomed ct, loinc, and rxnorm i b : informatics for integrating biology & the bedside ctsa act network i b and shrine ontology with - shrine adapter mapping file (github the unified medical language system (umls): integrating biomedical terminology all authors approved the manuscript. a table including full contributions is listed in supplementary table . data files for daily counts, demographics, diagnosis, and labs data sets are available at https://covidclinical.net. supplementary information is available for this paper at https://doi.org/ . / s - - - . correspondence and requests for materials should be addressed to t.c. or i.s.k.
key: cord- - wihqs i authors: parvin, farhana; ali, sk ajim; hashmi, s. najmul islam; khatoon, aaisha title: accessibility and site suitability for healthcare services using gis-based hybrid decision-making approach: a study in murshidabad, india date: - - journal: spat doi: . /s - - - sha: doc_id: cord_uid: wihqs i healthcare accessibility and site suitability analysis is an elongated and complex task that requires evaluation of different decision factors. the main objective of the present study was to develop a hybrid decision-making approach with geographic information systems to integrate spatial and non-spatial data to form a weighted result. this study involved three-tier analyses for assessing accessibility and selecting suitable sites for healthcare facilities, and analysing shortest-path network. the first tier of analysis stressed the spatial distance, density and proximity from existing healthcare to find more deprived and inaccessible areas in term of healthcare facilities. the result revealed that spatial discrepancy exists in the study area in term of access to healthcare facilities and for achieving equal healthcare access, it is essential to propose new plans. thus, require finding suitable sites for put forward new healthcare service, which was highlighted in the second tier of analysis based on land use land cover, distancing to road and rail, proximity to residential areas, and weighted overlay of accessibility as decision factors. finally, in the third tier of analysis, the most suitable site among the proposed healthcare was identified using the technique for order of preference by similarity to ideal solution.
the road network analysis was also performed in this study to determine the shortest and fastest routes connecting these healthcare facilities with the district medical hospital. the present study found some suitable sites throughout the district in inaccessible zones where people are deprived of better healthcare facilities. this attempt will be highly helpful for preparing a spatial decision support system that assists the health authorities regarding healthcare services in inaccessible, underprivileged, and rural areas. electronic supplementary material: the online version of this article ( . /s - - - ) contains supplementary material, which is available to authorized users. the term accessibility encompasses multi-layered and multi-faceted concepts that ascertain the quality of admitting approaches which provide a range of support services. potential accessibility ensures optimum access to comprehensive and quality healthcare for every single individual of a population within a short distance of healthcare service providers [ ] . adequate accessibility to healthcare services is one of the vital elements of an advanced society. thus, it holds a position in the th global targets set by the united nations for promoting sustainable development goals [ ] . the who, under the human rights concept, describes accessibility as the availability of health services within safe and reasonable physical reach for all sections of the population, especially vulnerable and marginal groups such as ethnic minorities and indigenous people, women, children, aged groups and persons with disabilities, including in rural areas [ ] . a united states president's commission sought to explain the meaning and
conceptual problems related to accessibility, stated that equitable access to healthcare should mean that every citizen can obtain an adequate level of medical care without excessive burden [ ] . another commission addressed the conceptual problems related to access and concluded that the concept of accessibility is as important as the use of services for health outcomes. its definition relies both on the timely use of healthcare services and on achieving the best possible health outcome [ ] . in a society, expenditure on healthcare is considered the best social investment, as a healthy society is one of the fundamental aspects of development, promoting social well-being and minimizing health disaster risk in many developing countries [ , ] . in india, despite economic advancement, inequality in healthcare is one of the primary challenges to meeting development goals [ ] . several committees and programmes have been set up to recommend public health policy aimed at providing high-quality, equal healthcare services to all, such as the bhore committee , the ministry of health and family welfare , and the national urban health mission [ ] . despite all these efforts, a recent health survey reveals that only about % of the indian population is served in government hospitals and that about one-fourth of each household's income is spent on healthcare services [ ] . many studies have applied various techniques to gain a clear understanding of accessibility to healthcare centres and to delineate the regions deprived of these facilities [ ] . geospatial techniques are widely used in different fields of study related to healthcare for maximizing geographical accessibility to medical services [ ] . gis is a platform that provides a framework, in relation to the population, for both assessing the distribution of healthcare centres and evaluating effective coverage [ ] .
spatial or geographical accessibility generally refers to the physical access of a user to a provider's location [ ] , or simply reflects the linkage between the point of supply and the point of demand, taking into consideration the existing transport framework and travel impedance [ ] . accessibility is a multidimensional concept that comprises both spatial dimensions, such as availability and accessibility, and non-spatial dimensions, such as affordability, acceptability and accommodation [ , ] . khan [ ] highlighted vehicular travel time/distance and euclidean distance as spatial attributes for measuring spatial accessibility. the spatial separation model is a suitable approach when transport network data are incomplete or lacking; because it uses only the locations of the services of interest, it gives a usable accessibility measure from the physical distance between infrastructures [ ] . the cumulative opportunity, or isochrone, approach is an effective method because it considers travel time as well as the maximum desirable travel time, capturing the land use pattern and infrastructural barriers across the land cover. the gravity model is another approach to measuring accessibility; it takes into account the size and configuration of the zone of interest, the choice of attractor variables and the value of travel impedance. besides these approaches, the two-step floating catchment area ( sfca) method was proposed by luo and wang for calculating spatial accessibility [ ] . in this method, a floating catchment area is used as a window to measure the service-to-population ratio for each healthcare service. these ratios are then summed for each location over the catchment areas that contain it, and the sum is used as the accessibility index of that location. the enhanced two-step floating catchment area method was introduced to overcome the rigidity and arbitrariness of the sfca, i.e.
it does not take distance decay into account; the enhanced method sets catchments and subzones for each healthcare facility using travel time or travel distance along the road network, which is often used as the spatial barrier or impedance [ ] . the three-step floating catchment area method was used by rekha et al. [ ] to calculate accessibility by considering three attributes: the attractiveness of health services, the travel time and distance between the service centre and the residents, and the population demand for healthcare facilities. each of these techniques has its advantages in application, but the results become more useful when a geographic information system (gis) is integrated to produce a spatial result. gis applications in healthcare accessibility measurement have been popular for two decades [ ] . gis is a sophisticated spatial analysis technique that can not only identify demand points based on residential clustering but also pinpoint spatial inequalities in healthcare delivery points and suggest suitable locations for new health facilities [ ] . the geographical dimension of access can be expressed faithfully with gis, which can address a wide range of aspects, such as identifying vulnerable populations beyond the reach of services and delimiting the points of quality service and treatment, while avoiding the loopholes of previous qualitative research [ ] . several studies found gis very useful for identifying the nearest healthcare centre over different road networks by converting patients' postal/zip codes into grid references to find the straight-line or travel-time distance [ , ] . gis has also been applied to measure the relative importance of distance to providers in obtaining treatment for depression and to understand the barriers to the adoption of such medical facilities in rural as well as urban areas [ ] .
distance and travel time are the most important factors in serving people, because the number of deaths in a region increases with increasing travel time to a hospital [ ] . gis offers distance tools to estimate the travel time between healthcare facilities and residential premises, as well as shortest- and fastest-path analysis for reaching a nearby healthcare facility in a short time. in view of these advantages of gis and geospatial analysis, the present study applied a hybrid decision-making approach supported by gis. the study aimed to use spatial tools to integrate different spatial and aspatial information for the spatial analysis of healthcare accessibility and inaccessibility, in support of proposing new health infrastructure in inaccessible areas of murshidabad district, west bengal, india. this type of geospatial analysis can be applied to identify suitable sites and allocate new service areas, to determine the most ideal sites where allocation is essential, and to estimate the shortest and fastest routes between nodes of healthcare facilities. the study also helps health authorities to understand the spatial pattern and distribution of healthcare availability and facilities, so that inaccessible and deprived areas can be served better. murshidabad district (west bengal, india) is a district with a rich historical background and an important unit of the state, as it shares the largest international boundary with bangladesh on its eastern side. it is the northernmost district of the presidency division of west bengal, lying between n and n and e and e (fig. ) . the district headquarters is behrampore. the district is bordered by burdwan and nadia districts in the south, birbhum district and jharkhand state in the west, and malda district in the north.
the river bhagirathi divides the district into two almost equal parts, popularly known as 'radh' on the western side and 'baghri' on the eastern side. the total area of the district is km and it holds the th largest position among the districts of the state in terms of land area. according to the census, murshidabad district is home to around , , people, which is roughly equal to the total population of bulgaria or of the us state of washington [ ] . the population density of the district is /km and the decadal growth rate is . %. the district has a large concentration of minority population, which accounts for % of the total population of the state and in which the majority are muslims; the sex ratio is females per males, and literacy is . %, with male and female literacy rates of . and . respectively. the district is predominantly rural; the largest share of workers, accounting for . % of the total population, is engaged in other work such as labour [ ] , followed by agricultural labourers ( . %) and household industry workers ( . %). the above statistics on literacy, the male-female literacy gap, and the worker pattern indicate that this region is not well developed in terms of education and occupation. as far as healthcare facilities are concerned, the district shows a deprived health service profile: there is only one medical college and hospital, with a bed occupancy rate of . , and only three super-specialist hospitals and four sub-division hospitals, with bed occupancy rates of . and . respectively [ ] . rural hospitals and public healthcare centres are also available, but the quality of service is poor. moreover, the accessibility of these healthcare facilities is a vital concern, and attention is needed on proper access to existing institutions and on eliminating regional disparity.
accessibility is a shorthand term for a broad set of conditions governing the potential utilization of healthcare services; it estimates the degree to which all individuals of a population can reach the services they need within a defined distance or driving time. availability and affordability are other important components of healthcare utilization. assessing the optimal use of healthcare facilities is not an easy task; it can be done by estimating location-based accessibility and individual-based accessibility. the spatial distribution of public health services illustrates the healthcare profile of any area. murshidabad district presents an average picture of medical services: the availability of health institutions is not unacceptable, but accessibility in terms of positive outcomes is not adequate. a positive outcome can be assessed through efficient treatment, the adequacy of speciality hospitals, and the proper diagnostic and treatment skills of the provider. the lack of such positive outcomes makes the accessibility profile of the district poor. murshidabad district falls in the medium-to-low accessibility zone for health services if the availability of medical institutions is the only criterion, but the spatial location (distance, travel impedance, travel cost etc.) of the healthcare centres is also a vital element driving accessibility. the district has many government hospitals and public nursing homes in different categories, such as one medical college and hospital, super specialist hospitals, sub-divisional hospitals, rural hospitals, and nursing homes. murshidabad district has blocks under subdivisions, in which the spatial distribution of healthcare centres is not advantageous because the centres tend to cluster. the greatest clustering of medical centres is found in only a few blocks, such as behrampore, murshidabad-jiaganj, domkal and raghunathganj-i (table ) .
behrampore block is the most accessible area of the district, where a cluster of medical institutions is located, including the medical college and hospital and public healthcare centres. suti-i, raghunathganj-ii, sagardighi, bhagwangola-ii, raninagar-ii, jalangi and bharatpur-i blocks are the most inaccessible units of the district, with very few medical institutions, and the rest of the blocks have moderate healthcare accessibility in terms of the total number of medical institutions [ ] . healthcare accessibility is also reflected in the efficiency of services, which depends on the availability of doctors. the doctor-patient ratio is one of the most important mediating factors for individual-based accessibility. an overburden of patients can lead to inappropriate treatment regimens because too few doctors are available. out of blocks, only blocks possess good accessibility: behrampore, raghunathganj-i, murshidabad-jiaganj, domkal, kandi and beldanga-i have a good number of doctors, and among them behrampore has the highest number of doctors, i.e. . the remaining blocks have an inadequate number of doctors and low accessibility [ ] . the number of patients is another factor that helps to visualize healthcare status. behrampore block again holds the highest position in terms of the number of patients who come for treatment. a large number of patients in a healthcare centre admits two interpretations: either better medical service attracts most of the patients, or a lack of sufficient medical institutions leads to high occupancy and burden. in the case of murshidabad district, the second is the likely driving factor. murshidabad-jiaganj, raghunathganj-i, kandi and nawda experience a high accumulation of patients, while raninagar-i, domkal, hariharpara, beldanga-i and nabagram have a medium patient load.
the rest of the blocks, such as farakka, samserganj, suti-i and ii, raghunathganj-i and so on, have comparatively fewer patients, as most patients use cross-border medical facilities owing to the distance from the district medical college and hospital. potential accessibility of healthcare services cannot, however, be measured by a single criterion or indicator, as all the indicators are interdependent in producing higher accessibility. an adequate number of medical institutions with sufficient doctors to meet patients' medical needs can raise the healthcare status of any unit; if one indicator overburdens another, the balance is disturbed and this can drive inaccessibility. for this purpose, a relative accessibility index (rai) is a crucial factor for assessing the level of accessibility of any region. murshidabad district has a low relative accessibility index. only a few pockets have a high relative accessibility index, such as behrampore (highest), murshidabad-jiaganj, and domkal, while the rest of the blocks have low to very low rai (table ). access to healthcare is a multi-fold concept that corresponds to many dimensions, such as availability, affordability, acceptability and geographical accessibility of the services. geographical accessibility and availability of healthcare services have a spatial dimension, which can be analysed with a geographical information system, while the other two, affordability and acceptability, are non-spatial dimensions. much research has been done on achieving higher accessibility to healthcare services, mainly under four notions: distance to the nearest service centre from the population, average distance to a set of service centres, the threshold of the service, and gravitational models for providers. distance is the major element in calculating accessibility to the service.
sometimes it can be measured through the nearest location to the population, travel cost or travel time. the nearer the service centre, the shorter the travel time and the lower the travel cost, which ultimately provides higher accessibility. travel cost and travel time are also considered important elements for measuring the non-spatial dimension. geographical information system (gis) studies commonly use the euclidean distance method to find the nearest service centre, i.e. its distance from a population centre [ ] . this method has drawbacks in identifying suitable sites, mainly in urban areas where the population has several service options within reach of a given point of reference. average travel impedance to the service is therefore better suited to ensuring spatial accessibility, as it incorporates both accessibility and availability to the population. the threshold of the service can be calculated as the number of patients per medical institution. it also refers to supply measures such as the doctor-patient ratio and the number of beds per person. this measure can be shown by density analysis (point, line, kernel density) in gis research. it also has some demerits, as it does not consider the demand of the cross-border population on the reference supply point. the gravity model adapts newton's law of gravity to the analysis of spatial accessibility. this model overcomes the problems associated with the former measures. it incorporates accessibility and availability of services in both rural and urban settings. it also helps to relate the potential supply options to a given set of potential population points. the main problem with this model is the distance decay coefficient b, which is usually not known and is expressed in linear or exponential form.
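the gravity model described above can be sketched as follows. this is a minimal illustration that assumes the commonly used gravity-based form a_i = sum_j s_j d_ij^(-b) / v_j with v_j = sum_k p_k d_kj^(-b); the function name, the argument names and the sample numbers are hypothetical, and the decay coefficient is simply passed in because, as noted, it is usually not known in advance.

```python
def gravity_accessibility(supply, population, dist, beta):
    """Gravity-based accessibility A_i for each population point i.

    supply[j]      capacity S_j of facility j (e.g. doctors or beds)
    population[k]  population P_k at point k
    dist[k][j]     travel impedance d_kj (> 0) between point k and facility j
    beta           distance-decay coefficient (assumed known here)
    """
    n_pop, n_fac = len(population), len(supply)
    # V_j: total demand competing for facility j, discounted by distance
    demand = [
        sum(population[k] * dist[k][j] ** -beta for k in range(n_pop))
        for j in range(n_fac)
    ]
    # A_i: supply reachable from point i, discounted by distance and demand
    return [
        sum(supply[j] * dist[i][j] ** -beta / demand[j] for j in range(n_fac))
        for i in range(n_pop)
    ]


# Hypothetical example: one facility, two equally sized population points.
pop = [100.0, 100.0]
scores = gravity_accessibility(
    supply=[10.0], population=pop, dist=[[1.0], [2.0]], beta=1.0
)
```

in this form the population-weighted sum of the scores equals the total supply, so the result can be read as supply per person; a larger decay coefficient concentrates that supply on the nearer population point.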
two-step floating catchment area model: the floating catchment area model was first employed by peng to analyse urban employment accessibility, and luo and wang later used this method to overcome problems of the gravity-based model in calculating spatial accessibility, such as quality and spatial resolution [ ] . in this method, a catchment area is selected based on distance and travel time, and a spatial accessibility value is assigned to each population point by adding up the service-to-population ratios of all the catchments that overlay the point. in the equations for the two-step catchment areas and for each population point, $p_k$ is the population size at point k, $h_j$ is the doctor-patient ratio, $s_j$ is the capacity of a particular healthcare centre, and $d_0$ is the minimum travel time. this method is limited to the selected catchment areas and is not suitable for all areas. the spatial accessibility value is equally high at the centre and at the periphery of a catchment and becomes zero just beyond its boundary. a change such as a more flexible travel time can alter the estimated spatial accessibility value. thus, an improved method is needed to overcome this problem. enhanced two-step floating catchment area model: in this model, the whole catchment area is divided into several subzones, each with a distinct accessibility weight, instead of fixing a single binary accessibility for the entire catchment area. in the equation for this method, $p_j$ is the doctor-patient ratio, $s_j$ is the capacity of the healthcare centre measured by bed availability or the number of doctors, and $w_r$ is the accessibility weight assigned to each subzone, which depends on the distance d between the facility and the population and on the intervention coefficient. this method also has a limitation: it does not take into account the competition among healthcare centres situated in a single catchment area.
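the two floating catchment methods above can be sketched in one function. this is an illustration under stated assumptions, not the authors' implementation: `d0` is treated as the catchment threshold, the binary case gives every point inside the catchment a weight of 1, and the enhanced case uses a hypothetical step function `zone_weight` whose subzone limits and weights are illustrative values only.

```python
def floating_catchment(supply, population, travel, d0, weight=None):
    """Two-step floating catchment area accessibility per population point.

    supply[j]      capacity S_j of facility j
    population[k]  population P_k at point k
    travel[k][j]   travel time (or distance) between point k and facility j
    d0             catchment threshold; beyond it the weight is zero
    weight         optional function d -> subzone weight (enhanced variant);
                   None gives the plain binary two-step method
    """
    def w(d):
        if d > d0:
            return 0.0
        return 1.0 if weight is None else weight(d)

    n_pop, n_fac = len(population), len(supply)
    # Step 1: service-to-population ratio for each facility's catchment
    ratios = []
    for j in range(n_fac):
        demand = sum(population[k] * w(travel[k][j]) for k in range(n_pop))
        ratios.append(supply[j] / demand if demand > 0 else 0.0)
    # Step 2: sum the ratios of all facilities whose catchment covers point i
    return [
        sum(ratios[j] * w(travel[i][j]) for j in range(n_fac))
        for i in range(n_pop)
    ]


def zone_weight(d):
    """Hypothetical subzone weights for the enhanced variant."""
    if d <= 10:
        return 1.0
    if d <= 20:
        return 0.68
    return 0.22


# Hypothetical data: 2 facilities, 3 population points, times in minutes.
supply = [2.0, 1.0]
population = [100.0, 50.0, 50.0]
travel = [[5, 40], [15, 10], [35, 20]]

binary = floating_catchment(supply, population, travel, d0=30)
enhanced = floating_catchment(supply, population, travel, d0=30, weight=zone_weight)
```

comparing `binary` and `enhanced` shows the effect discussed in the text: with subzone weights, a population point near the edge of a catchment contributes less demand and receives less access than one near the facility.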
thus, some modification is required and a more improved version needs to be developed. three-step floating catchment area model: this model is an extension of the previous method. a catchment area is assigned to each healthcare facility as well as to each population site. it takes travel distance and travel time into account when computing the accessibility weight for each catchment (healthcare site) area. to overcome the problem of assigning an equal accessibility value to each medical institution, a selection weight is assigned to each healthcare service based on travel distance and time $t_d$, where $w_{ij}$ and $w_{ik}$ are gaussian weights for service site j and population site k. this selection weight is then carried into the subsequent formulas used to compute the doctor-patient ratio for each service site and the accessibility score for each habitation, where $s_j$ is the capacity of health care facility j, $p_j$ is the threshold of the service computed for each health care facility, $p_k$ is the population of habitation k and $w_r$ is the weight computed for each habitation and health care facility. this technique also has drawbacks and is not fully satisfactory, because only aspatial data are considered, while the available spatial data, i.e. the number of health institutions in a particular area, the distances among them, and the density of health care in a particular area, are not. thus, a new hybrid approach is required through which spatial and aspatial information can be analysed, assessed and evaluated for a whole geographical area. to overcome all these problems, the present study prepared a hybrid decision-making approach for assessing healthcare accessibility. in recent times, the geographical information system (gis) has been used to prepare hybrid models in which all aspects can be covered and merged for analysis. the present study was carried out in several steps to build the decision-making approach.
firstly, the relative accessibility index (rai) was estimated from the existing situation and the available data on the no. of medical institutions, the number of patients and the number of doctors, i.e. the doctor-patient ratio. the rai of health care facilities indicates the areal proportion of health facilities, which helps in analysing accessibility and inaccessibility. for this purpose, euclidean distance, kernel density and proximity were analysed to support the suitability analysis of healthcare sites using the weighted linear combination. in parallel, topsis was applied to ascertain the most suitable sites where new health care facilities could be built to reach the maximum number of people and provide better health services. finally, shortest path network analysis was carried out to connect and interlink the existing and proposed healthcare facilities. to start the first level of the decision-making hierarchy, the relative accessibility index was calculated. in the equation used, mi is the no. of medical institutions, pi is the no. of patients, dj is the availability of doctors and h is the constant equal to . euclidean distance was measured between each of the existing healthcare facilities. it is the true straight-line distance between two points 'x' and 'y' in euclidean space, or along the 'x' and 'y' axes. it follows from pythagoras's theorem in one-dimensional and two-dimensional space, where one variable describes each cell. in 'n' dimensional space, or a real-world scenario in which each cell has a value for each variable, pythagoras's theorem is difficult to apply directly; this is overcome by measuring the distance between points as $d_{xy} = \sqrt{\sum_{p=1}^{n} (x_p - y_p)^2}$, where $d_{xy}$ is the distance between the points x and y, equal to the square root of the sum, from the first variable (p = 1) to the last variable (n), of the squared differences in each dimension.
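the euclidean distance measure described above can be sketched in a few lines. this is only an illustration: the function names and the sample coordinates are hypothetical, and a gis distance tool would compute the same quantity for every raster cell rather than for a single point.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between points x and y in n dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((xp - yp) ** 2 for xp, yp in zip(x, y)))


def nearest_facility(point, facilities):
    """Return (index, distance) of the facility closest to `point`."""
    distances = [euclidean_distance(point, f) for f in facilities]
    index = min(range(len(facilities)), key=distances.__getitem__)
    return index, distances[index]


# Hypothetical projected coordinates in kilometres.
facilities = [(0.0, 0.0), (10.0, 0.0)]
nearest = nearest_facility((3.0, 4.0), facilities)
```

for the sample point the first facility is nearest, at a distance of 5 km, which is the familiar 3-4-5 right triangle.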
density was measured to show the spatial availability of healthcare in the study area. kernel density is one of the spatial analyst tools in the gis environment for analysing the density of features in a neighbourhood around those features. it is a non-parametric technique generally used to visualize and analyse spatial data for mapping and estimating a spatial pattern or event. kernel density is calculated for both point and line features around each output raster cell, considering the total number of individual features that intersect the neighbourhood. here the raster is calculated with the quadratic kernel formula given by silverman, in which the highest value is placed at the centre of the surface feature and the value tapers towards zero with distance, reaching zero at the search radius [ ] . in the corresponding equation, 'k' is the kernel function with density f(y), 'y' is the sampled data, 'n' is the number of samples and 's' is the smoothing parameter, or bandwidth. the proximity tool was used to discover the spatial association of features. with this tool, output information is obtained through buffers and multiple ring buffers, which create areal features at specified distances around the input features. multiple ring buffers were created at various distance bands around each given point. the present work presents a simple and efficient decision-making approach based on a structured and integrated method. a novel hybrid decision-making framework was built for evaluating zones accessible and inaccessible to healthcare services by integrating different raster layers, i.e. euclidean distance, kernel density and proximity, in a weighted linear combination. the weighted linear combination method is a multi-parametric decision model and belongs to one of the fundamental classes of multi-criteria evaluation methods in gis, which follow compensatory combination rules [ ] .
this method has been widely used in other studies as well, for example in land-use suitability analysis [ ] , in suitability analysis for soil erosion [ ] , and in disease susceptibility [ ] [ ] [ ] . the present work adopted this technique to identify accessible and inaccessible areas for health care by applying equal weights. all selected raster layers were reclassified with equal cell size to combine them into a single accessibility layer. a weighted linear combination is defined as the sum of the products $w_i p_i$ over all decision criteria, where $w_i$ is the weight of decision factor i, $p_i$ is the selected raster input and n is the number of selected decision criteria. suitable sites for proposing and allocating new healthcare facilities were determined using suitability analysis. site suitability analysis is a process of allocating new and ideal sites by analysing the existing site structure, pattern and condition against several appropriate criteria. for this purpose, the present study prepared a suitability model for identifying appropriate locations by integrating different thematic layers: road accessibility, the railway network, land use land cover, and residential density. all these maps were converted into rasters so that each pixel carries a score. in suitability analysis it is important to set a score for each category on a - or - point scale according to its suitability. the thematic maps were then combined into a composite suitability map. for the suitability analysis, land use land cover was classified first. barren land, fallow land and land of low economic value and few resources were considered highly suitable for allocating new health care. secondly, the rail network and road accessibility were merged with the above-mentioned land categories. areas properly connected by road and rail were taken as suitable sites for such allocation. in parallel, residential accessibility was also considered: a km buffer around areas with a high population received more preference as a suitable site.
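the weighted linear combination used for the accessibility overlay, and in the same way for the composite suitability map, can be sketched cell by cell. this is a minimal illustration that assumes all layers are already reclassified to a common score scale and share the same grid; the function name, the tiny 2 x 2 grids and their scores are hypothetical, and equal weights are used as in the study.

```python
def weighted_linear_combination(layers, weights):
    """Combine reclassified raster layers cell by cell as sum_i w_i * p_i.

    layers   list of equally sized 2D grids (lists of rows) of scores
    weights  one weight per layer; the weights are expected to sum to 1
    """
    if len(layers) != len(weights):
        raise ValueError("one weight is needed per layer")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [
            sum(w * layer[r][c] for w, layer in zip(weights, layers))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical reclassified layers (higher score = more accessible).
distance = [[1, 3], [5, 5]]
density = [[1, 3], [3, 5]]
proximity = [[1, 3], [4, 5]]

overlay = weighted_linear_combination(
    [distance, density, proximity], [1 / 3, 1 / 3, 1 / 3]
)
```

because the rule is compensatory, a low score in one layer can be offset by high scores in the others, as in the lower-left cell where scores of 5, 3 and 4 average to 4.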
this suitability analysis supported the proposal of suitable sites for allocating new health care facilities in inaccessible areas. which of these sites is most suitable, and where new health care is most urgently required, was determined using topsis analysis. topsis is the acronym of the technique for order preference by similarity to ideal solution. the concept was first introduced by hwang and yoon in [ ] . topsis is one of the best multi-criteria decision-making methods for selecting the best solution from a set of decision criteria. there are many studies on topsis applications [ ] . with topsis, the ideal and non-ideal solutions are identified simultaneously. in the present study, topsis was used to identify the best site among the proposed suitable sites for health care services. the method is quite simple and performs satisfactorily in different fields of application. the topsis procedure can be expressed as the following sequence of steps [ ] . step 1: prepare the decision matrix and determine the weights of the decision criteria. suppose $w^a = (w^a_1, w^a_2, \ldots, w^a_n)$ is the weight vector for the a-th expert or decision-maker, where $w^a_1 + w^a_2 + w^a_3 + \cdots + w^a_n = 1$ and a = 1, 2, 3, …, n. in the decision matrix, linguistic terms express a range from low to excellent, which has to be converted using a point scale in topsis ( table ) . the selected decision criteria can be benefit functions (more is preferred) or non-beneficial functions (less is preferred). step 2: calculate the normalised decision matrix, with normalised values $\bar{x}_{ij}$. step 3: calculate the weighted normalised matrix by multiplying each normalised value by its associated weight, giving the weighted normalised values $v_{ij}$. step 4: calculate the ideal best and ideal worst values, where the two index sets are associated with the ascent and descent factors respectively. the ideal best and ideal worst values depend on the selected criteria.
here, for selecting a suitable site for health care services, a shorter distance from the transportation route and the residential area receives the highest preference value, and vice versa. step 5: calculate the euclidean distance from the ideal best value, $s_j^{+}$. step 6: calculate the euclidean distance from the ideal worst value, $s_j^{-}$, in the same way. step 7: calculate the performance score. finally, the preference score, or relative closeness to the ideal solution, $p_i$, is determined. the higher the value of $p_i$, the better the site for allocating the proposed health care infrastructure. topsis is an efficient technique of multi-criteria decision analysis, well suited to finding a suitable site and an ideal solution. murshidabad district is one of the backward districts of west bengal, where about % of india's poor population reside. healthcare accessibility in this district has always been a major issue, as most of the inhabitants prefer to seek health services elsewhere because of poor and inefficient access to medical care. the present study aims to explore suitable sites for medical institutions after analysing the present situation of healthcare facilities within the district. for the existing situation of health care facilities, the relative accessibility index (rai) was calculated by considering the no. of medical institutions, the no. of patients admitted in a year, and the no. of doctors available per day. the result reveals that the spatial distribution of health care facilities is not uniform throughout the district (fig. ) . hence, for better treatment and good service, residents have to travel a long distance to another place, although many suitable conditions for developing better services already exist here. the medical institutions are clustered in the central part of the district, i.e.
the district capital, behrampore. as a result, the residents of the other areas struggle to get better facilities. thus, the present study was carried out to find suitable locations for health care services by developing a hybrid decision model. for this purpose, the decision hierarchy was developed from the existing health service facilities and their spatial distance, density and proximity, in order to identify suitable places where better and well-accommodated health care services could be proposed, considering the local land use and the distance to transportation routes. once the proposed sites for local health care services had been selected, they were linked with the district medical college and hospital, located in the central part, using shortest path network analysis. this effort will offer better health services to the residents of the peripheral part of the district and interlink them with the medical college and hospital. the relative accessibility index (rai) of the present health care service was calculated, and the result reveals that only behrampore block (the capital city of the district) has a good rai, whereas murshidabad-jiaganj and domkal have a moderate rai and all other blocks have a low to very low rai. thus, from the overall spatial result of the relative accessibility index, it can be concluded that the study area is poor in terms of medical facilities and health care services. in view of this problem, the present study proposed a hybrid decision model with three tiers of analysis to find suitable places for new health care and the shortest route to interlink them with the district medical hospital. the first tier of analysis examined the distance, density and proximity of each existing health care facility to find the more deprived and inaccessible areas in terms of health services.
the second tier of analysis sought suitable sites for proposing new health care services based on land use land cover, distance to road and rail, and proximity to residential areas. finally, the third tier of analysis identified the most suitable sites among the proposed health care locations and applied shortest path network analysis to connect them with the district medical hospital.

figure a illustrates the prevailing conditions of health care services, including the medical college, hospitals and nursing homes. only one medical college and hospital is available. some other hospitals and nursing homes, both governmental and public, are also available, but their spatial distribution is neither equal nor homogeneous. figure b-d show the distance, density and proximity to present healthcare facilities, respectively. the results from the distance, density and proximity layers reveal that the peripheral parts of the district always have fewer facilities. a lower distance to healthcare indicates higher accessibility, a lower density of healthcare indicates lower accessibility, and closer proximity to healthcare indicates higher accessibility. these three spatial layers were integrated to assess accessibility and inaccessibility to healthcare facilities throughout the study area. the - point scale was used for the linear combination, i.e. for low accessibility (inaccessibility) and for higher accessibility of healthcare facilities. figure shows the overlay result. it depicts the areas of higher and lower spatial accessibility. this result supports the second tier of analysis, i.e. finding suitable sites for proposing new healthcare facilities in inaccessible areas. the suitable sites were considered based on four decision factors. these factors were the buffer around the road, the buffer around the rail, land use land cover and the buffer around the settlement (fig. ).
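the overlay step described above can be sketched in code. the following is a minimal illustration, not the authors' gis workflow: it rescales each raster layer to a common point scale and averages the layers cell by cell with equal weights. the function names, the tiny 2 x 2 grids and the scale end points (1 and 5) are assumptions made for illustration only; the scale actually used in the study is not recoverable from the text.

```python
from typing import List, Sequence

Grid = List[List[float]]


def rescale(layer: Grid, lo: float = 1.0, hi: float = 5.0,
            invert: bool = False) -> Grid:
    """Linearly rescale a raster layer to the range [lo, hi].

    With invert=True, small input values receive high scores
    (e.g. a short distance to a facility means high accessibility).
    The default end points 1 and 5 are assumed placeholders.
    """
    flat = [v for row in layer for v in row]
    vmin, vmax = min(flat), max(flat)
    span = vmax - vmin
    out: Grid = []
    for row in layer:
        new_row = []
        for v in row:
            t = 0.0 if span == 0 else (v - vmin) / span
            if invert:
                t = 1.0 - t
            new_row.append(lo + t * (hi - lo))
        out.append(new_row)
    return out


def weighted_overlay(layers: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Cell-by-cell weighted linear combination of equally sized layers."""
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    total = sum(weights)
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for layer, w in zip(layers, weights)) / total
             for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2 x 2 rasters (values are invented for illustration).
distance_km = [[1.0, 5.0], [9.0, 3.0]]    # lower distance -> more accessible
density = [[4.0, 2.0], [0.0, 1.0]]        # higher density -> more accessible
proximity_km = [[2.0, 2.0], [6.0, 4.0]]   # closer -> more accessible

accessibility = weighted_overlay(
    [rescale(distance_km, invert=True),
     rescale(density),
     rescale(proximity_km, invert=True)],
    weights=[1.0, 1.0, 1.0],              # equal weights
)
```

in this toy grid the lower-left cell scores lowest on every layer, so it would be flagged as the least accessible cell and as a candidate area for a new facility.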
a suitable site for healthcare facilities located within km of a road is considered more suitable than one located far from the road, because it offers higher accessibility in terms of travel time as well as travel cost. any healthcare facility located nearer to the railway is considered most accessible because it allows immediate and fast service. land use land cover should be considered before choosing any space for locating health care. barren land and agricultural fallow near a settlement would be the best site because it offers a lower land price, puts unused land to use, and is close to residential areas. sites located within km of populated areas will get the best service. the maps are produced with the help of data illustrated in table . thus, in view of these conditions, suitable sites were proposed in the present study which can offer health service in inaccessible areas (fig. ). the proposed health care facilities at suitable sites can be merged and interlinked with existing health care to make the district richer and more accessible in terms of health service. this is shown in the supplementary file (s- ).

it is essential to validate the location of each proposed site by verifying ground truth. in the present study, site- and site- were validated through ground visits. the remaining sites were not visited; instead, their ground truth was checked in google earth. all sites were selected in open space, barren land or agricultural land, considering their nearness to settlements and to transport routes. these ground locations were also plotted in google earth, as shown in fig. .

the technique for order of preference by similarity to ideal solution (topsis) was used to find the ideal best and ideal worst among the proposed sites. the ideal best and ideal worst values were determined based on the above four decision factors.
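a compact sketch of the topsis calculation may help make the ranking concrete. it follows the standard textbook form of the method (vector normalisation, weighting, euclidean distances to the ideal best and ideal worst, and relative closeness $P_i = S_i^{-} / (S_i^{+} + S_i^{-})$), which is assumed here rather than taken from the study's own equations. the site names, criterion values and equal weights are hypothetical.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return the relative closeness score of each alternative (row).

    benefit[j] is True when larger values of criterion j are better and
    False when smaller values are better (e.g. distance to a road).
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] if norms[j] else 0.0
          for j in range(n_crit)]
         for row in matrix]
    # Ideal best and ideal worst value per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        s_plus = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        s_minus = math.sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        total = s_plus + s_minus
        # If all alternatives are identical, no ranking is possible.
        scores.append(s_minus / total if total else 0.5)
    return scores


# Hypothetical sites: distance to road, rail and settlement (km, lower is
# better) and a land-use suitability class (higher is better).
sites = {
    "site A": [0.5, 1.0, 0.3, 5.0],
    "site B": [2.0, 4.0, 1.0, 3.0],
    "site C": [5.0, 8.0, 2.5, 1.0],
}
scores = topsis(list(sites.values()),
                weights=[0.25, 0.25, 0.25, 0.25],
                benefit=[False, False, False, True])
ranking = sorted(zip(sites, scores), key=lambda kv: kv[1], reverse=True)
```

here the hypothetical site A is best on every criterion and site C is worst on every criterion, so they sit at the two ends of the ranking; real data rarely separate this cleanly.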
thus, in identifying the most suitable site for health care service with regard to transportation and nearness to residential areas, the lowest distance was given the highest preference. similarly, for land use, the unsuitable land use category was given the lowest preference (table ). the result shows that site- has the highest preference score ($P_i$) with . . this site is far from the district medical college and other health care facilities and is home to a large rural population; therefore, it is the best site for developing modern health care to serve the large surrounding population by connecting with the district medical college and other hospitals on a transfer basis (figs. , ). in contrast, site- and have the lowest $P_i$ with . and . respectively. this is because these sites are within km of the district medical college and very close to rail and road transportation routes, so they are considered already accessible. the other selected sites have high to moderate suitability scores depending on their location and health service requirements.

finally, shortest path network analysis was performed to measure the shortest open street distance to the district medical college and hospital in case of emergency. this task also supports the case for constructing a new health care service at a suitable site: a site far from the district medical college has a higher requirement score than one located nearer. the shortest path network analysis reveals that site- and are located far away, at . and . km respectively, which indicates a greater need to construct and manage new health care to offer better service to deprived groups of people. in comparison, site- and are much closer, at . and . km respectively, which indicates that no major issue would arise if new health care facilities were not built there (table ; fig. ).

health is an important aspect of human existence as well as social well-being.
good health service can improve the quality of life and the progress of a society. but recently, rapid population growth and enormous pressure on land have made the service worse. thus, the present study aimed to examine healthcare accessibility issues based on the service and accommodation available in the study area. the study area, murshidabad district, is a backward district of west bengal, india, in terms of basic infrastructural facilities and amenities [ ]. hence, the present study on healthcare accessibility and site suitability is required. previous attempts show that several studies defined accessibility in terms of availability, accommodation, affordability, and sometimes acceptability [ ]. for example, aday and andersen [ ] defined healthcare accessibility based on the availability of health services with special reference to financial, informational, and behavioural influences. in contrast, gulliford et al. [ ] focused on health demand by distinguishing between having access to healthcare and gaining access to healthcare, in order to overcome the financial and organizational barriers to health service. overall, other studies have also identified common factors for determining healthcare accessibility, such as ( ) spatial distribution of healthcare facilities; ( ) transportation facilities and distance to healthcare centres; ( ) socio-economic conditions of the nearby population group; and ( ) accommodation available in healthcare centres [ ]. given these factors, considerable attention has recently been given to geographic information system (gis) applications in the field of healthcare accessibility due to their efficiency and accuracy in spatial as well as non-spatial analysis [ , ].
site suitability is another aspect of healthcare facilities, in which the major concern is finding suitable locations for constructing new healthcare facilities to provide better service to target groups and to overcome the unequal distribution of healthcare service. site suitability is based on multi-factor analysis because searching for and locating healthcare in suitable places depends on more than one factor [ ]. many studies have been carried out on healthcare site suitability using gis and multi-criteria techniques, such as optimum site selection for a hospital in tehran using a geographical information system [ ] and selection of a suitable site for a regional hospital in taiwan using the analytic hierarchy process, sensitivity analysis, and the delphi method [ ]. however, no previous study attempted to analyse healthcare site suitability by considering accessibility first. poor accessibility calls for proposing new healthcare facilities, so it is essential to analyse accessibility first when proposing and constructing new healthcare, which is covered in this study. unlike many studies that highlighted only healthcare accessibility [ ] [ ] [ ] or only site suitability [ , ], the present study considered these inter-connected factors together by developing a hybrid model. from that point of view, this study is unique and no similar study has been carried out before. the present study would be useful to health planners in the study area, as well as in other regions with similar geographical settings, for defining inaccessible areas and locating suitable sites for better health service.

while many previous researchers have relied on non-spatial data and statistical inference to analyse healthcare accessibility in a geographical location, the present study applied a gis-based hybrid decision-making approach for assessing the spatial accessibility of healthcare facilities and site suitability analysis in murshidabad district of west bengal, india.
this study offered a hybrid decision-making approach for proposing suitable sites for better healthcare service. to provide better service and reach deprived groups of people, a plan was prepared that calls for logical decision making to resolve existing problems. the result reveals that spatial discrepancy exists in access to healthcare facilities. existing healthcare facilities cluster primarily in the central and northern portions. the distance, density and proximity analysis of these facilities revealed the spatially inaccessible areas. to achieve equal access, allocation of new healthcare is essential to reduce the spatial disparity. thus, an equal-weight weighted linear combination was performed using land use, proximity to road and rail, and distance to residential areas to determine suitable sites for allocating new healthcare facilities. the decision support system with gis integration offered suitable sites in inaccessible areas.

poor accessibility and poor health services in an area increase health and social disparities. hence, there is an urgent need to increase accessibility. the present study addressed this issue and developed a hybrid approach to integrate multiple factors to propose sites for new healthcare facilities and remove spatial disparity in the study area.

fig. the ground truth areas of proposed healthcare sites after suitability analysis. a site- : considered the best site and in urgent need of new healthcare; this site is . km from rajgram rail station, surrounded by dense rural population, with no healthcare facilities nearby. b site- : very close to nh and located in a sparsely populated area. c site- : far away from nh , nh and railway junction. d site- : . km from bhagwangola railway station and located in a densely populated area. e site- : very close to ganga river and sagar para road, and located near many dense settlement areas. f site- : located at the south-west corner of the district and very close to sh . g site- : very close to district medical college and hospital. h site- : just beside sh , goghata bus stoppage and about . km from ramel health destination.

the advantages of developing and using a hybrid model are ( ) it integrates different models and approaches, ( ) it increases the accuracy of the result and reduces the drawbacks of a single model or method, ( ) it interlinks two or more aspects of a study; for instance, in the present study the hybrid model determined two aspects, i.e. accessibility to healthcare and site suitability of healthcare, and ( ) it helps simplify the complex relationships among criteria or perspectives. however, a hybrid model also has some disadvantages: ( ) it is more complex to apply, ( ) preparing the model takes more time than a single model, and ( ) it sometimes suffers from overfitting. therefore, addressing such disadvantages and building on the above-mentioned advantages, the present study proposed and used a hybrid model to show healthcare accessibility and site suitability. this study should serve policymakers and health planners as a good source for health service enhancement and plan implementation.

recently, not only our country but the whole world has been worried about the outbreak of the 'novel coronavirus'. the government urgently needs ample space for 'isolation' services. globally , , confirmed cases are reported (till . . ). india is no exception, where , confirmed cases have been identified. murshidabad is assumed to be highly vulnerable as there is a lack of basic infrastructure.
therefore, it is recommended that the government and local planners start emergency medical services at the proposed sites with primary equipment and service to reach target population groups. if this mission succeeds, modern infrastructural development and service could be proposed afterwards.

accessibility analysis of health care facility using geospatial techniques un chief urges greater efforts to improve health and well-being of indigenous peoples human rights and health. world health organisation president's commission for the study of ethical problems in medicine and biomedicine and behavioral science research president's commission delivering quality health services: a global imperative for universal health coverage. geneva: world health organization. licence: cc by-nc-sa . igo. retrieved on convergence and determinants of health expenditures in oecd countries the determinants of health expenditure: a country-level panel data analysis. geneva: world health organization inequities in access to health services in india: caste, class and region the challenge of building rural health services deprivation, healthcare accessibility and satisfaction: geographical context and scale implications a literature review of the use of gis-based measures of access to health care services measuring geographic access to health care: raster and network-based methods measuring spatial accessibility to primary health care services: utilising dynamic catchment sizes measuring time accessibility and its spatial characteristics in the urban areas of beijing the concept of access: definition and relationship to consumer satisfaction an integrated approach to measuring potential spatial access to health care services accessibility evaluation of land-use and transport strategies: review and research directions measures of spatial accessibility to health care in a gis environment: synthesis and a case study in the chicago region.
environment and planning b: planning and design evaluating the accessibility of healthcare facilities using an integrated catchment area approach is there a role for gis in the 'new nhs'? gis and public health accessibility to general practitioners in rural south australia: a case study using geographic information system technology the impact of geographic accessibility on the intensity and quality of depression treatment accessibility and health service utilization for asthma in retrieved from murshidabad district density estimation for statistics and data analysis weighted linear combination method versus grid based overlay operation method-a study for potential soil erosion susceptibility analysis of malda district (west bengal) in india. the egyptian journal of remote sensing and space science integrating geographical information systems and multiple criteria decision making methods using analytic hierarchy process with gis for dengue risk mapping in kolkata municipal corporation spatial susceptibility analysis of vector-borne diseases in kmc using geospatial technique and mcdm approach. modeling earth systems and environment mapping of mosquito-borne diseases in kolkata municipal corporation using gis and ahp based decision making approach application of multi-attribute decision-making methods in swot analysis of mine waste management (case study: sirjan's golgohar iron mine, iran) using hca and topsis approaches in personal digital assistant menu-icon interface design an algorithmic method to extend topsis for decision making problems with interval data appraisal of infrastructural amenities to analyze spatial backwardness of murshidabad district using wsm and gis-based kernel estimation using gis for determining variations in health access in jeddah city, saudi arabia framework for the study of access to medical care what does 'access to health care' mean? 
literature review of the use of gis-based measures of access to health care services development of a web based gis for health facilities mapping, monitoring and reporting: a case study of the zambian ministry of health hospital site selection using two-stage fuzzy multi-criteria decision making process hospital site selection using fuzzy ahp and its derivatives optimal selection of location for taiwanese hospitals to ensure a competitive advantage by using the analytic hierarchy process and sensitivity analysis

acknowledgements we thank the three anonymous reviewers and the editor in chief for their valuable time, productive comments and suggestions, which improved the overall quality of our manuscript.

funding no fund was received from any sources.

conflict of interest on behalf of all authors, the corresponding author states that there is no conflict of interest.

key: cord- -u amf oh authors: parsons, lisa m.; bouwman, kim m.; azurmendi, hugo; de vries, robert p.; cipollo, john f.; verheije, monique h. title: glycosylation of the viral attachment protein of avian coronavirus is essential for host cell and receptor binding date: - - journal: journal of biological chemistry doi: . /jbc.ra . sha: doc_id: cord_uid: u amf oh

avian coronaviruses, including infectious bronchitis virus (ibv), are important respiratory pathogens of poultry. the heavily glycosylated ibv spike protein is responsible for binding to host tissues. glycosylation sites in the spike protein are highly conserved across viral genotypes, suggesting an important role for this modification in the virus life cycle. here, we analyzed the n-glycosylation of the receptor-binding domain (rbd) of ibv strain m spike protein and assessed the role of this modification in host receptor binding.
ten single asn–to–ala substitutions at the predicted n-glycosylation sites of the m –rbd were evaluated along with two control val–to–ala substitutions. cd analysis revealed that the secondary structure of all variants was retained compared with the unmodified m –rbd construct. six of the glycosylation variants lost binding to chicken trachea tissue and an elisa-presented α , -linked sialic acid oligosaccharide ligand. lc/ms(e) glycomics analysis revealed that glycosylation sites have specific proportions of n-glycan subtypes. overall, the glycosylation patterns of most variant rbds were highly similar to those of the unmodified m –rbd construct. in silico docking experiments with the recently published cryo-em structure of the m ibv spike protein and our glycosylation results revealed a potential ligand receptor site that is ringed by four glycosylation sites that dramatically impact ligand binding. combined with the results of previous array studies, the glycosylation and mutational analyses presented here suggest a unique glycosylation-dependent binding modality for the m spike protein.

avian coronaviruses of poultry cause significant disease with subsequent economic losses in several commercially farmed bird species. avian infectious bronchitis virus (ibv) is a gammacoronavirus that predominantly affects domestic fowl, primarily chickens (gallus gallus). the virus initially infects upper airway epithelium tissues, and depending on the ibv strain, disease outcomes range from mild respiratory disease to kidney failure and death ( ). the viral envelope of ibv contains the highly glycosylated spike (s) protein that is post-translationally cleaved into two domains, s and s . this s glycoprotein is the major adhesion molecule of the virus. it is a class i viral fusion protein, in which the variable s domain is involved in host cell receptor binding, and the more conserved s domain mediates the fusion of the virion with the cellular membrane ( , ).
the role of spike in host cell attachment and the induction of protective immunity has been reviewed ( ). the spike protein monomer is a transmembrane glycoprotein with a molecular mass of kda before glycosylation ( ). a cleavable n-terminal signal peptide ( ) directs the s protein toward the endoplasmic reticulum (er), where it is extensively modified with n-linked glycosylation ( , ). after glycosylation in the er, the monomers oligomerize to form trimers ( - ). the n-terminal amino acids of s were shown to encompass the receptor-binding domain (rbd) of ibv strain m ( ), which interacts with sialyl-α , -substituted glycans present on the host's cell surface ( , ).

ten n-linked glycosylation sites are predicted to exist on the m -rbd ( ), of which most are highly conserved (fig. s ). it is interesting that of the sites are - % conserved. sites asn- and asn- were less conserved at and %. however, each had a nearby alternative site that was also highly conserved. alternative site asn- was conserved % of the time, and one or both asn- and asn- was present in % of the sequences. site asn- was conserved at %. in % of the sequences, either asn- or asn- was present but never together. therefore, all sites, including the alternatives, likely serve important functions.

the n-glycosylation of viral glycoproteins is known to modulate the ability of viruses to infect host cells and to be recognized by the host's immune system ( ). recently, zheng et al. ( ) studied extracted spike proteins and mutant viruses with asn-to-asp (asparagine to aspartate) and asn-to-gln (asparagine to glutamine) mutations at predicted glycosylation sites in the s protein of the beaudette ibv strain ( ). their results indicate that glycosylation at some sites on the beaudette s -rbd was important for viral fusion and infectivity, which may include host recognition.
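as an illustration of how predicted n-glycosylation sites and single asn-to-ala variants can be enumerated, the sketch below scans a protein sequence for the n-x-(s/t) sequon and builds one variant per site. it assumes the commonly used rule that x may be any residue except proline; this simple pattern scan is not a substitute for a trained predictor such as netnglyc, and the toy sequence and function names are hypothetical and unrelated to the rbd sequence used in the study.

```python
import re
from typing import Dict, List

# N followed by any residue except proline, then S or T. The lookahead keeps
# the match one residue wide so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str) -> List[int]:
    """Return 1-based positions of Asn residues in N-X-(S/T) sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def asn_to_ala_variants(seq: str) -> Dict[str, str]:
    """Build one single-site Asn-to-Ala variant per predicted sequon."""
    seq = seq.upper()
    variants = {}
    for pos in find_sequons(seq):
        variants[f"N{pos}A"] = seq[:pos - 1] + "A" + seq[pos:]
    return variants


# Toy sequence for illustration only (not a real spike fragment).
toy = "MNVSANPTQNGTNNTS"
sites = find_sequons(toy)
variants = asn_to_ala_variants(toy)
```

each variant removes exactly one sequon, which mirrors the site-by-site design of single-substitution experiments; whether a predicted site is actually occupied still has to be measured.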
however, the beaudette strain is a cell culture-adapted strain, is nonvirulent in chickens ( ) , and does not bind chicken tissues known to be important for infectivity ( ) , making it difficult to extrapolate these results to clinically relevant ibvs. to characterize and assess the role that glycosylation plays when interacting with host tissues through the rbd of pathogenic ibv strain m , we used a combination of molecular and analytical techniques, including histochemistry, elisa, circular dichroism (cd), ms, and docking analyses as listed in table . systematic deletion of each glycosylation site and histochemical analysis of each variant revealed which of the glycosylation sites affect the binding of ibv s protein to host epithelial tissue. site occupancy analysis by lc/ms e indicated that at least of predicted n-glycosylation sites in the m -rbd domain are glycosylated. analysis of site occupancy and signature n-glycan patterns at each site in combination with single glycosylation site deletions provided insight toward the biological relevance of each of those sites in binding to host tissue receptors. overall, our data confirm that n-glycosylation plays a critical and likely unique role in binding of the ibv spike domain to its host tissue receptors. to analyze the role of glycosylation of m -rbd in receptor binding, missense mutants (asn-to-ala) were generated on a site-by-site basis at each of the predicted n-glycosylation sites. recombinantly produced glycovariant rbd proteins migrated with the same electrophoretic mobility as unmodified m -rbd (fig. ) . the rbd proteins were evaluated by cd spectroscopy to assess similarity to the wt secondary structure. wt m -rbd, all glycosylation-site variants, and two nonglycosylation variants, v a and v a, were analyzed for secondary structure differences at °c. thermal melts were performed on each construct from to °c followed by full scans collected at °c and again at °c after the melt. 
overlays of all the cd spectra can be found in fig. s . visually, all spectra at all temperatures follow the same curve. the n a spectra were generated at higher protein concentrations but aligned well to cd spectra of all other variants when normalized to the percent of maximum signal. likewise, all the proteins had analogous broad melting curves, suggesting the proteins were similarly stable. protein folding was reversible for all proteins, with comparable recovery rates (see cd- °c-aftermelt-normalized in fig. s ). dichroweb ( ) was used to calculate the percent of α-helix, β-strand, turn, and unordered portions of the protein in the initial °c spectra to estimate secondary structure differences between the proteins (fig. ). the percent of α-helix varied, with the extremes being unmodified rbd and n a. n a exhibited . ± . % α-helix character as compared with wt, which has . ± . %. interestingly, n a gave a very strong signal in the histochemical assay (fig. a) and had the most notably different released glycans' signature compared with the other constructs. we conclude that all proteins maintained a very similar structure and therefore suggest that single n-glycosylation sites are by themselves not indispensable for protein folding or stability.

because we established that all variant m -rbd proteins are folded, we investigated their abilities to bind tissue receptors. recombinant proteins were incubated with chicken trachea tissue sections and examined by histochemical analysis. n a, n a, n a, n a, v a, and v a bound ciliated epithelial cells of the chicken trachea with similar staining intensity as the unmodified rbd, with the most intense staining associated with the n a construct (fig. a). in contrast, binding of constructs n a, n a, n a, n a, n a, and n a to trachea tissue was not detectable. removal of sialic acids by treatment of the trachea tissues with arthrobacter ureafaciens neuraminidase (auna) abrogated binding of all constructs as shown in fig.
s . these results demonstrate that glycosylation on the rbd affects binding to sialyl ligands on chicken trachea tissue.

the interaction of the variants with neu ac(α - )gal(β - )glcnac, a previously established ligand for m ( ), was assayed by elisa. n a, n a, n a, and n a variants were able to bind the ligand in a concentration-dependent manner (fig. b) like unmodified rbd. binding affinities of n a, n a, n a, n a, n a, and n a were significantly reduced compared with unmodified rbd and comparable with that of a negative control protein, the s of turkey coronavirus, with specificity for nonsialylated dilacnac glycans ( ). fig. c shows the elisa absorbance at the nmol of ligand concentration for each construct. no significant difference was observed for variants n a, n a, n a, and n a compared with unmodified rbd (shown in dark gray bars in fig. c). all other variants (shown in light gray bars in fig. c) demonstrated significantly lower affinity for the receptor, consistent with histochemistry and ligand titration plot results.

six of the single glycosylation site variants lost the ability to bind ligand. to investigate whether global changes in glycosylation may have affected binding, we analyzed released glycans from each protein. matrix-assisted laser desorption/ionization-time of flight (maldi-tof) mass spectrometry (ms) analysis of enzymatically released and permethylated glycans allows for semi-quantitative analysis of glycan compositions. the method is particularly useful for samples containing sialylated glycans because they are stabilized by permethylation. the percent abundances of glycans identified in each sample are shown in fig. . the majority of the asn-to-ala variants, as well as the v a and v a control variants, had similar maldi-tof-ms permethylation profiles (fig. ). over glycan compositions were identified, ranging from high-mannose glycans to large complex ones.
nearly half of the glycans contained at least one and up to three sialic acid molecules in all samples. the most intense glycoforms clustered in five groups with increasing amounts of complexity as reflected by the number of n-acetyl glucosamines (hexnacs). these include high-mannose, complex, and hybrid forms as follows: i, hex - hexnac (high mannose); ii, neuac - hex - dhex - hexnac (complex and hybrid); iii, neuac - hex dhex hexnac (complex); iv, neuac - hex dhex hexnac (complex); and v, neuac hex dhex hexnac (complex). high-mannose glycans were less abundant in unmodified m than in variant rbds. the n a, n a, and n a variants contained diminished amounts of the group v high-mass complex glycans. the n a variant was the most atypical with less defined clustering in the common clustering regions of the spectrum and higher abundances in spectral regions where compositions had less hex and more hexnac overall. for instance, cluster iv was shifted from glycans with hexoses (neuac - hex dhex hexnac ) to glycoforms with - hexoses (neuac - hex - dhex hexnac ). more abundance was observed in regions containing hexnac residues (neuac - hex - dhex hexnac ). to better understand the difference between n a and the other constructs, we calculated the monosaccharide percent mass and average mass for each construct. the average mass percent for glycans across all released glycan pools was hex ( . %), hexnac ( . %), dhex ( . %), and neuac ( . %). the n a construct had the lowest amount of hex ( . %) and the highest amounts of hexnac ( . %) and neuac ( . %). the former two were s.d. or greater from the mean (see table s ). this indicates that the n a construct likely had shorter, more branched, and more highly-charged glycans on average than the other constructs. two other variants had values more than s.d. from the mean. n a (normal binding) was most abundant in hex ( . %) and least abundant in hexnac ( . %) and dhex ( . %), probably due to its higher high-mannose content. 
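the monosaccharide percent mass of a released glycan pool can be illustrated with a short calculation. the sketch below is one plausible way to compute it and is not taken from the study's methods: each composition contributes count x residue mass for each monosaccharide, weighted by its relative abundance, and the totals are expressed as percentages. the residue masses are standard monoisotopic values for underivatized residues (an assumption, since the analysed glycans were permethylated), and the example pool is hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of underivatized monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "NeuAc": 291.0954,
}

Composition = Dict[str, int]


def percent_mass(pool: List[Tuple[Composition, float]]) -> Dict[str, float]:
    """Abundance-weighted percent mass of each monosaccharide in a pool.

    pool holds (composition, relative abundance) pairs; abundances only
    need to be on a common scale, they do not have to sum to 100.
    """
    totals = {name: 0.0 for name in RESIDUE_MASS}
    for composition, abundance in pool:
        for name, count in composition.items():
            totals[name] += abundance * count * RESIDUE_MASS[name]
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("pool contains no monosaccharide mass")
    return {name: 100.0 * mass / grand_total for name, mass in totals.items()}


# Hypothetical pool: one high-mannose and one sialylated complex glycan.
pool = [
    ({"Hex": 5, "HexNAc": 2}, 40.0),
    ({"Hex": 5, "HexNAc": 4, "dHex": 1, "NeuAc": 2}, 60.0),
]
profile = percent_mass(pool)
```

a pool richer in high-mannose glycans shifts the profile toward hex, while a pool richer in sialylated complex glycans shifts it toward hexnac and neuac, which is the kind of difference such a summary is meant to expose.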
n a (normal binding) had the lowest amount of neuac ( . %). this is perhaps a reflection of the missing sugars in this variant because site asn- in other variants was populated with many sialylated glycoforms based on site-specific analysis (table s ).

to assess the differences in glycosylation on a site-to-site basis, glycopeptide lc/ms analysis was carried out on unmodified m and two single glycosylation site variants, n a and n a, that represented a nonbinder and a binder of trachea tissue, respectively. m -rbd had predicted glycosylation sites, whereas the variant rbds had nine each. n a was also of specific interest due to the unique glycosylation pattern observed in its free glycan profile. as cleavage with trypsin alone resulted in glycopeptides with more than one glycosylation site, we also analyzed glycopeptides after an additional treatment with chymotrypsin, which resulted in one glycosite per peptide, the identification of more glycopeptides, and decreased ambiguity concerning glycosylation site assignment.

although a protein may contain the sequence (nx(s/t)), where n-glycosylation is known to occur, it may not actually be glycosylated, or it may be glycosylated only part of the time. potential glycosylation sites, their predicted glycosylation state, and their measured site occupancy are shown in table . of the glycosites, all but asn- were predicted to be glycosylated (occupied) based on netnglyc analysis (http://www.cbs.dtu.dk/services/netnglyc- . /). percent occupancy was analyzed by lc/ms; however, a poor signal was obtained for the asn- site in m and n a, and therefore, occupancies were not calculated. all other sites were estimated to be occupied at % or greater in m and n a. the n a variant exhibited site occupancy at all expected sites, including asn- , although signal intensity at that site was low. two sites had much lower occupancy in n a as compared with the other samples. site asn- dropped to % occupancy and site asn- to % occupancy compared with nearly complete occupancy in the n a and m proteins. overall site occupancy was high for all sites. the difficulty in detecting some of the peptides, particularly asn- , may be due to hydrophobicity. ionization is partially driven by hydrophobicity, and asn- only had % hydrophobic character after the two digestions, which may, in part, explain its low detectability. by comparison, glycopeptides containing asn- , asn- , and asn- were short and between and % hydrophobicity, whereas glycopeptides containing other sites had predicted hydrophobicity ranging from to % and tended to produce higher intensity spectra.

figure . tissue-binding assay and elisas. histochemical assays of recombinant unmodified m -rbd and single asn-to-ala and val-to-ala glycosylation variants to trachea tissue (a) and elisa-presented neu acα - galβ - glcnac (b and c). b, concentration dependence of binding. c, absorbance for each protein at the -nmol concentration. two-way anova showed significantly less binding by variant n a, n a, n a, n a, n a, and n a rbd proteins compared with unmodified rbd (compare light gray bars (variant) to unmodified (black bar)). no significant (n.s.) difference was observed for variants with dark gray bars. data points are averaged from three separate assays. ****, p < . .

glycoform relative abundances at each site are listed in table s . fig. shows the location of each glycosylation site on the rbd of m . overall compositions at each site were similar in charge and size across the three constructs. a representative glycan is shown at each site based on peak intensity. the n a construct had glycoforms like those identified by maldi-tof ms with more hexnac and fewer hex compared with m and n a. fewer overall glycan compositions were detected on glycopeptides by lc/ms compared with the free glycans observed by maldi-tof ms ( versus compositions).
this can be expected because the technology of instrumentation used and the physiochemical characteristics of permethylated glycans and glycopeptides differ significantly. the forms detected overlapped between the two analyses. during our investigation, the first structure of the m spike protein was solved using electron microscopy (em) ( ) . mapping the glycosylation sites onto the structure did not lead to a clear understanding of how the mutations affect binding. although em structural resolution is limited, and the precise coordinates for the attached glycans are not known, an attempt was made to dock a series of potentially sialylated ligands to a glycan-stripped structure of the rbd and a structure that was populated with glycans based on our data. the glycan chosen for each site on the rbd was based on the predominant glycans identified at each site by lc/ms (see fig. ). seventeen oligosaccharide ligands were chosen based on a previous glycan array study of m ( ) and elisa data (this work). both strong and weak binders were selected (fig. ) . each ligand was docked times against both the sugarstripped and in silico glycosylated m -rbd coordinates. there was no statistically significant difference between the docked binding energies of ligands that did and did not bind on the array. all oligosaccharide ligands, except for , , , , , and , docked seven or more times to one or more of the four sites on the m sugar-stripped structure with no clear pattern differentiating between them (fig. ). in the sugar-stripped structure, all binding occurred at sites a and b. site a is under the galectin fold near site asn- , and site b encompasses asn- and asn- . all three glycosylation sites are required for binding to trachea tissue. the docking pattern changed dramatically when glycans were modeled onto the structure. the most dramatic change was seen at site d where eight ligands bound seven or more times, whereas interactions at all other sites decreased. 
there were no binders at site a, only two at site c ( and ) and three at site b ( , , and ). all of the ligand oligosaccharides that docked at site d were sialylated, consistent with ligands identified by array and elisa. no control ligand ( and uncharged; and kdn-charged) bound at site d. the interaction at site d involved both sugar-protein and sugar-sugar contacts, and in some docking runs, the interaction was completely sugar-sugar. site d is in the center of a circle of glycosylation sites that showed altered binding profiles when mutated; n a, n a, and n a lost the ability to bind, whereas n a gave a very strong signal in the histochemical assay. of note, no ligands docked in the site at the top of the galectin fold where many structural homologs of m are thought to bind sugars, such as the bovine coronavirus rbd ( ). for comparison, we docked neu ac(α - )gal(β - )glcnac (β-ome) against the crystal structure of the bovine rbd. twenty-five of times the glycan docked in the proposed binding site at the top of the galectin fold in the negatively charged area of the bovine rbd control near asn- (fig. b). previously, we established that the ibv m s protein binds sialic acid-substituted glycoconjugate ligands in chicken trachea and lung tissue ( ). intriguingly, the m rbd is highly glycosylated with potential glycosylation sites, and glycosylation appears to be necessary for binding to host tissues because treating the protein with a neuraminidase diminishes binding ( ). this study extends our investigation toward determining the role of glycosylation in the function of the rbd, which encompasses the n-terminal region of the native protein. each of the potential glycosylation sites was individually ablated, and each construct was examined for its ability to bind tissue and an elisa-presented ligand.
in addition, the global glycosylation profile of every construct was surveyed, and glycosylation of three representative constructs was examined on a site-specific basis. six of the glycosylation sites in the rbd domain of ibv m were essential for binding to chicken trachea tissue and an elisa-presented sialylated oligosaccharide ligand. cd analysis demonstrated that both secondary structure and stability were similar across all the rbd constructs indicating the proper fold was likely retained for all. globally, percent abundances of sialylated glycans differed across mutants, but the differences were not associated with loss of binding. for example, and % of the glycans in binding mutants n a and n a, respectively, and and % of the glycans in the nonbinders n a and n a, respectively, were sialylated (summed from fig. ). by comparison, % of the glycans in the unmodified rbd construct were sialylated. on a site-specific basis, some glycosylation sites had more sialylation than others (table s ). on average, each of glycosites asn- , asn- , asn- , and asn- were sialylated at least % of the time. sites asn- and asn- were in the less-ordered region of the protein away from the galectin fold where binding is associated in the docking study. site asn- is at the bottom of the galectin fold and is required for ligand binding. site asn- is at the top of the galectin fold and is also required for binding. although we cannot conclude that sialylation is required at asn- and asn- , it is clear that glycosylation at these sites serves a role in ligand binding.
figure . site-specific glycosylation of m , n a, and n a. the s -n-terminal receptor binding domain residues - from pdb entry cv is represented as gray ribbons. the asparagines of glycosylation sites that could still bind trachea tissue after mutation to alanine are in cyan, and those that could not are in dark red. glcnac residues from the structure are dark blue balls and sticks. the most predominant glycan for each site across all three constructs is shown to the right. glycoforms shown on the right are based on our data, and inferred structural detail is based on accepted knowledge of the cell type used in protein production. monosaccharides are represented as follows: mannose (green circles); galactose (yellow circles); glcnac (blue squares); fucose (red triangles); and sialic acid (purple diamonds). numbering of the sites is based on the mature sequence. the figure was made with ccp mg ( ) and gimp.
( ) and referenced in the figure as array score . white columns were against structure without sugars, and gray columns were lc/ms-identified where the sugars were modeled. bottom, rbd-binding domain of m from pdb structure cv . glycosylation sites are shown as cyan balls. sites where two or more oligosaccharides docked seven or more times are indicated as colored space-filled amino acids. colors and labels match the table above. b is a turned ° toward the user. structure representations were made in ccp -mg ( ). sugar symbols were rendered with drawglycan-snfg (www.virtualglycome.org/drawglycan/) ( ).
the publication of the cryo-em structure of m ( ), the first structure of a spike protein from a gammacoronavirus, made it possible to visualize the distribution of the glycosylation sites in the tertiary structure of the protein. the study verified the site occupancy we observed on m -rbd because of of the glycosylation sites in the em structure were occupied. site asn- , not occupied in the em structure, is on a β-strand in the em structure, and it forms close contacts with the s c-terminal domain in the native protein. the c-terminal domain was not part of our construct. therefore, asn- in the recombinant constructs was likely in an environment much different from that found in the full-length protein. many human galectins, and also the bovine β-coronavirus spike protein ( ), bind sugars at what is the top of the β-sandwich near site asn- in the rbd constructs (see fig. ).
the bovine rbd site asn- closely aligns with site asn- of m (see fig. ). in the bovine protein, this demarks the region of proposed ligand binding. loss of asn- in the m rbd abrogates binding to trachea tissue. although ablation of asn- diminishes ligand binding, our docking study gave no evidence that this is the sialyl ligand-binding site in m . evaluation of the charge distribution in the proposed binding sites indicates that the bovine site is negatively charged, whereas the negative charge in the same region in m is sparse (fig. ). this difference in charge near asn- may explain the lack of ligand docking in this region (gray β-strands in fig. b) during docking simulations. the precise ligand-binding region of proteins with a galectin fold varies. rotavirus protein vp , for example, binds sialic acid in a groove between the β-sheets of the sandwich ( ). the clustering of five of six required n-glycosylation sites suggests the location of the ligand-binding site may be on the right of the galectin fold as shown in fig. . our docking experiments studying possible oligosaccharide ligands to m were not conclusive in terms of binding energies but did identify four potential saccharide-binding regions (fig. ). docking also demonstrated that glycosylation affects binding in silico because one potential site (site a; see fig. ) lost favor, whereas another one, site d, dramatically gained favor when the protein was glycosylated. site d is in the center of three glycosylated asparagines required for binding (asn- , asn- , and asn- ), and one whose loss results in a very strong histochemical signal and has a protein-wide effect on glycosylation with increased sialylation (asn- ). in addition, the site d region is negatively charged (see fig. a) like the proposed sialyl ligand-binding site on the bovine protein (fig. b) ( ). all the ligands that interacted with site d were sialylated and included the glycan that bound in our elisa studies.
interestingly, carbohydrate-carbohydrate contacts were detected in the rbd-ligand interactions at site d. this is an intriguing result because carbohydrate-carbohydrate interactions, although not common, have been reported between nonfucosylated antibodies and their receptor, in cell-cell adhesion interactions, between tumor antigens, and between bacterial receptors and mucin ( ) ( ) ( ) ( ) ( ). a literature search did not uncover any reported carbohydrate-carbohydrate interactions between virus and host. although our docking study must be evaluated in the context of the higher root mean square deviations typical of em structures, and the inexactness of modeled oligosaccharides, the results suggest that a combination of carbohydrate-carbohydrate and carbohydrate-protein interactions should be considered in the binding mechanism. in conclusion, we have shown that glycosylation of six sites on the m ibv rbd is necessary for the interaction of m with both trachea tissue and neu ac(α - )gal(β - )glcnac ligand in elisa. based on occupancy data, at least nine sites were glycosylated in the recombinant m -rbd. deletion of individual glycosylation sites had little effect on secondary structure, but it did have some effect on overall glycosylation profiles of some variants, especially n a. some differences can be expected because one site, with specific glycans, is lost from each variant, thus mildly altering overall profiles. in silico docking suggests that glycosylation may guide ligand binding. especially intriguing is site d, where glycosylation is required for in silico docking at that site. the interaction of m ibv with sialyl ligand may prove to be a unique interaction involving both carbohydrates and protein. further investigation is warranted. the tissues used for this study were obtained from the tissue archive of the veterinary pathologic diagnostic center (department of pathobiology, faculty of veterinary medicine, utrecht university, the netherlands).
this archive is composed of paraffin blocks with tissues maintained for diagnostic purposes; no permission from the committee on the ethics of animal experiment is required.
a and pink boxes on b. y , e , w , and h in b are involved in binding to sialic acid. the large asterisk in a indicates possible binding site based on structural comparison between the two proteins. images were made with ccp -mg ( ). bovine coordinates are from pdb code h .
the pcd vector containing ibv m -rbd in-frame with a c-terminal gcn trimerization motif and strep-tag has been described previously ( ). site-directed mutagenesis using the q technology (new england biolabs) was performed to mutate the asparagine-encoding residues of the n-linked glycosylation sequence motif nx(s/t) into alanine or valine using the primers in table . sequences of the resulting rbds were confirmed by sanger sequencing (macrogen, the netherlands). hek t (atcc crl- ) cells were transfected with pcd plasmids using polyethyleneimine at a : ratio. the recombinant proteins were purified using strep-tactin-sepharose beads, as described previously ( ), and their production was confirmed by western blotting using strep-tactin hrp antibody (iba, germany). recombinant m and its variants were prepared for cd spectroscopy by buffer exchange and concentration with four centrifugation cycles through -kda mwco amicon ultra . -ml centrifugal filters (ufc ) into mm sodium phosphate, ph . . final concentrations were measured with a thermo fisher scientific nanodrop spectrophotometer. cd spectra were collected on a jasco j- spectropolarimeter with a peltier thermostated fluorescence temperature controller module. samples were diluted to . mg/ml and four scans accumulated from to nm with a scanning speed of nm/min, digital integrated time -s, bandwidth nm, and standard sensitivity at °c. a thermal melt was done from to °c with a ramp rate of °c/min. measurements were taken every ° at , , , , , , , and nm.
a full cd scan was collected at °c. the temperature was then lowered to °c. after allowing the protein to refold for min at °c, a third cd scan was taken at °c to measure recovery. a savitzky-golay filter was used to smooth cd data at different temperatures for visual comparison (fig. s ). secondary structure calculations for the cd data collected at °c before the thermal melt were processed by dichroweb ( ) using the cdsstr ( ), selcon ( ), and continll ( ) algorithms with protein reference set . results from the three algorithms were averaged and plotted in fig. . histochemistry was performed as described previously ( ). briefly, chicken trachea tissues from a -week-old broiler chicken were sectioned at m before incubation with rbd proteins at g/ml. desialylated tissues were prepared by pre-treatment with milliunits of neuraminidase (sialidase) from a. ureafaciens (auna, sigma, germany) in mm potassium acetate, . mg/ml triton x- , ph . , at °c overnight before protein application. chicken trachea tissues were from a -week-old broiler chicken (g. gallus) obtained from the tissue archive of the veterinary pathologic diagnostic center (department of pathobiology, faculty of veterinary medicine, utrecht university, the netherlands). sialic acids (neu acα - galβ - glcnac-paa, -sialc-paa, glyconz, russia) were coated ( g/well) in a -well maxisorp plate (nunc, sigma) at °c overnight, followed by blocking with % bsa (sigma) in pbs- , % tween. rbd proteins ( g/ml) were preincubated with strep-tactin-hrpo ( : ) for min on ice, before applying them to the plates for h at room temperature. , ′, , ′-tetramethylbenzidine was used as a peroxidase substrate to visualize binding, after which the reaction was terminated using n h so . absorbances (a nm ) were measured in a fluostar omega (bmg labtech) microplate reader, and mars data analysis software was used for analysis. protein samples of each recombinant protein were measured at each concentration in triplicate.
statistical analysis was performed by comparing each protein to the unmodified rbd using two-way anova with dunnett's multiple comparisons test where ␣ was set to . . the workflow is shown in fig. s . aliquots between and g of m , n a, and n a and g of the remaining proteins were digested with trypsin as per an and cipollo ( ) . approximately - -g aliquots of protease-digested proteins were processed for deglycosylated glycopeptide and permethylated glycan analyses. samples were resuspended in mm ammonium bicarbonate, ph . . glycans were released by digestion with units/l pngase f (glycerol-free from new england biolabs) for h at °c. the samples were adjusted to ph . with - l of mm hcl. to maximize glycan release, samples were further digested with . milliunits/l pngase a overnight at °c. free glycans and deglycosylated peptides were separated using c spe cartridges (thermo fisher scientific). intact glycopeptide analyses were performed using - g of hilic-enriched glycopeptides as per an and cipollo ( ) . following data collection on the trypsinized glycopeptides, the remainder of the m , n a, and n a sam- lc/ms e data were collected on trypsinized peptides deglycosylated with pngase f as described under n-glycan release. asparagines that are deglycosylated by pngase f are converted to aspartate with a mass gain of . da due to the replacement of -nh with -oh. the percent occupancy for each site is calculated by comparing the intensity of peptides with asn to those with asp. however, spontaneous deamidation of unmodified asn to asp can also occur. o-water, which results in mass shift of . da, was used to ensure calculated percent occupancy was not skewed due to spontaneous deamidation. this experiment allows for examination of both spontaneous and enzymatically catalyzed deamidation, and therefore, accurate estimations of percent occupancy of glycosites can be determined. 
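The occupancy estimate described above can be sketched as a small function. This is a hedged illustration rather than the authors' software: the function and parameter names are hypothetical, the factor of 100 is assumed from the word "percent", and treating spontaneously deamidated peptide as never glycosylated is an assumption about how the isotope-labeled water control is applied.

```python
def percent_occupancy(dg_intensity, ng_intensity, spontaneous_intensity=0.0):
    """Percent site occupancy from summed peptide intensities.

    dg_intensity: peptide carrying Asp produced by PNGase release
        (deglycosylated, so the site was occupied).
    ng_intensity: peptide still carrying Asn (never glycosylated).
    spontaneous_intensity: peptide carrying Asp from spontaneous
        deamidation; assumed here to count as never glycosylated.
    """
    unoccupied = ng_intensity + spontaneous_intensity
    total = dg_intensity + unoccupied
    if total <= 0:
        raise ValueError("no peptide signal for this site")
    return 100.0 * dg_intensity / total
```

For example, a site with a deglycosylated intensity of 75 and a nonglycosylated intensity of 25 (arbitrary units) gives 75% occupancy. A site with no usable signal raises an error instead of returning a misleading value, which mirrors the decision in the text not to report occupancy where the signal was poor.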
percent occupancy was calculated by comparing the intensities of the deglycosylated (dg) and nonglycosylated (ng) peptides using the equation: percent occupancy = dg/(dg + ng) × 100. pngase-released n-glycans were applied to c spe and eluted with . % formic acid leaving the deglycosylated peptides bound to the c column. the glycan eluate fractions were combined, and butanol was added to a final concentration of %. the samples were then loaded onto -mg porous graphite columns prepared first by sequential washes of ml of % acetonitrile (acn), ml of % acn in water, ml of % acn in water, and ml of water. all solutions contained . % trifluoroacetic acid (tfa). the loaded columns were washed three times with ml of . % tfa in water, then eluted with % acn, . % tfa, water, followed by % acn, . % tfa, and water. the eluents were pooled and dried in glass vials by rotary evaporation. permethylation was done following the method of ciucanu and costello ( ) and ciucanu and kerek ( ). maldi-tof analysis of permethylated n-glycans was performed on a bruker autoflex tm speed mass spectrometer in positive polarity reflectron mode. , -dihydroxybenzoic acid was used as a matrix, and malto-oligosaccharides were used as an external calibrant. data were processed using flexanalysis tm . each sample was spotted three times, and scans were collected in positive reflectron mode. peaks were picked and assigned, and intensities were averaged across each set of spots using in-house software. assignments were based on glycans known to be present in hek t cells. each peptide or glycopeptide sample was analyzed three times. a c column (beh nanocolumn m inner diameter × mm, . -m particle, waters corp.) was used for nanolc/ms e analyses. a nanoacquity uplc system (waters corp.) was used for automatic sample loading and flow control. load buffer was % acn, % water. peptides were eluted via a -min gradient from to % acn with a flow of . l/min. all chromatography solutions included . % formic acid.
the eluent flowed to an uncoated -m inner diameter picotip emitter (new objective inc., woburn, ma). the mass spectrometer was a synapt g hdms system (waters corp.). applied source voltage was v. data were collected in positive polarity mode using data-independent ms e acquisition, which consists of a starting -v scan followed by a scan ramping from to v in . s. to calibrate internally, every s fmol/l glu-fibrinopeptide b with pmol/l leucine enkephalin in % acetonitrile, . % formic acid, . % water was injected through the lockmass channel at a flow rate of nl/min. initial calibration of the mass spectrometer was performed in ms mode using glu-fibrinopeptide b and tuned for a minimum resolution of , full-width at half-maximum. nanolc/ms e data were processed using biopharmalynx . (waters corp.) and glymps (in-house software) ( , ) to identify specific glycans on each peptide. the search settings included trypsin digest with up to one missed cleavage, fixed cysteine carbamidomethylation, variable methionine oxidation, and variable n-glycan modifications based on a building block glycan library. assignment inclusion criteria were as follows: 1) the presence of a core fragment (peptide, peptide + hexnac, peptide + hexnac , peptide + dhex hexnac , and peptide + hex hexnac ); 2) the presence of three or more peptide fragments; 3) the presence of three or more assigned glycopeptide fragments; 4) assignment is made in at least of injections; and 5) the existence of the glycan in glyconnect (https://glyconnect.expasy.org). residues - of the m spike em structure were extracted from the published structure (pdb code cv ) ( ). this corresponds to the m -rbd used in this paper. glycamweb's glycoprotein-builder program ( ) was used to add the major oligosaccharide found at each glycosylation site onto the protein in silico. all glycosites in the m em structure were occupied except asn- ; however, asn- was occupied in our data and was populated accordingly.
all glycosites were glycosylated in the new pdb file based on best evidence from our ms data. the coordinates of m -rbd without glycans, m -rbd with modeled glycans, and bovine rbd (pdb code h ) were used in docking experiments. a virtual library of oligosaccharides representing a variety of binding epitopes was created based on the cfg array version . (see fig. for a list). raw models of the oligosaccharide ligands were created with the amber tool tleap (www.ambermd.org) utilizing the glycam force field ( ), then energy minimized using yasara ( ) . dock screening of the library was performed with the yasara implementation of autodock vina ( ) with default parameters. a molecular dynamics simulation with explicit water (tp ) but with fixed coordinates for the backbone atoms was run on the glycosylated m rbd model to allow the amino acid side chains to accommodate the added glycans and to find low energy conformations. two models were extracted from the glycosylated md rbd run at and ns, which were used for dock screening with the virtual library. each oligosaccharide ligand was docked against the structures times. docking results shown in fig. are for the -ns model. results were similar in the -ns models. 
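The way repeated docking runs are summarized per site in the results (ligands counted as binders at a site when they docked there seven or more times) can be sketched as a simple tally. This is an illustrative helper with hypothetical names and input format, not part of the YASARA or AutoDock Vina tooling used by the authors.

```python
from collections import Counter


def frequent_sites(run_sites, min_hits=7):
    """Sites at which one ligand docked at least `min_hits` times.

    run_sites: one site label per docking run (for example "A" to "D"),
        or None when the pose fell outside every defined site.
    Returns a dict mapping site label to the number of runs at that site.
    """
    counts = Counter(site for site in run_sites if site is not None)
    return {site: hits for site, hits in counts.items() if hits >= min_hits}
```

Applying the function to each ligand against the sugar-stripped and the glycosylated model gives two tables of site counts that can be compared directly, which is the comparison the docking figure summarizes.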
the long view: years of infectious bronchitis research the coronavirus spike protein is a class i virus fusion protein: structural and functional characterization of the fusion core complex coronaviridae the avian coronavirus spike protein cloning and sequencing of the gene encoding the spike protein of the coronavirus ibv coronavirus ibv: structural characterization of the spike protein coronavirus ibv glycopolypeptides: size of their polypeptide moieties and nature of their oligosaccharides assembly of coronavirus spike protein into trimers and its role in epitope expression quaternary structure of coronavirus spikes in complex with carcinoembryonic antigen-related cell adhesion molecule cellular receptors mapping of the receptor-binding domain and amino acids critical for attachment in the spike protein of avian coronavirus infectious bronchitis virus binding of avian coronavirus spike proteins to host factors reflects virus tropism and pathogenicity sialic acid is a receptor determinant for infection of cells by avian infectious bronchitis virus glycan-protein interactions in viral pathogenesis identification of n-linked glycosylation sites in the spike protein and their functional impact on the replication and infectivity of coronavirus infectious bronchitis virus in cell culture the pathogenesis of virulent and avirulent avian infectious bronchitis virus protein secondary structure analyses from circular dichroism spectroscopy: methods and reference databases novel receptor specificity of avian gammacoronaviruses that cause enteritis cryo-em structure of infectious bronchitis coronavirus spike protein reveals structural and functional evolution of coronavirus spike proteins crystal structure of bovine coronavirus spike protein lectin domain the rhesus rotavirus vp sialic acid binding domain has a galectin fold with a novel carbohydrate binding site unique carbohydrate-carbohydrate interactions are required for high affinity binding between fc␥riii and 
antibodies lacking core fucose model system for cell adhesion mediated by weak carbohydrate-carbohydrate interactions carbohydrate-carbohydrate interaction as a major force initiating cell-cell recognition tn and stn are members of a family of carbohydrate tumor antigens that possess carbohydrate-carbohydrate interactions are lewis b and h type on helicobacter pylori involved in binding of bacteria to muc mucin? adv variable selection method improves the prediction of protein secondary structure from circular dichroism spectra a self-consistent method for the analysis of protein secondary structure from circular dichroism an unbiased approach for analysis of protein glycosylation and application to influenza vaccine hemagglutinin elimination of oxidative degradation during the per-o-methylation of carbohydrates asimpleandrapidmethodforthepermethylation of carbohydrates glycosylation analysis of engineered h n influenza a virus hemagglutinins with sequentially added historically relevant glycosylation sites glycosylation characterization of an influenza h n hemagglutinin series with engineered glycosylation patterns: implications for avian coronavirus glycosylation - structure-function relationships glycam : a generalizable biomolecular force field. carbohydrates autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading presenting your structures: the ccp mg molecular-graphics software drawglycan-snfg: a robust tool to render glycans and glycopeptides with fragmentation information key: cord- -q yqnlyl authors: armijos-jaramillo, vinicio; yeager, justin; muslin, claire; perez-castillo, yunierkis title: sars-cov- , an evolutionary perspective of interaction with human ace reveals undiscovered amino acids necessary for complex stability date: - - journal: biorxiv doi: . / . . . 
sha: doc_id: cord_uid: q yqnlyl the emergence of sars-cov- has resulted in more than , infections and nearly , deaths globally so far. this novel virus is thought to have originated from an animal reservoir, and acquired the ability to infect human cells using the sars-cov cell receptor hace . in the wake of a global pandemic it is essential to improve our understanding of the evolutionary dynamics surrounding the origin and spread of a novel infectious disease. one way theory predicts selection pressures should shape viral evolution is to enhance binding with host cells. we first assessed evolutionary dynamics in select betacoronavirus spike protein genes to predict where these genomic regions are under directional or purifying selection between divergent viral lineages at various scales of relatedness. with this analysis, we determine a region inside the receptor-binding domain with putative sites under positive selection interspersed among highly conserved sites, which are implicated in structural stability of the viral spike protein and its union with human receptor hace . next, to gain further insights into factors associated with coronaviruses recognition of the human host receptor, we performed modeling studies of five different coronaviruses and their potential binding to hace . modeling results indicate that interfering with the salt bridges at hot spot could be an effective strategy for inhibiting binding, and hence for the prevention of coronavirus infections. we also propose that a glycine residue at the receptor binding domain of the spike glycoprotein can have a critical role in permitting bat variants of the coronaviruses to infect human cells. 
the recent emergence of the novel sars coronavirus (sars-cov- ) marked the third introduction of a highly pathogenic coronavirus into the human population in the twenty-first century, following the severe acute respiratory syndrome coronavirus (sars-cov) and the middle east respiratory syndrome coronavirus (mers-cov). the first, sars-cov emerged in november in the guangdong province of china and spread globally during - , infecting more than people and causing deaths (drosten et al., ; who, ) . mers-cov was the second emergence and was first detected in saudi arabia in and resulted in nearly human infections and deaths in countries (fehr et al., ; zaki et al., ) . in december , sars-cov- , a previously unknown coronavirus capable of infecting humans was discovered in the chinese city of wuhan, in the hubei province zhu et al., ) . sars-cov- is associated with an ongoing pandemic of atypical pneumonia, now termed coronavirus disease (covid- ) that has affected over , people with fatalities as of march , (who, . both sars-cov and mers-cov are thought to have originated in colonies of bats, eventually transmitted to humans, putatively facilitated by intermediate hosts such as palm civets and dromedary camels, respectively (cui et al., ) . the genome of sars-cov- shares about % nucleotide identity with that of sars-cov and is % identical to the bat coronavirus batcov ratg genome, reinforcing the probable bat origin of the virus . however, better assessing the evolutionary dynamics of sars-cov- is an active research priority worldwide. sars-cov, mers-cov and sars-cov- belong to the genus betacoronavirus within the subfamily coronavirinae of the family coronaviridae. members of this family are enveloped viruses containing a single positive-strand rna genome of - kb in length, the largest known rna virus genome. 
the coronavirus spherical virion consists of four structural proteins: the spike glycoprotein (s-protein), the envelope protein, the membrane protein and the nucleocapsid. the transmembrane trimeric s-protein plays a critical role in virus entry into host cells (gallagher & buchmeier, ; tortorici & veesler, ). it comprises two functional subunits: the s subunit, where the receptor-binding domain (rbd) is found, is responsible for binding host cell surface receptors, and the s subunit mediates subsequent fusion between the viral and cellular membranes (kirchdoerfer et al., ; yuan et al., ). both sars-cov and sars-cov- interact directly with angiotensin-converting enzyme (ace ) to enter host target cells (hoffmann et al., ; li et al., ; walls et al., ; yan et al., ). in the case of sars-cov, ace binding was found to be a critical determinant of the virus host range, and key amino acid residues in the rbd were identified as essential for ace -mediated sars-cov infection and adaptation to humans (li et al., ; li et al., ). understanding the dynamics that permit a virus to shift hosts is of considerable interest, and is an essential preliminary step towards facilitating the development of vaccines and the discovery of specific drug therapies. we employ a multidisciplinary approach to look for evidence of diversifying selection on the s-protein gene, and model the interactions between human ace (hace ) and the rbd of selected coronavirus strains, which ultimately afforded us novel insights into virus and host cell interactions. given the rapid pace of discovery, we aim to add clarity to the evolutionary dynamics of disease strains by more precisely understanding the dynamics at the s-protein and its interaction with hace . the most similar genomes to sars-cov- mn were retrieved using blastp (altschul et al., ) vs the nr database of genbank (table ). genomes were then aligned using mauve (darling et al., ) and the s-protein gene was trimmed.
the extracted genomic sections were aligned using the translation align option of geneious (kearse et al., ) with a mafft plugin (katoh & standley, ). the phylogenetic reconstruction of s-protein genes was performed with phyml (guindon et al., ), using a gtr+i+g model and non-parametric bootstrap replicates. both the alignment and the tree were used as input for paml codeml (yang, ). the presence of sites under positive selection was tested by comparing model m (which allows a proportion of positive, neutral and negative selection sites in the alignment) vs m (which allows a proportion of neutral and negative selection sites in the alignment), and model m (ω follows a beta distribution plus a proportion of sites with ω> ) vs m (ω follows a beta distribution), using the ete toolkit . (huerta-cepas et al., ). the presence of tree nodes under positive selection was obtained with the free branch model and then tested by comparing the branch free model (a different ω for each selected branch) vs m (negative selection for all sites and branches; the null model), and branch free vs branch neutral (ω= for selected branches). the presence of sites with positive selection under specific branches of the tree was tested with the bsa (proportion of sites with positive selection in a specific branch of the tree) vs bsa (proportion of sites with neutral and purifying selection in a specific branch of the tree) models. a likelihood ratio test (lrt) was performed (p≤ . ) to compare the hypotheses contrasted by each pair of models. we used the set of programs available in hyphy (kosakovsky pond et al., ): fast unconstrained bayesian approximation (fubar) to detect overall sites under positive selection, and fixed effects likelihood (fel) to detect specific sites under positive selection in specific branches.
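the likelihood ratio test used to compare nested codeml models can be illustrated with a short sketch. this is not the authors' pipeline: the log-likelihood values below are hypothetical, and the choice of two degrees of freedom for a site-model comparison is an assumption, since the text does not state it. the statistic is twice the difference in log-likelihood between the alternative and the null model, compared against a chi-square distribution.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution.

    Uses closed forms only, so it supports df == 1 or any even df.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports df == 1 or even df only")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (LRT statistic, p-value) for two nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null and an alternative site model.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.4f}")
```

the alternative model is preferred only when the p-value falls below the chosen threshold; otherwise the simpler null model is retained.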
we used the mixed effects model of evolution (meme) to detect episodic positive/diversifying selection and adaptive branch site rel (absrel) to detect branches in the tree under positive selection. the web server datamonkey (weaver et al., ) was used to perform the hyphy analyses. finally, treesaap . (woolley et al., ) was used to detect sites under adaptation (in terms of physicochemical properties). the same alignment and tree described above were used for this analysis. all these analyses were repeated using the s-protein genes of a shorter list of accessions more distantly related to sars-cov- (broad dataset: ay , ay , dq , fj , ky , mg , mg , mn , nc_ ) to test the reproducibility of the predicted branches and sites under positive selection. the crystal structure of the sars-cov s-protein rbd (genbank id nc_ ) in complex with hace was retrieved from the protein data bank (code ajf) (berman et al., ). homology models were constructed using this structure as a template for the rbds of sars-cov- (sars , genbank id mn ), the bat sars-like coronavirus isolate rm (rm , genbank id dq ) and the bat sars-like coronavirus isolate rs (rs , genbank id ky ). one additional homology model for the g d mutant of the sars-cov- rbd (sars -mut) was constructed. homology models were built with modeller v. (webb & sali, ) using its ucsf chimera interface (pettersen et al., ). five models were constructed for each target sequence and the one with the lowest dope score was selected as the final model. all non-amino acid residues were removed from the sars-cov rbd-hace complex to obtain a clean complex. the homology models of the sars , rm and rs rbds and of sars -mut were superimposed onto the sars-cov rbd to obtain their initial complexes with hace . these complexes were then subjected to molecular dynamics (md) simulations and estimation of their free energies of binding using amber (case et al., ).
for the latter, ace was considered as the receptor and the rbds as ligands. the protocol described below was employed for all complexes and, unless otherwise noted, default software parameters were employed. system preparation was performed with the tleap program of the amber suite. each complex was enclosed in a truncated octahedron box extending Å from any atom. next, the boxes were solvated with tip p water molecules and na+ ions were added to neutralize the excess charge. systems were minimized in two steps, the first of which consisted of steps of the steepest descent algorithm followed by cycles of conjugate gradient with protein atoms restrained using a force constant of kcal/mol.Å . the pme method with a cutoff of Å was used to treat long-range electrostatic interactions. during the second minimization step the pme cutoff was set to Å and it proceeded for steps of the steepest descent algorithm followed by cycles of conjugate gradient with no restraints. the same pme cutoff of Å was used in all simulation steps from here on. both minimization stages were performed at constant volume. the minimized systems were heated from to k at constant volume, restraining all protein atoms with a force constant of kcal/mol.Å . the shake algorithm was used to constrain all bonds involving hydrogens and their interactions were omitted from this step on. heating took place for steps, with a time step of fs, and a langevin thermostat with a collision frequency of . ps - was employed. all subsequent md steps utilized the same thermostat settings. afterward, the systems were equilibrated for ps at a constant temperature of k and a constant pressure of bar. pressure was controlled with isotropic position scaling with a relaxation time of ps. the equilibrated systems were used as input for ns length production md simulations. the free energies of binding were computed under the mm-pbsa approach implemented in ambertools (case et al., ).
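the end-point averaging behind an mm-pbsa estimate can be sketched as follows. this is a minimal illustration, not the ambertools implementation: it assumes a single-trajectory setup in which the free energies of the complex, the receptor and the ligand are evaluated on the same snapshots, and the per-snapshot energies in the example are hypothetical values in kcal/mol.

```python
from math import sqrt
from statistics import mean, stdev


def binding_free_energy(g_complex, g_receptor, g_ligand):
    """Average dG_bind = G_complex - G_receptor - G_ligand over snapshots.

    Returns (mean dG_bind, standard error of the mean).
    """
    n = len(g_complex)
    if not (n == len(g_receptor) == len(g_ligand)):
        raise ValueError("all energy series must cover the same snapshots")
    if n < 2:
        raise ValueError("at least two snapshots are needed for an error estimate")
    # One binding free energy per snapshot, then averaged.
    deltas = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return mean(deltas), stdev(deltas) / sqrt(n)


# Hypothetical per-snapshot free energies (kcal/mol).
dg, sem = binding_free_energy(
    g_complex=[-100.0, -102.0, -98.0],
    g_receptor=[-60.0, -61.0, -59.0],
    g_ligand=[-30.0, -30.0, -30.0],
)
print(f"dG_bind = {dg:.2f} +/- {sem:.2f} kcal/mol")
```

a negative average indicates favorable binding under this approximation; the standard error gives a rough sense of how much the estimate fluctuates across snapshots.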
a total of md snapshots were evenly selected, one every ps, from the last ns of the production run for mm-pbsa calculations. the ionic strength was set to mm and the solute dielectric factor was set to for all systems. in order to detect branches and sites under positive/negative selection, two datasets were explored. the first ('closer' dataset) harbors the most similar genomes to wuhan-hu- coronavirus (sars-cov- ) (mn ). for this dataset, several genomes were excluded from the analysis because they showed minimal variation relative to other sequences. we used a preliminary phylogeny to select a representative isolate of each clade (table ) in order to exclude highly similar sequences. the second dataset ('broad' dataset) includes some accessions of the first dataset plus isolates less related to sars-cov- , like sars-like coronavirus isolates from different countries (see methods). we compare the results of the two datasets because the phylogenetic distance between orthologues in a given dataset has been demonstrated to alter the ability to detect selection in paml and meme (mcbee et al., ). in both datasets, we observed evidence of purifying selection in the majority of nodes of the tree. specifically, in the 'closer' dataset we identified nodes with evidence of negative selection, and under positive selection, when the free ratios model of codeml was applied. to confirm the four nodes under positive selection we used the lrt to contrast hypotheses using the branch free, branch neutral and m models of codeml. using these approximations, any node predicted by the free ratios model with ω> was significantly different from the purifying (ω< ) or neutral (ω= ) models. an equivalent analysis was performed using absrel of hyphy, observing episodic diversifying selection in at least of nodes of the phylogenetic tree reconstructed with the 'closer' dataset (figure ).
interestingly, one of the divisions detected with diversifying selection was the branch that contains sars-cov- , pangolin coronavirus isolate mp and bat coronavirus ratg (called sars-cov- group) but not the specific branch that contains sars-cov- . under positive selection in sars-cov- using the closer dataset without pangolin coronavirus isolate mp . it is interesting despite the influence of the dataset on the results, because site f is directly involved in the hace -rbd interaction, explaining at least in part the strong selection at this site. moreover, the branch-site models bsa (positive selection) and bsa (relaxation) of codeml were compared to find evidence of sites under positive selection in the branch of sars-cov- using the 'closer' dataset, but bsa does not show significant differences from bsa (p> . ), indicating that selection cannot be confidently implicated, although it was when other datasets were used (including f ). in summary, we do find evidence of sites under positive/episodic selection in branches of closely related strains of the wuhan-hu- isolate coronavirus. however, there is no strong evidence of specific sites under positive selection in sars-cov- using the tools mentioned in this work. this result does not rule out the presence of positively selected sites in sars-cov- ; nonetheless, it shows the limitation of the methods in identifying with precision specific sites under positive selection in a precise taxon of a phylogenetic tree. we further caution that researchers need to be conservative with interpretations of studies utilizing these methodologies, given that equivocal results can be generated by datasets varying in genetic similarity. to complement our analyses looking for evidence of selection among lineages, we analyzed patterns of selection across sites in the s-protein genes using the sites models available in codeml and hyphy. model m of codeml detected .
% of sites under positive selection (ω> ) and models m and m detected % of sites under purifying selection (ω< ). model m explains the data significantly better (p= e- ) than the m model, which takes into account only sites with neutral and purifying selection. to resolve these ambiguities in positive selection sites we calculated putative selection sites with codeml (using bayes empirical bayes from the m and m models) and fubar with different datasets reflecting the addition of novel sequences to online repositories (broad, closer, closer without mn and mt and closer without mt ), and we obtained different results. it is becoming increasingly clear that predictions of positively selected sites are highly influenced simply by the diversity of the individual sequences included in the datasets. in any case, the majority of predicted sites converge in the region between to , a section of the rbd. additionally, we used treesaap to detect important changes in biochemical amino acid properties over regions and/or sites along the betacoronavirus s-protein. using a sliding window size of (increasing by ) we detect that the region between to (using sars-cov- s-protein as a reference) has drastic amino acid changes for alpha-helical tendencies. in addition, the section between residues to registers radical changes in amino acids implicated in the equilibrium constant (ionization of cooh). in the structural analysis we performed, the section between to forms a loop that is not present in certain s-proteins of coronaviruses isolated in bats. this loop extends the interaction area between the rbd of the s-protein and human ace ; in fact, the lack of this loop decreases the negative energy of interaction (increasing the binding) between these two molecules (see table ). these results, obtained from independent analyses, strongly highlight the importance of the to section.
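the sliding-window scan mentioned above can be sketched in a few lines. this is only an illustration of the general technique, not treesaap's algorithm: the per-site scores, the window size and the step are hypothetical, and the window statistic (a plain mean) is an assumption made for the example.

```python
def sliding_window_means(scores, window, step=1):
    """Return (start, end, mean) for each window, with 1-based inclusive positions."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(scores) - window + 1, step):
        chunk = scores[start:start + window]
        results.append((start + 1, start + window, sum(chunk) / window))
    return results


# Hypothetical per-site scores along an alignment (e.g. magnitude of property change).
site_scores = [0, 0, 1, 5, 6, 5, 1, 0]
windows = sliding_window_means(site_scores, window=3, step=1)

# The window with the highest mean marks the candidate region.
best_start, best_end, best_mean = max(windows, key=lambda w: w[2])
print(f"highest-scoring window: sites {best_start}-{best_end} (mean {best_mean:.2f})")
```

smaller windows localize a signal more precisely but are noisier, while larger windows smooth out single-site fluctuations at the cost of resolution.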
additionally, important hace -binding residues in the rbd from sars-cov- obtained from the crystallography and structure determination performed by shang et al. ( ) are also present in the section we highlight here. we propose that this region is the most likely to contain the sites under positive selection, based on the predictions of our codeml and fubar models. in that sense, we refer to this section as the region under positive selection (rps). it is important to additionally clarify that even inside the rps we found at least aa highly conserved between coronaviruses, several of which are predicted as sites under purifying selection. this shows that it is necessary to maintain sites without change around polymorphic sites, probably to conserve the protein structure and at the same time to retain the ability to colonize more than one host. interestingly, the rps of the pangolin coronavirus isolate mp differs by only one amino acid from the homologous region of sars-cov- , whereas the bat coronavirus ratg (the overall most similar isolate to sars-cov- sequenced at the moment) shows differences in the same region. several explanations could derive from this observation. the hypothesis of recombination inside the pangolin between a native coronavirus strain and a bat coronavirus (like ratg ) is congruent with our observation. this scenario was proposed and discussed as the origin of sars-cov- (lam et al., ; wong et al., ; xiao et al., ); however, other explanations are possible. if the sars-cov- , ratg and pangolin coronavirus mp isolates are closely related, as shown in the tree of figure , we are observing the ancestral sequence of the rps in human and pangolin coronaviruses, and a mutated version in the bat virus. elucidating the origin of sars-cov- is beyond the scope of this work; nevertheless, sequencing of new coronavirus isolates in the near future could resolve this question.
with a list of broader observations related to the role of selection across viral genomes, we aimed to specifically understand how these regions could affect virus/host interactions. to understand in more depth the importance of the rps in the evolution of sars-cov- , we quantified the relative importance of this region in the interaction between the rbd and hace . to that end, md simulations were run for five complexes (listed in methods). in all cases the systems were stable, with root mean square deviations (rmsd) of their backbones between . Å and . Å relative to the initial structures of the complexes during the last ns of the production run. we first investigated the network of contacts between the ligands (coronavirus rbds) and the receptor (hace ). overall, all complexes present a large number of contacts between the ligands and the receptor in at least % of the md snapshots selected for mm-pbsa calculations. common interactions with t , f , k , h , y , k , g , d and r of the receptor are observed in all systems. the full networks of interactions between the coronaviruses and the hace receptor are provided as supporting information. next we estimated the free energies of binding of the coronaviruses' rbds to hace , and the results of these evaluations are summarized in table . these calculations show that the sars , sars and rs viruses are predicted to bind favorably to the human hace receptor, while the rm and sars -mut variants present unfavorable free energies of binding. the fact that the bat coronavirus rs , in addition to sars and sars , presents a favorable interaction with hace is in accordance with the previous observation that it is able to infect human cells expressing this protein (hu et al., ). to gain more insight into the contribution of the receptor and the rbds to the binding process, we performed energy decomposition experiments. the contributions of each residue in the studied coronaviruses that interacts with the hace receptor are shown in table .
rows are presented in such a way that each of them contains the residues occupying the same position in the virus rbd structures as in the sars rbd structure. from here on, residue numbering will take that of sars as reference. in general, most rbd residues show negative values of contribution to the free energies of binding to the human receptor. all studied rbds, except that of the rm coronavirus, have amino acids with large favorable contributions to the free energies of binding that directly interact with hace : k of sars and sars -mut, r in sars and r in rs . on the other hand, the g d mutation (d present in bat coronavirus strains) has a negative contribution to the binding of the rbd to hace . this site was predicted to be under purifying selection by fubar analyses, and is located within the rps. strikingly, the g d mutation (sars numbering) has a large negative influence on the free energy of binding in the two complexes that contain it. it is also worth noting that the three aspartic acid substitutions present in all systems contribute negatively to the stability of the systems. taking into account that the only difference between sars and sars -mut is the g d mutation, we postulate that this rbd position is critical for human receptor recognition by coronaviruses. to the best of our knowledge, no coronavirus having aspartic acid at this position is able to infect human cells. this result supports the prediction from fubar analyses indicating that the site g d is under purifying selection. combined, our results strongly suggest that the mutation of the d residue present in the coronaviruses from bats is critical for their rbds to recognize the human hace receptor. additionally, it shows the importance of sites under purifying selection in the rps for rbd evolution. to better interpret the influence of the key interactions between the coronavirus rbds and their hace receptor, their interactions were analyzed.
to select the representative structure of each system, the md snapshots employed for mm-pbsa calculations were clustered. then, the representative structure of a system was selected as the centroid of the most populated cluster. the predicted rbd-hace complexes for sars , sars and sars -mut are depicted in figure . many studies have focused on coronavirus mutations that favor adaptation to infection of human hosts. for example, it has been shown that specific substitutions at positions , , , and ( , , , and in sars) of the rbd of sars favor the interaction between the rbd of sars and hace (cui et al., ). likewise, homology modeling studies found favorable interactions between the residues occupying these positions in the sars rbd and the human receptor. the cornerstone of these favorable interactions is the complementarity of the rbds with hot spots and . these are salt bridges between k and e and between d and k of ace which are buried in a hydrophobic environment (see figure ). in the cases of sars and sars, q (n in sars) and n (t in sars) add support to the hot spots according to these previous studies. these observations should also hold for the rs strain; however, the n a change in the latter compared to sars (a in rs ) adds little support to hot spot . in this case, to continue permitting human infection, the large favorable contribution of r in rs to the free energy of binding could compensate for the weak support provided by a to hot spot . interestingly, k is the residue forming the largest network of contacts with the analyzed rbds among those belonging to both hot spots. our simulations also show that in sars and sars the rbd amino acids with the largest contribution to the free energy of binding, k and r respectively (see table ), do not interact with any hot spot residue. instead, they interact with d of hace in the sars complex and with e of the human receptor in the sars complex.
this could indicate that interactions additional to those previously identified with the hace hot spots could be critical for the stabilization of the rbd-human receptor complexes. finally, we analyzed the possible reasons for the predicted negative impact that the g d mutation has on the predicted free energies of binding of the rbd to hace . as depicted in figure , g directly interacts with k in hot spot and its mutation interferes with the d -k salt bridge. specifically, d of the rbd points towards d of hace , which yields a strong electrostatic repulsion between these amino acids. consequently, this portion of the rbd is pushed to a position further from hace than that observed in the wild type receptor, resulting in the reduction of its network of contacts with k . as a result, the binding of the rbd to hace is considerably inhibited and unlikely to occur. a priority in ongoing research is to better understand coronavirus evolution, with specific interest in understanding the role of selection pressures in viral evolution, and in clarifying how viral strains can infect novel hosts. our experiments suggest that there are sites under positive selection in the s-protein gene of sars-cov- and other betacoronaviruses, particularly in a region that we called rps (region under positive selection) inside of the rbd. however, we have identified that, by and large, sites in this region (and overall, in the s-protein gene) are under purifying selection. particularly, for the site d g, the presence of glycine seems indispensable for the interaction with hace . additionally, we performed md simulations and free energy of binding predictions for five different complexes of coronaviruses that do and do not infect human cells. our results suggest that as long as no disrupting interference occurs with both salt bridges at hot spots and coronaviruses are able to bind with hace .
modeling results suggest that interference with the hot spot could be an effective strategy for inhibiting the recognition of the rbd of the sars-cov- spike protein by its human host receptor ace and hence prevent infections. although additional simulations and experiments are required, all evidence suggests that the mutation of d in the bat variants of the coronaviruses permits infection of human cells. given the large contribution of sars k to the free energy of binding of the rbd to hace , we propose that blocking its interaction with the receptor d could be a promising strategy for future drug discovery efforts.
gapped blast and psi-blast: a new generation of protein database search programs
the -new coronavirus epidemic: evidence for virus evolution
the protein data bank
origin and evolution of pathogenic coronaviruses
mauve: multiple alignment of conserved genomic sequence with rearrangements
identification of a novel coronavirus in patients with severe acute respiratory syndrome
middle east respiratory syndrome: emergence of a pathogenic human coronavirus
coronavirus spike proteins in viral entry and pathogenesis
new algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml . . systematic biology
sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor
discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus
clinical features of patients infected with novel coronavirus in
ete: a python environment for tree exploration
more effective purifying selection on rna viruses than in dna viruses
mafft multiple sequence alignment software version : improvements in performance and usability
geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data
pre-fusion structure of a human coronavirus spike protein
hyphy . -a customizable platform for evolutionary hypothesis testing using phylogenies
identification of -ncov related coronaviruses in malayan pangolins in southern china
animal origins of the severe acute respiratory syndrome coronavirus: insight from ace -s-protein interactions
angiotensin-converting enzyme is a functional receptor for the sars coronavirus
receptor and viral determinants of sars-coronavirus adaptation to human ace
the effect of species representation on the detection of positive selection in primate gene data sets
ucsf chimera-a visualization system for exploratory research and analysis
structural basis for receptor recognition by the novel coronavirus from wuhan
on the origin and continuing evolution of sars-cov-
structural insights into coronavirus entry
structure, function, and antigenicity of the sars-cov- spike glycoprotein
receptor recognition by novel coronavirus from wuhan: an analysis based on decade-long structural studies of sars
datamonkey . : a modern web application for characterizing selective and other evolutionary processes
comparative protein structure modeling using modeller
summary of probable sars cases with onset of illness from
evidence of recombination in coronaviruses implicating pangolin origins of ncov-
treesaap: selection on amino acid properties using phylogenetic trees
isolation and characterization of -ncov-like coronavirus from malayan pangolins
structural basis for the recognition of the sars-cov- by full-length human ace
paml : phylogenetic analysis by maximum likelihood
cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains
a pneumonia outbreak associated with a new coronavirus of probable bat origin
a novel coronavirus from patients with pneumonia in china
the authors declare that they have no conflicts of interest.
key: cord- -qpztmdnw authors: guo, jingxu; douangamath, alice; song, weixiao; coker, alun r.; edith chan, a.w.; wood, steve p.; cooper, jonathan b.; resnick, efrat; london, nir; von delft., frank title: in crystallo-screening for discovery of human norovirus c-like protease inhibitors date: - - journal: j struct biol x doi: . /j.yjsbx. . sha: doc_id: cord_uid: qpztmdnw outbreaks of human epidemic nonbacterial gastroenteritis are mainly caused by noroviruses. viral replication requires a c-like cysteine protease ( cl(pro)) which processes the kda viral polyprotein into six functional proteins. the cl(pro) has attracted much interest due to its potential as a target for antiviral drugs. a system for growing high-quality crystals of native southampton norovirus cl(pro) (sv cp) has been established, allowing the ligand-free crystal structure to be determined to . Å in a tetrameric state. this also allowed crystal-based fragment screening to be performed with various compound libraries, ultimately to guide drug discovery for sv cp. a total of fragments were found to bind to the protease out of the which were screened. two of the hits were located at the active site of sv cp and showed good inhibitory activity in kinetic assays. another were found at the enzyme’s putative rna-binding site and a further were located in the symmetric central cavity of the tetramer. gastroenteritis accounts for the deaths of over , children every day worldwide, making it the second leading cause of death for children under the age of , more than the combination of aids, malaria and measles (liu et al., ) . whilst there are many other causes of gastroenteritis, including parasites, bacteria and viruses, human caliciviruses are recognised as the leading cause of gastroenteritis worldwide among people of all ages. 
the caliciviridae family contains five genera known as norovirus, vesivirus, nebovirus, sapovirus and lagovirus (clarke et al., ) with norovirus being the most common cause of disease in humans (lambden et al., ) . noroviruses account for more than % of gastroenteritis cases and at least % of nonbacterial acute gastroenteritis cases worldwide, as reported by the centers for disease control and prevention in the us ( ). scallan et al. ( ) estimated that % of all viral foodborne illness incidents are caused by noroviruses which corresponds to . million per year in the us alone. from to , around . % of norovirus cases needed longterm care facilities in order to control the transmission (vega et al., ) . statistics are generally similar in europe (baert et al., ; phillips et al., ) . globally, it is estimated that noroviruses lead to a total of $ . billion in direct health system costs and $ . billion in social cost per year (bartsch et al., ) . clinical treatment and intervention is hampered by the lack of licensed vaccines or antivirals. treatment with human immunoglobulin did show some benefit but did not result in clearance of the virus (florescu et al., ) . whilst development of a vaccine has been hindered by the lack of small-animal models and cell culture systems, a number of norovirus vaccines are yielding promising results in clinical trials noroviruses are genetically classified into genogroups, gi -gvii, based on the amino acid sequence of the vp capsid protein and are further segregated into at least genotypes (vinjé, ) . noroviruses from groups gi (like southampton virus) and gii infect humans, as do members of the giv. subgroup. gii viruses are the most frequently detected ( %) while gii. are the major cause of norovirus outbreaks worldwide (siebenga et al., ). 
many noroviruses have been reported such as norwalk virus (jiang et al., ) , hawaii virus (lew et al., a) , snow mountain virus (lochridge and hardy, ) , desert shield virus (lew et al., b) , southampton virus (clarke and lambden, ) and lordsdale virus (lambden et al., ) . the norovirus genome consists of a single-stranded positive-sense rna of . - . kb in length and contains three open reading frames (orfs) (lambden et al., ) , except for the murine norovirus which has a fourth alternative orf (mcfadden et al., ) . orf encodes a kda non-structural polyprotein which is co-and post-translationally cleaved into six or seven non-structural proteins by the viral c-like protease (ns ). the seven products of this proteolysis are, from n-terminus to c-terminus: p (ns - ), an ntpase (ns ), a a-like protein (p , ns ), a viral genome-linked protein (vpg, ns ), the c-like protease ( cl pro , ns ) and an rna-dependent rna polymerase (rdrp, ns ) (blakeney et al., ) . orf and orf encode the capsid protein vp and the minor structural protein vp , respectively. the c-like protease ( cl pro ) was named because of its similarity to the picornavirus c protease. it is a cysteine protease which shows a typical chymotrypsin-like fold containing two domains: a β-barrel domain and a β-sheet domain separated by a groove where the active site is located (bazan and fletterick, ; boniotti et al., ) . the active site is characterised by a catalytic dyad (cys -his ) (someya et al., ) or triad (cys -his -glu ) (tiew et al., ) and shows a strong preference for a -d/e-f/y-x-l-q-g-p-(x can be h, q, or e) sequence corresponding to the subsites s -s -s -s -s -s '-s ' (tiew et al., ) . studies have indicated that norovirus cl proteases have a preferential order of processing the polyprotein, for example, the southampton virus cl pro has a preference for cleavage at lq-gp and lq-gk, but it can also cleave at me-gk, fe-ap and le-gg (hussey et al., ) . 
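the cleavage-site preferences listed above can be turned into a simple sequence scan. the sketch below is illustrative only: it searches a one-letter amino acid string for the five junctions named in the text (lq-gp, lq-gk, me-gk, fe-ap and le-gg) and reports where the scissile bond would fall. the function name and the example sequence are hypothetical, and matching these four residues alone is an assumption that ignores the wider subsite preferences and the processing order described in the text.

```python
# Junctions named in the text, written as P2-P1 followed by P1'-P2'.
CLEAVAGE_JUNCTIONS = ("LQGP", "LQGK", "MEGK", "FEAP", "LEGG")


def find_cleavage_sites(sequence: str):
    """Return (position, junction) pairs for candidate cleavage sites.

    `position` is the 1-based index of the P1 residue, i.e. the bond
    between residues `position` and `position + 1` would be cleaved.
    """
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - 3):
        tetra = seq[i:i + 4]
        if tetra in CLEAVAGE_JUNCTIONS:
            hits.append((i + 2, f"{tetra[:2]}-{tetra[2:]}"))
    return hits


# Hypothetical peptide containing two of the junctions.
example = "AAALQGPAAAMEGKAA"
for position, junction in find_cleavage_sites(example):
    print(f"candidate cleavage after residue {position}: {junction}")
```

a scan like this only flags candidates; whether a given junction is actually processed depends on the structural context that the following paragraphs discuss.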
although several norovirus cl pro structures have been determined (hussey et al., ; nakamura et al., ; zeitler et al., ) , the full structural basis of how these enzymes recognise these different sites is still unknown. the key role of norovirus cl pro in the processing of the polyprotein and the absence of homologues in the human host make it an excellent target for antiviral drug discovery. there is currently no clinically approved norovirus cl pro inhibitor available but several compounds have been reported with strong inhibitory activity against cl proteases in vitro. these are usually peptidyl or macrocyclic compounds mimicking the substrate sequence whilst possessing a transition state analogue (damalanka et al., ; kankanamalage et al., ; mandadapu et al., ) . examples include peptidyl aldehydes and α-ketoamides which showed strong inhibition of norovirus cl pro , and the c or c-like proteases in picornaviruses and coronaviruses in cell-based assays . the aldehydes and α-ketoamides act as warheads which form a reversible adduct with the catalytic residue cys in the active site . these compounds are named as latent transition state (ts) inhibitors. ts mimics, such as α-hydroxyphosphonate, are converted to the aldehyde form either with or without catalytic action of the enzyme and form a tetrahedral adduct with the cys residue (kankanamalage et al., ) . hussey et al. ( ) first reported the x-ray structure of the southampton norovirus cl pro (sv cp) with an inhibitor bound. this compound consisted of part of the most rapidly cleaved substrate sequence (efqlq) with a michael acceptor moiety linked to the s residue gln. this is attacked by cys and a covalently bound complex is formed. interestingly, the his sidechain is pushed away by the inhibitor, which disrupts the catalytic triad. screening by mass-spectrometry for covalent inhibitors of sv cp has been described by us previously (resnick et al., ) . 
in this work we have crystallised the protease in its native form with an unperturbed catalytic triad and have conducted crystal-based fragment screening of compounds with the aim of discovering novel inhibitory functional groups which have the potential to be developed as therapeutic agents, either on their own or through chemical coupling. a total of compounds were found to bind to cl pro in the crystals and two of them were located in the active site while another were located at the enzyme's putative rna-binding site. a further compounds were found to bind in the central cavity of this putative tetrameric form of the enzyme. expression and purification of sv cp was conducted using the method described by hussey et al., ( ) . screening for crystallisation conditions for sv cp was accomplished using the sitting-drop method at °c with the screening kits: structure screen & , jcsgplus, pact premier, midas and morpheus from molecular dimensions (suffolk, uk). a ttp labtech mosquito crystal screening robot (ttp labtech, hertfordshire, uk) was used to dispense nl of the protein, at concentrations of mg/ml and mg/ml, with nl of the corresponding well solution into each drop. high quality crystals were obtained in . m ammonium citrate and % (v/v) peg after approximately one week, although crystals kept appearing over the next - months prior to screening. selected crystals were cryo-protected in % glycerol and mounted in loops before flashcooling. x-ray data were collected at beamline i - at diamond light source (dls, didcot, england). fine-sliced data were collected as guided by the strategy suggested by the program edna (incardona et al., ) . data were processed automatically by the program xia (winter, ) at dls, which revealed the space group to be c , as shown in table . further analysis using phenix.xtriage (zwart et al., ) suggested that the data were of good quality. the solvent content of this crystal form was estimated to be . 
% using matthews_coef (kantardjieff and rupp, ). several rounds of manual rebuilding and correction were performed using coot (emsley and cowtan, ) followed by restrained refinement using refmac (murshudov et al., ) and phenix.refine (afonine et al., ). since the crystal diffracted to near-atomic resolution, the temperature factors were refined anisotropically. structure validation was performed with molprobity (chen et al., ). the statistics for data collection, data processing and refinement are shown in table . the tolerance of the crystals to dmso was tested at concentrations (v/v) of , %, %, % and %, and on soaking time scales of h, h and overnight. in order to make the experiment more efficient, the crystals were also tested with and without additional cryo-protectant for data collection. it was found that these crystals could survive in % dmso for many hours and additional cryo-protection was not required. the plates containing crystals were imaged using a rock imager system (formulatrix, usa). all the crystals were then ranked using the program texrank (ng et al., ) and positional coordinates for the injection of the fragments were manually defined in the drop. the fragments were taken from the dspl library ( fragments) (cox et al., ), and fragment soaking was conducted in batches to give an average soaking time of approximately . hours prior to crystal mounting. crystal harvesting was aided by the use of a crystallisation plate shifter (oxford lab technologies, oxford, uk). all the crystals were mounted in loops of about the same size as the crystals or slightly smaller to allow for automated, unattended data collection in which the x-ray beam was aimed at the centre of each loop. a total of ° of data were collected for each crystal, taking approximately seconds per crystal using dls beamline i - .
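the solvent-content estimate mentioned above follows from the matthews coefficient, v_m = v_cell / (z * n * m), where v_cell is the unit-cell volume, z the number of symmetry operators of the space group, n the number of molecules per asymmetric unit and m the molecular mass in daltons; the solvent fraction is then approximately 1 - 1.23 / v_m. the python sketch below shows only this arithmetic. it does not reproduce the probability estimates of matthews_coef, and the cell volume, copy number and molecular mass used in the example are hypothetical round numbers, not the values for the sv cp crystals.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic angstroms from cell lengths (angstroms)
    and angles (degrees), using the general triclinic formula."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def matthews(volume, n_symops, n_copies, mass_da):
    """Return (V_M, solvent_fraction).

    V_M is in cubic angstroms per dalton; the solvent fraction uses the
    standard approximation 1 - 1.23 / V_M.
    """
    v_m = volume / (n_symops * n_copies * mass_da)
    return v_m, 1.0 - 1.23 / v_m


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic easy to follow.
    v_m, solvent = matthews(volume=393600.0, n_symops=4, n_copies=2,
                            mass_da=20000.0)
    print(f"V_M = {v_m:.2f} A^3/Da, solvent = {100 * solvent:.1f} %")
```

in practice the number of copies per asymmetric unit is the unknown: the calculation is repeated for n = 1, 2, 3 and so on, and the value giving a physically reasonable solvent content is taken as the most likely.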
the data produced were managed using xchemexplorer (krojer et al., ) which gathered ligand information and data processing results and launched different software pipelines, such as dimple (wojdyr et al., ) and pandda (pearce et al., ), for further analysis and hit identification. pandda uses an average of several ground-state crystal structures to calculate a background density correction which reveals better electron density for weakly bound fragments. all the hits were checked visually by using the program pandda.inspect in the pandda suite (pearce et al., ). the hits were further refined using refmac (murshudov et al., ) followed by inspection using coot (emsley and cowtan, ) for several rounds (table ). in most cases anisotropic b-factor refinement was undertaken and the fragment occupancy was fixed. confirmatory omit maps for the ligands were generated using the program composite omit map (terwilliger et al., ) in the phenix program suite (adams et al., ). interactions between ligands and sv cp were analysed using ligplot+ (wallace et al., ). the protease ( . mg/ml final concentration) in a buffer containing mm tris, ph . , and mm β-mercaptoethanol was mixed with the fragment (dissolved in dmso at concentrations of . , . , . , . and . mm) for min at room temperature. the solution was then mixed with the chromogenic substrate (ac-efqlq-para-nitroaniline; peptide protein research ltd, southampton, uk), which was dissolved in dmso to give final concentrations of . , . , . , . , . and . mm, in a : ratio and the absorbance at nm was measured at s intervals over a min period, using a nanodrop nd spectrophotometer. the k i values were determined using graphpad prism (www.graphpad.com). the structure of native sv cp has been determined for the first time at the near-atomic resolution of . Å (fig. a) revealing a crystallographic tetramer (fig. b). the monomers consist of an n-terminal and a c-terminal domain with the active site cleft located in between.
as found in other noroviral cl pro structures, the n-terminal domain contains an α-helix and a twisted -stranded antiparallel β-sheet forming an incomplete β-barrel (anand et al., ; birtley et al., ; mosimann et al., ). the c-terminal domain is made up of β-strands forming an antiparallel β-barrel and contains the catalytic cysteine residue (cys ) which forms a catalytic triad with two residues from the n-terminal domain (his and glu ; fig. a). interestingly, the β-hairpin formed by β and β , which is involved in binding the n-terminal side of the substrate peptide, adopts an appreciably different conformation from that observed in an earlier inhibitor-complexed structure. the sv cp enzyme has approximately % sequence identity with other gi noroviral c proteases and an identity of the order of % with the enzyme from the gii genotype. sv cp has approximately % identity with the mouse norovirus enzyme. the monomer structures of these enzymes superpose with sv cp with a cα rmsd of typically . - . Å for virtually all of the amino acids in the chains. the structures differ most noticeably in the hairpin linking strands β and β which is close to the active site. in line with other noroviral c proteases which have been analysed by gel-filtration, it is highly likely that sv cp forms dimers in solution or, at least, exists in a monomer-dimer equilibrium (leen et al., ; zeitler et al., ). accordingly, a dimer is observed in the crystallographic asymmetric unit of sv cp (fig. , chains a and b). however, analysis with the pdbepisa website (krissinel and henrick, ) suggested a tetrameric form (fig. b) might also be stable in solution. the interface area between the chains of the crystallographically observed dimers (formed by chains a and b) is . Å². however, a neighbouring dimer in the crystal structure forms an interface of comparable buried surface area ( . Å²) between the chains labelled a and d, and likewise between the chains labelled b and c.
this result indicates that higher order oligomers may be formed by sv cp dimers, such as the putative tetramer shown in fig. b. given that localised replication centres are known to form within norovirus-infected cells (e.g. thorne and goodfellow, ), a high local concentration of cl pro may allow the enzyme to tetramerise. in the native sv cp structure, no electron density is visible for the last residues (asegettl) at the c-terminal end of the protein. since these residues are well-defined in the complex with a substrate analogue (hussey et al., ), their absence in the native structure might be due to autolysis during storage or crystallisation of the uninhibited protease. in this region of the structure, there is a minor consensus sequence for sv cp cleavage with the following amino acids vq-as corresponding to the p -p -p '-p ' positions (hussey et al., ; kankanamalage et al., ) suggesting that slow autolysis prior to crystal growth is possible. mass spectrometric analysis of the purified protein yielded a molecular mass of , daltons (supplementary fig. ) confirming that the protease was indeed fully intact at the time of crystallisation. therefore, another possibility is that this region of the molecule is simply disordered in the new crystal form. however, it is not clear why this should be since this region of both monomers is not involved in crystal contacts in either crystal form. most crystals used in the non-covalent fragment screening experiment diffracted to resolutions ranging from . to . Å with good crystallographic statistics (table ). fragment j is the worst in terms of resolution, diffracting to approximately . Å, although the electron density is still of good quality. screening with the dspl library and part of the maybridge ro library identified ligands in total which bind in five different sites, as illustrated in fig. .
the majority of fragments have mean b-factors which are comparable with those of the protein moieties (table ). in only one case (j ) was the occupancy of the fragment refined, although for several others it was set to . due to the fragments residing on a -fold axis. site a, the protease active site, is a long groove containing the catalytic cys residue. two fragments (j and j ) were found to bind here, each on different sides of the catalytic cysteine (fig. ) . five hits (j -j ) were found to bind in the putative rna binding site (site b) including one (j ) which also binds in another site, site c. site c lies in a pocket between chains a and b and the symmetry related chains a' and b', with hits being identified (j -j ) here. two other fragments were found at additional sites: d (j ) and e (j ). molecular structures of the ligands j -j are given in fig. . two non-covalently bound fragments were identified in the active site of the protease named as j and j , as indicated by their omit maps ( fig. a and c) . j binds in the s subsite where its carboxyl group is oriented towards s and s . j forms several direct hydrogen bonds with the side chains of gln and arg and makes some additional hydrogen bonds mediated by a water molecule ( fig. a and b) . these residues are at the tip of the functionally important β-hairpin (connecting strands β and β ) that is involved in substrate recognition and moves substantially upon binding of polypeptide substrate analogues (fig. ). however, in the presence of j , the β-hairpin adopts the same conformation as the ligand-free sv cp, suggesting that binding of this fragment does not alter its conformation. since the carboxyl group of j appears to hold the β-hairpin loop (residues to ) in the closed conformation, this must help to prevent the enzyme from adopting the 'open' conformation that can accommodate the substrate. the ligand -nh group (n ) is also within hydrogen bonding distance of the main chain carbonyl group of thr . 
the benzoic acid moiety of j makes many hydrophobic interactions with the active site residues including pro , cys and ala . in contrast, the -methyl- -thienyl group forms fewer contacts with the enzyme than the aromatic group since it points away from the active site towards a large solvent channel. j resides on the other side of the long active site, where it occupies the s subsite without forming any hydrogen bonds (fig. c and d). instead, the phenyl ring is sandwiched between the side chains of his of the catalytic triad and arg from the β-hairpin loop by π-π stacking and cation-π interactions. interestingly, the guanidinium group of arg has moved from its position in the other fragment complex to accommodate j . several hydrophobic interactions are formed between this fragment and glu from the catalytic triad and val , and a number of contacts are made with a symmetry-related molecule. in kinetic assays both j and j showed inhibitory activity against sv cp with k i values of . mm and . mm, respectively. these values are typical of initial hits in crystallographic fragment-screening studies targeting catalytic or allosteric sites of enzymes (bauman et al., ; delbert et al., ; zhang et al., ), suggesting that the binding modes we observe in cl pro are highly relevant. since j and j bind in the active site cleft and maintain the closed conformation of the hairpin, they are good candidates for developing further inhibitors, and linking them into a new compound could also improve the bioactivity. a superposition of their binding modes on that of the covalently bound michael acceptor inhibitor (fig. ) demonstrates how these two fragments occupy the s and s subsites, respectively. j does not overlap with the p residue of the polypeptide inhibitor as well as j and the p residue do, since it appears to lie somewhere between the spatially adjacent s and s ' subsites.
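the k i values quoted above were obtained with graphpad prism, and the inhibition model used for the fit is not stated in the text. as a hedged illustration only, the python sketch below assumes simple competitive inhibition, in which the apparent k m rises linearly with inhibitor concentration, k m,app = k m (1 + [i] / k i), so that k i can be recovered as the intercept divided by the slope of a straight-line fit. all function names and numerical values are hypothetical, and a nonlinear global fit of the raw rates would normally be preferred to this linearisation.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial rate for a competitive inhibitor (assumed model)."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys): returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ki_from_apparent_km(inhibitor_concs, apparent_kms):
    """Estimate K_i from apparent K_m values measured at several [I].

    For competitive inhibition Km_app = Km + (Km / Ki) * [I], so
    Ki = intercept / slope of the fitted line.
    """
    slope, intercept = linear_fit(inhibitor_concs, apparent_kms)
    return intercept / slope


if __name__ == "__main__":
    # Synthetic, noise-free data generated with Km = 0.5 mM and Ki = 2.0 mM.
    concs = [0.0, 1.0, 2.0, 4.0]
    km_app = [0.5 * (1.0 + c / 2.0) for c in concs]
    print(ki_from_apparent_km(concs, km_app))
```

with real absorbance data each apparent k m would itself come from a michaelis-menten fit at one inhibitor concentration, so errors accumulate; this is one reason why fitting all rates simultaneously is usually the better choice.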
in addition to the protease activity, studies on viral c proteases suggested that they or their larger precursors can bind specifically to the '-terminal nucleotides of the viral rna (leong et al., ; nayak et al., ) . the interaction occurs only on the plus strand which forms a ribonucleoprotein (rnp) complex that is necessary for the initiation of the plus strand synthesis (andino et al., ) . it has been shown that human noroviral rna noncompetitively inhibits the protease activity with an ic of in the µm range (viswanathan et al., ) . the rna binding site has been studied by mutagenesis in other homologous c proteases, in which a key arginine residue was identified in the conserved sequence, kf/vrdi (f/v represents f or v) (bergmann et al., ; leong et al., ; nayak et al., ) . structural comparison of sv cp with hrv cl pro (pdb id: fx ; kawatkar et al., ) and fmdv cl pro (pdb id: j ; nayak et al., ) identified arg as the equivalent residue in sv cp, which is within a kirpdl sequence that has similarity with the consensus. the r and d residues in this sequence interact by a salt-bridge that forms one side of the putative rna binding site of sv cp (site b) which is shown in fig. and, as for the fmdv and hrv proteases, it is a shallow groove. in addition, these sites are in crystal contact areas and form deep channels with the neighbouring symmetry-related molecules in hrv, fmdv and sv cp cl pro . inhibitors binding in the rna binding site have the potential to inhibit noroviral replication and are therefore of interest as a separate class of drug. fragments j -j were found to reside at this site and their contact residues are shown in fig. . all the fragments form hydrophobic contacts with arg and other residues in the kirpdl sequence. while j ( fig. a and b) and j ( fig. g and h) are mainly involved in hydrophobic interactions, j ( fig. c and d) and j ( fig. 
e and f) also form many hydrogen bonds with the neighbouring residues, potentially making them stronger binders. the carbonyl group (o ) of j is involved in three hydrogen bonds formed, directly or mediated by a water molecule, with thr , lys and ser (although the latter residue is from a symmetry-related molecule). the n atom forms two hydrogen bonds with ser and pro (also from the symmetry mate) with the participation of a water molecule. a hydrogen bond is also seen between the fluorine substituent in the indole ring of j and the ne atom on the side chain of trp . this residue is one of a number of quite solvent-exposed aromatic residues including phenylalanines , , and which form the putative rna-binding site. j also forms water-mediated hydrogen bonds with ser from the symmetry-related molecule. unlike the active site fragments which bind in different subsites of the substrate-binding channel, these four fragments bind in approximately the same position with their aromatic 'heads' overlapping to a large degree but their aliphatic 'tails' pointing away in different directions. since binding of viral rna inhibits the protease activity (viswanathan et al., ), ligands binding at this site have the potential to interfere both with rna binding and with the protease activity. however, since this site is of the order of Å from the catalytic centre, the mechanism of protease inhibition is currently difficult to explain. fragment j was found to bind in both the putative rna-binding site (b, fig. i and j) and site c (fig. a and b) in the centre of the putative tetramer. the finding that the native crystals of the enzyme are formed by a tetrameric assembly of monomers is suggestive of a physiological role for the tetramer. we were also intrigued to find that the majority of the fragments binding to the protease (j -j , fig. ) were located in a cavity at the centre of the putative tetramer, site c.
the site is characterised by the convergence of two-fold symmetry axes, both crystallographic and non-crystallographic, since the ncs two-fold relating the monomers in each dimer and the crystallographic two-fold relating both dimers in the tetramer meet at this point. the binding site is formed by four copies of the hydrophobic amino acids leu and val as well as arg which are provided by all chains of the tetramer. these residues have a high level of sequence conservation. the sidechain of the arginine tends to form extensive stacking interactions with the aromatic moieties of the ligand. since this site is formed at the convergence of fold axes, two copies of each ligand are present at this site and sometimes the two symmetry-related copies of the fragment interact extensively with each other. since the same tetrameric assembly is observed in other gi and gii norovirus proteases, this binding site may be a conserved feature of these enzymes. given its ability to bind so many heteroaromatic fragments and the diverse functions which noroviral proteins and their precursors are known to have (e.g. emmott et al., ), it is tempting to speculate that the tetramer cleft has a physiological role, perhaps even as a secondary substrate- or rna-binding site. two of the fragments (j and j , fig. ) were found to bind at unrelated sites involving crystal contacts which are probably not of physiological significance. site d lies close to lys , lys and glu whereas site e lies between arg and the c-terminal end of the enzyme. the amide bond within j has apparently been cleaved and the resulting fragments, trifluoroacetic acid and -ethyl- , , -thiadiazole, bind at sites c and d, respectively. interestingly, it appears that the amide bond in j has also been cleaved and the resulting -ethyl- , , -thiadiazole binds instead at site c.
a check on the stock solution of this compound was made by mass spectrometry and this yielded a main mass of daltons, which is within a dalton of the predicted molecular mass of the observed fragment. it is possible that the electron-withdrawing groups on the amino terminal side of the amide bonds of these two compounds render them unstable in water. the x-ray structure of the southampton virus cl pro has been determined at . Å resolution in a crystal form that has allowed fragment-screening for novel inhibitors to be undertaken at similar resolutions. two fragments were found to bind in the active site cleft of the protease. j and j bind in different subsites of the long active site (see fig. ) but both of them interact with the functionally important β-hairpin linking strands β and β . j occupies s and forms hydrophobic interactions with catalytic cys while j occupies s and forms hydrophobic and π-π interactions with glu and his , which are also from the catalytic triad. both j and j could potentially be developed into more potent norovirus protease inhibitors; however, a better ligand might ultimately be obtained by coupling them together, given that the distance between the closest two atoms is slightly less than . Å. some of the remaining fragments were found to interact with the protease at its putative rna-binding site. whilst these compounds are likely to have less effect on the protease activity than j and j , which bind in the active site, rna binding to the enzyme has been shown to cause non-competitive inhibition of the protease (viswanathan et al., ). other fragments were found to bind at an additional site which is buried deeply in the centre of the crystallographic tetramer.
the fact that a c a mutant of the minerva virus protease forms the same tetramer in the crystal with the c-terminus of one subunit occupying the active site cleft of another monomer (muzzarelli et al., ) suggests that this assembly may also be involved in proteolytic maturation of noroviruses. hence, compounds that have the potential to interfere with formation of the tetramer or affect its stability may impact on noroviral replication and therefore deserve to be screened for in vivo activity, e.g. against mouse norovirus, which can be cultured, or in a suitable replicon assay. if such studies were to be successful, the highly symmetric nature of the binding site is something that could, in principle, be exploited in drug design. given the recent covid- pandemic, it is potentially useful to compare our results on sv cp with the cl pro of coronavirus (e.g. yang et al., ). the two enzymes have quite low sequence identity of approximately % within the common protease moieties and superimpose with an rmsd of . Å for structurally aligned residues. the coronavirus protease is considerably larger ( residues) than sv cp due to the presence of a c-terminal domain which is involved in dimerisation. although topologically similar, the protease moieties of the two structures differ very substantially in the loop regions connecting the core β-strands. in spite of these differences, the coronavirus protease also has specificity for gln at the p position of the substrate. in very recent fragment screening of the sars-cov- protease, active site hits were obtained which span the s to s ' subsites of the enzyme, thus providing somewhat better coverage of the active site cleft than we have achieved with sv cp (douangamath et al., ). other sars-cov- protease inhibitor structures have also been reported in recent months (dai et al., ; jin et al., a; zhang et al., ).
this resurgence of interest in rational cl pro drug design is likely to have combined benefits for what are currently intractable and severe viral infections. these studies provide a rational basis on which compounds with improved potency can be designed by medicinal chemists.
table . x-ray statistics for the native sv cp structure and fragment complexes. values in parentheses are for the high resolution shell. for the minority of structures where the overall fragment occupancy was either refined or is less than unity due to proximity with a symmetry axis, the fractional occupancy is shown following the mean fragment b-factor.
the β-hairpin loop connecting strands β and β moves significantly from its position in the native structure (which is very close to its position in the j and j complexes) upon binding the polypeptide inhibitor.
interactions between sv cp and fragments j -j which bind in site c at the centre of the putative tetramer. these are shown in d with the omit electron density contoured at . rmsd as (a, c, e, g, i, k, m, o, q, s, u) and in d with interacting residues shown in (b, d, f, h, j, l, n, p, r, t, v), respectively. hydrogen bonds are indicated by dashed lines in cyan and hydrophobic interactions are indicated by red eyebrow-like icons. protein chain identifiers are indicated by the letters a and b in brackets and those with a prime are from symmetry-related chains.
phenix: a comprehensive python-based system for macromolecular structure solution towards automated crystallographic structure refinement with phenix.refine structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain a functional ribonucleoprotein complex forms around the ′ end of poliovirus rna reported foodborne outbreaks due to noroviruses in belgium: the link between food and patient investigations in an international context global economic burden of norovirus gastroenteritis crystallographic fragment screening and structure-based optimization yields a new class of influenza endonuclease inhibitors viral cysteine proteases are homologous to the trypsinlike family of serine proteases: structural and functional implications the refined crystal structure of the c gene product from hepatitis a virus: specific proteinase activity and rna recognition crystallographic and electrophilic fragment screening of the sars-cov- main protease polyprotein processing and intermolecular interactions within the viral replication complex spatially and temporally control norovirus protease activity coot: model-building tools for molecular graphics structure determination of murine norovirus ns proteases with c-terminal extensions designed to probe protease-substrate interactions two cases of norwalk virus enteritis following small bowel transplantation treated with oral human serum immunoglobulin a structural study of norovirus c protease specificity: binding of a designed active sitedirected peptide inhibitor edna: a framework for plugin-based applications applied to x-ray experiment online data analysis sequence and genomic organization of norwalk virus structure of mpro from sars-cov- and discovery of its inhibitors structural basis for the inhibition of sars-cov- main protease by antineoplastic drug carmofur structureguided design and optimization of dipeptidyl inhibitors of norovirus cl protease. 
structureactivity relationships and biochemical, x-ray crystallographic, cell-based, and in vivo studies matthews coefficient probabilities: improved estimates for unit cell contents of proteins, dna, and protein-nucleic acid complex crystals design and structure-activity relationships of novel inhibitors of human rhinovirus c protease broad-spectrum antivirals against c or c-like proteases of picornaviruses, noroviruses, and coronaviruses inference of macromolecular assemblies from crystalline state the xchemexplorer graphical workflow tool for routine or large-scale protein-ligand structure determination sequence and genome organization of a human small round-structured (norwalk-like) virus structure of a murine norovirus ns proteaseproduct complex revealed by adventitious crystallisation human rhinovirus- protease c ( cpro) binds specifically to the '-noncoding region of the viral rna. evidence that cpro has different domains for the rna binding and proteolytic activities molecular characterization of hawaii virus and other norwalk-like viruses: evidence for genetic polymorphism among human caliciviruses molecular characterization and expression of the capsid protein of a norwalk-like virus recovered from a desert shield troop with gastroenteritis child health epidemiology reference group of who and unicef global, regional, and national causes of child mortality: an updated systematic analysis for with time trends since snow mountain virus genome sequence and virus-like particle assembly norovirus vaccines under development peptidyl α-ketoamides and α-ketoheterocycles production and clinical evaluation of norwalk gi. virus lot - nv in norovirus vaccine development phaser crystallographic software norovirus regulation of refined x-ray crystallographic structure of the poliovirus c gene product refmac for the refinement of macromolecular crystal structures structural and antiviral studies of the human norovirus gii. 
protease a norovirus protease structure provides insights into active and substrate binding site integrity role of rna structure and rna binding activity of foot-and-mouth disease virus c protein in vpg uridylylation and virus replication norovirus antivirals: where are we now using textons to rank crystallization droplets by the likely presence of crystals a multi-crystal method for extracting obscured signal from crystallographic electron density community incidence of norovirus-associated infectious intestinal disease in england: improved estimates using viral load for norovirus diagnosis rapid covalent-probe discovery by electrophile-fragment screening foodborne illness acquired in the united states--major pathogens norovirus illness is a global problem: emergence and spread of norovirus gii. variants identification of active-site amino acid residues in the chiba virus c-like protease iterative-build omit maps: map improvement by iterative model building and refinement without model bias norovirus gene expression and replication design, synthesis, and evaluation of inhibitors of norwalk virus c protease genotypic and epidemiologic trends of norovirus outbreaks in the united states advances in laboratory methods for detection and typing of norovirus norovirus protease shows ph-sensitive proteolysis with a unique arg-his pairing in the catalytic site rna binding by human norovirus c-like proteases inhibits protease activity ligplot: a program to generate schematic diagrams of protein-ligand interactions xia : an expert system for macromolecular crystallography data reduction dimple -a pipeline for the rapid generation of difference maps from protein crystals with putatively bound ligands the crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor x-ray crystallographic structure of the norwalk virus protease at . 
-Å resolution crystal structure of sars-cov- main protease provides a basis for design of improved α-ketoamide inhibitors construction of a shape-diverse fragment set: design, synthesis and screen against aurora-a kinase xtriage and fest: automatic assessment of x-ray data and substructure structure factor estimation we thank profs p. m. shoolingin-jordan and i. n. clarke (university of southampton) for providing the expression construct and for numerous helpful discussions. we also thank anthony aimon (dls ltd) for fragment mass-spectrometric analysis. noroviruses responsible for % of viral foodborne illness norovirus c-like protease is excellent drug target x-ray fragment-screening a total of fragment hits were found two located at the active site and showed inhibitory activity five found at the enzyme's putative rna-binding site ten in the symmetric central cavity of the tetramer key: cord- -h ukuu authors: olotu, fisayo a.; omolabi, kehinde f.; soliman, mahmoud e.s. title: leaving no stone unturned: allosteric targeting of sars-cov- spike protein at putative druggable sites disrupts human angiotensin-converting enzyme interactions at the receptor binding domain. date: - - journal: inform med unlocked doi: . /j.imu. . sha: doc_id: cord_uid: h ukuu the systematic entry of sars-cov- into host cells, as mediated by its spike (s) protein, is highly essential for pathogenicity in humans. hence, targeting the viral entry mechanisms remains a major strategy for covid- treatment. although recent efforts have focused on the direct inhibition of s-protein receptor-binding domain (rbd) interactions with human angiotensin-converting enzyme (hace ), allosteric targeting remains an unexplored possibility. therefore, in this study, for the first time, we employed an integrative meta-analytical approach to investigate the allosteric inhibitory mechanisms of sars-cov- s-protein and its association with hace . 
findings revealed two druggable sites (sites and ) located at the n-terminal domain (ntd) and s regions of the protein. two high-affinity binders; zinc (fosaprepitant – site ) and zinc (lomitapide – site ) were discovered via site-directed high-throughput screening against a library of ∼ fda approved drugs. interestingly, we observed that allosteric binding of both compounds perturbed the prefusion s-protein conformations, which in turn, resulted in unprecedented hace displacement from the rbd. estimated Δg(binds) for both compounds were highly favorable due to high-affinity interactions at the target sites. in addition, site residues; r , h , k and k , i , r , i , f , l , v and w were identified for their crucial involvement in the binding and stability of zinc . likewise, energy contributions of q , n , q , l , y , q , l , v , n , and a corroborated their importance to zinc binding at the predicted site . we believe these findings would pave way for the structure-based discovery of allosteric sars-cov- s-protein inhibitors for covid- treatment. the novel coronavirus disease also referred to as covid- is caused by the sars-cov- (severe acute respiratory syndrome coronavirus ), with incidences first reported in wuhan china in december . this disease has, however, persisted till mid- , spreading across countries with over , , cases reported coupled with increasingly high casualties numbering over , globally. sars-cov- belongs to a large group of coronaviruses which are known to cause respiratory infections and related complications. these rna viruses are spherical, pleomorphic, positive-sensed, single-stranded and polyadenylated. of all known viruses, coronaviruses (covs) have the largest rna genome , with diverse pathogenic effects in animals and humans. this virus class is divided into four genera namely: alpha-cov, beta-cov, gamma-cov and delta cov [ ] [ ] [ ] , with the beta-cov class prominent for their disease-causing effects in humans (hcovs). 
seven hcovs have been characterized to date [ ] [ ] [ ] , of which four (hcov-hku , hcov-oc , hcov-nl and hcov- e) cause very mild respiratory symptoms. on the other hand, mers-cov, sars-cov, and sars-cov- cause severe respiratory and gastrointestinal infections which, in most cases, can be fatal. although sars-cov-related infections were zoonotically transmitted into human populations, human-to-human transmission has further contributed to viral super-spread via respiratory aerosols. the entry of sars-cov- into target human cells, coupled with its replication, is achieved through a cohort of viral components, chiefly the non-structural and structural proteins. generally, about non-structural proteins (nsps) mediate diverse pro-pathogenic functions such as replication, processing and proof-reading of genomic frames, and host immune evasion, among many others, as previously reported. [ ] [ ] [ ] in addition, covs comprise four major structural proteins that are integral to their pathogenesis. [ ] [ ] [ ] these are the nucleocapsid (n), envelope (e), membrane (m) and spike (s) proteins. the n protein makes up the nucleocapsid and takes part in other viral genome-related processes, while the m protein is the most abundant of the four, playing major roles in maintaining viral structural integrity as well as coordinating other structural proteins. e protein, on the other hand, is crucial to the maturation of the virus [ ] [ ] [ ] [ ] [ ] while the trimeric s protein mediates viral entry into the host cell via the endosomal or non-endosomal route. two domains make up the s protein, namely the n-terminal s domain and the c-terminal s membrane-anchored domain. the s region is extensively conserved in covs while constituent s region residues are highly divergent across the cov strains. 
these domains have been further characterized into subdomains due to specific functionalities with respect to host receptor recognition and binding (s ), coupled with membrane fusion and entry (s ) (figure ). similar to sars-cov architecture, some recent reports have sub-categorized the sars-cov- s ectodomain into the n-terminal domain (ntd), a conserved receptor-binding domain (rbd) which recognizes the human angiotensin-converting enzyme (hace ), and subdomains and (sd and sd ). during infection, proteolytic cleavage or priming of the s protein is crucial for viral fusion and entry into host cells, a process mediated by host cell proteases such as the transmembrane serine protease (tmprss ) and cathepsin l [ ] [ ] [ ] at the s /s (boundary between s and s subunits) and s ' (immediately upstream of the s fusion peptide, fp) cleavage sites. [ ] [ ] [ ] the s protein primarily exists in a metastable prefusion complex prior to cleavage, after which notable conformational rearrangements occur in order to fuse the viral membrane with the host cell. [ ] [ ] [ ] in addition, the rbd adopts disparate conformations, up and down, to engage the host cell receptor. [ ] [ ] [ ] the up conformation corresponds to the hace accessible state while the down state cannot engage the host cell receptor. the s domain, on the other hand, consists of the functionally important fusion peptide (fp), which is critical for viral fusion and formation of the post-fusion complex; heptad repeats and (hr and hr ); transmembrane domain (tm) and cytoplasmic tail (ct). the hrs of the s-protein trimer interact to form a six-helical bundle fusion core, which helps bring the membranes of the virus and host cell into close proximity for fusion and entry. therefore, the roles of sars-cov- s-protein present it as an important therapeutic target, which would enable the prevention of viral entry and fusion in host cells. 
numerous studies have been reported over the past months with regards to the possibility of blocking direct interactions between sars-cov- s-protein and hace . most of these studies were aimed at targeting the s protein rbd domain with antibodies, peptide-based or small molecule compounds that binds with a much higher affinity to block s-protein-hace interactions. [ ] [ ] [ ] [ ] [ ] [ ] also, targeting host proteases such as tmprss was explored in a recent study, with consequential impediments on sars-cov- entry. identification of other functional (allosteric) sites on the prefusion s protein could present another dynamic and effective approach of preventing sars-cov- infectivity relative to its interaction with the host cell ace and proteases. this alternative target approach for sars-cov- s protein is important because its rbd (similar to other covs) has been associated with a high mutational propensity which may in turn alter the affinity of small molecule inhibitors or peptide designed to bind therein. allosteric targeting was explored in a recent study wherein the cov-conserved s hr region was identified as an important target site for the development of broad-spectrum inhibitors of human covs. the resulting peptide inhibitor (ek ) was evaluated in vivo and exhibited desirable safety and efficacy . more so, the protein contact network (pcn) paradigm was used to map functional allosteric loci on sars-cov s protein. relatively, this study was implemented to (i) identify potential druggable sites across the s and s domains of the sars-cov- s protein other than the rbd-hace interface (ii) perform high-throughput (virtual) screening of ~ fda approved drugs against the most druggable site(s) (iii) investigate the binding dynamics and interaction mechanisms of the compounds and their consequential effects on the s-protein rbd-ace complex. 
we believe this systematic study will be able to provide structural and molecular insights into possible allosteric sites on sars-cov- s protein suitable for selective targeting and structure- computational methodologies the three-dimensional structure of sars-cov- s-protein (prefusion) was retrieved from pdb with entry vsb. this, as previously reported, represents the s-protein rbd conformation in its up (open) state, which is most suitable for hace binding. also, to model binding interactions between the prefusion sars-cov- s-protein (s /s ) and the hace , a crystallized structure with pdb entry m j was separately retrieved. this complex depicts binding between the truncated rbd of sars-cov- s-protein and the protease domain (pd) of hace . co-crystallized molecules not relevant to this study were removed while missing residues (gaps) in the structures were filled using the modeller algorithm. this preparation was performed on the ucsf chimera graphic user interface (gui). subsequently, using the structural superposition method, we modeled a complex between the prefusion s-protein (s /s ) monomer (rbd-up conformation) and the hace protein (figure ). possible druggable sites other than the sars-cov- rbd interface were predicted using approaches previously reported. [ ] [ ] [ ] [ ] herein, we employed multiple tools for site identification and validation, which include sitemap , fpocket , discovery studio client and prankweb. sitemap is an exhaustive tool which ranks protein pockets based on properties such as druggability, surface exposure, hydrophobicity and hydrophilicity among others [ ] [ ] [ ] . these details were then used to characterize the predicted pockets, after which other predictive algorithms were used complementarily for cross-validation. two highly ranked sites were then selected for further analyses. 
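The cross-validation of predicted pockets across several tools, as described above, can be illustrated with a small sketch. It assumes each tool's pockets have already been exported as sets of residue numbers; the tool names, pocket names, residue numbers and the `min_overlap` cutoff are all hypothetical and are not taken from the study.

```python
def jaccard(a, b):
    """Jaccard overlap between two residue sets (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_support(pockets_by_tool, reference_tool, min_overlap=0.3):
    """For each pocket of the reference tool, count how many *other* tools
    report at least one pocket that overlaps it by `min_overlap` or more."""
    support = {}
    for name, residues in pockets_by_tool[reference_tool].items():
        count = 0
        for tool, pockets in pockets_by_tool.items():
            if tool == reference_tool:
                continue
            if any(jaccard(residues, other) >= min_overlap
                   for other in pockets.values()):
                count += 1
        support[name] = count
    return support


# Hypothetical residue numbers, for illustration only.
pockets_by_tool = {
    "tool_a": {"site1": {10, 11, 12, 13}, "site2": {50, 51, 52}},
    "tool_b": {"p1": {11, 12, 13, 14}, "p2": {90, 91}},
    "tool_c": {"q1": {10, 11, 12}, "q2": {51, 52, 53}},
}
support = consensus_support(pockets_by_tool, "tool_a")
print(support)
```

A pocket supported by more tools is a more credible candidate; the overlap measure and cutoff are design choices of this sketch, not of the tools named in the text.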
following the rationale of the study, we mapped out the two most druggable sites on the target protein and virtually screened against them a large chemical library of fda approved drugs (~ compounds) derived from the zinc repository (http://zinc.docking.org/substances/subsets/fda/). this screening was performed using high-performance computing-integrated autodock vina, prior to which coordinates of the predicted sites were mapped using gridboxes. corresponding binding scores were retrieved from the resulting .pdbqt files and were used to filter down to the topmost compounds for each of the predicted sites and . subsequently, two compounds with the highest binding scores (most negative) were selected for the two predicted sites, yielding complexes that were subjected to further simulation studies. as explained in . , the prefusion s-proteins (ligand-bound and unbound) were superimposed with the rbd-hace complex ( m j) after which the single truncated rbd was removed. by so doing, we obtained models of allosterically-bound and unbound pre-fusion s-protein-ace complex. this, as aimed in this study, would provide structural and dynamical insights into the mechanistic effects of allosteric targeting on sars-cov- host entry machinery. although computationally expensive ( residues), we proceeded with long-timescale md simulation runs for the systems on amber graphical processing unit (gpu) using its embedded modules. protein parameters were defined using the ff sb forcefield while ligand parameters were generated with the antechamber and parmchk modules. likewise, the leap program was used to define coordinate and topology files for the ligand-bound and unbound protein complexes. this program was also used to neutralize (by addition of counter-ions, na+ and cl-) and solvate the systems in a tip p water box of size Å. structural minimization was first carried out partially for steps with a restraint potential of kcal mol - . 
Å followed by another steps of full minimization with no restraints. a canonical (nvt) ensemble with a kcal mol - Å harmonic restraint was used to heat the systems gradually from - k for ps, after which the systems were equilibrated for ps at a constant k temperature without restraints in an npt ensemble. atmospheric pressure was maintained at bar with a berendsen barostat while each protein system was subjected to a production run of ns. studied systems include zinc -s-protein-hace (allosteric site ), zinc -s-protein-hace (allosteric site ), and unbound s-protein-hace . corresponding trajectories were saved at every ps time-frame until the end of the simulation followed by data plot analyses using microcal origin software. snapshots were also taken and analyzed to monitor structural events and ligand interaction dynamics across the trajectories on the ucsf chimera graphic user interface (gui) and discovery studio client. the molecular mechanics/generalized born surface area (mm/gbsa) method was used to evaluate binding affinities of the predicted allosteric s-protein binders at their target sites. binding energy profiles for both compounds, inclusive of their energy components, were estimated using snapshots from the terminal ns of md trajectories where conformational stabilities were visible. this approach was important in order to minimize the effects of conformational disorder or entropy on ligand interactions. the equations below mathematically express binding energy calculations: $\Delta G_{\mathrm{gas}} = \Delta E_{\mathrm{int}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdw}}$; $\Delta G_{\mathrm{sol}} = \Delta G_{\mathrm{ele,sol}} + \Delta G_{\mathrm{np,sol}}$; $\Delta G_{\mathrm{np,sol}} = \gamma\,\mathrm{SASA} + \beta$. as shown, internal (∆e int ), electrostatic (∆e ele ) and van der waals (∆e vdw ) energies sum up the gas-phase energy (∆g gas ) while the solvation free energy (∆g sol ) is defined by the polar solvation (∆g ele,sol ) and non-polar contribution to solvation (∆g np,sol ) terms. the generalized born (gb) model was used to estimate ∆g ele,sol while the linear relationship between the surface tension proportionality constant (γ = . 
mol - Å - ), solvent accessible surface area (sasa, Å ), and β constant was used to solve ∆g np,sol . furthermore, estimated ∆g bind was decomposed into individual residue energies, most especially those that constitute the predicted allosteric pockets where the ligands were bound. this method was essential to identify specific residues that contribute crucially to the stability and inhibitory activities of potential allosteric inhibitors. based on the study rationale, we set out to identify possible sites for drugging the target protein (table ). the architectures of these pockets are shown in figure . furthermore, the druggability of a site on a target protein depends on its size (volume) and hydrophobicity (with minimal hydrophilicity) while, on the other hand, high hydrophilicity, reduced hydrophobicity, small pocket size and shallowness characterize "difficult-to-drug" and undruggable pockets [ ] [ ] [ ] . while high hydrophilicity could have repulsive effects on ligand mobility at the binding site, a small or shallow cavity would impede ligand access, fit, optimal binding and stability. from table , sites → rank above the . halgren dscore threshold, making them suitable for therapeutic targeting. relatively, site appears to be highly surface-exposed with a score of . while a large pocket size and volume for site could favor the use of large-molecule compounds. taken together, high surface-exposure coupled with relatively large volumes, hydrophobicity and favorable donor/acceptor properties for sites and could account for their suitability as targetable allosteric regions on the s-protein other than the rbd (figure ). these presumptions are also reflected by the estimated dscore and sitescore values. 
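The selection of sites that pass a druggability threshold can be sketched as a simple filter and sort. This is a minimal illustration, assuming each site is summarized by a druggability score and a volume; the `Pocket` class, the numeric values and the cutoff passed to `druggable` are hypothetical, since the threshold value is not legible in the text above.

```python
from dataclasses import dataclass


@dataclass
class Pocket:
    name: str
    dscore: float   # druggability score reported by a pocket-ranking tool
    volume: float   # pocket volume in cubic angstroms


def druggable(pockets, dscore_cutoff):
    """Keep pockets at or above the cutoff, best score (then largest volume) first."""
    kept = [p for p in pockets if p.dscore >= dscore_cutoff]
    return sorted(kept, key=lambda p: (p.dscore, p.volume), reverse=True)


# Hypothetical values, for illustration only.
pockets = [
    Pocket("A", 1.05, 400.0),
    Pocket("B", 0.70, 150.0),
    Pocket("C", 0.98, 620.0),
]
ranked = druggable(pockets, dscore_cutoff=0.9)
print([p.name for p in ranked])
```

Passing the cutoff as a parameter keeps the sketch independent of any particular threshold convention.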
in addition, since these predicted sites are highly functional, particularly the overlapping fp, hr and cr, targeting them could high-throughput screening and identification of potential allosteric binders to the predicted sites and high-throughput screening using a library of ~ fda approved drug compounds (http://zinc.docking.org/substances/subsets/fda/) was performed against the two predicted allosteric sites. results for the top compounds with the highest binding scores are presented in supplementary table s and supplementary table s for sites and respectively. from the screening results, overall highest scores were estimated for zinc (- kcal/mol) at site and zinc (- . kcal/mol) at site . as highlighted in our methods, md simulations were performed for the prefusion s-protein-hace complexes bound distinctly at two potential allosteric sites. this approach was essential to investigate the likely effects of allosteric targeting on the entry/fusion mechanisms of sars-cov- via host hace . however, this conformation appeared distorted in the allosterically-bound s-proteins, which could account for displacement motions of the interacting hace from the rbd interface. therefore, the allosteric-mediated disruption of sars-cov- s-protein rbd and its interaction with hace , as reported herein, is a major finding that could indicate the viability of allosteric targeting in sars-cov- therapy. furthermore, we measured structural stabilities across the ligand-protein complexes relative to the unbound system using the rmsd metrics. as shown in figure , structural instability was highest in the unbound s-protein while its associated hace was relatively stable compared to this could indicate the structural effects of allosteric targeting on the s-protein and its interaction with hace . estimated mean rmsds, as presented in table , corroborate conformational variations among the unbound and bound protein complexes. 
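The RMSD metric used above to compare structural stability can be written out in a few lines. This sketch assumes every frame has already been superimposed on the reference structure and that coordinates are plain (x, y, z) tuples in the same atom order; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists.
    Assumes the two structures are already superimposed."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_rmsd(trajectory, reference):
    """Average RMSD of every frame in a trajectory against one reference."""
    values = [rmsd(frame, reference) for frame in trajectory]
    return sum(values) / len(values)


# Toy two-atom system, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_same = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(mean_rmsd([frame_same, frame_shifted], reference))
```

Restricting `trajectory` to the terminal frames gives the mean over a stabilized window rather than over the whole run.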
to minimize the effects of structural disorder (entropy) in our calculations, we selected, from the md trajectories, terminal time-frames ( - ns) from which the systems appeared to relatively stabilize. these were defined as the finally equilibrated (fe) time-frames and were used for subsequent structural analyses (table ). from the resulting fe-rmsd plots, unbound s-protein was highly unstable while its associated hace exhibited low structural motion in line with the rmsd calculations, which could also imply that the binding of s-protein stabilized hace . in contrast, the allosterically-bound s-proteins (sites and ) were notably stable while their corresponding hace showed high structural instability that could correlate with their systemic motions at the s-protein rbd as earlier mentioned. structural analyses of ligand orientations at the respective allosteric sites of sars-cov- s-protein were performed using averaged structures from the md trajectories (figure ). findings reveal that the allosteric binding of zinc (fosaprepitant) was stabilized at the ntd. fosaprepitant contains a terminal triphosphate group that orients towards residues such as n , k , n , r and h . likewise, its trifluoromethyl group oriented towards d while constituent -o and -nh groups mediate interactions with q and n , among others. these altogether could facilitate high-affinity interactions accountable for its stability and allosteric inhibitory effects against the sars-cov- and associated hace . binding affinities of the compounds were determined using the mm/gbsa technique, which also allowed us to measure the energy contributions of interactive residues at the predicted allosteric sites. energy calculations, as presented in table , were performed using relatively stable time-frames ( - ns) to minimize entropic effects that may interfere with ligand binding activities. 
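Averaging energy components over a stable window of snapshots, as described above, can be sketched as follows. The sketch assumes per-snapshot energy terms (internal, electrostatic, van der waals, polar and non-polar solvation) are already available as dictionaries, and it leaves out any entropy term; the key names, the `binding_energy` function and all numbers are hypothetical.

```python
from statistics import mean

# Per-snapshot terms: internal, electrostatic, van der Waals,
# polar (GB) solvation and non-polar solvation, all in kcal/mol.
COMPONENTS = ("int", "ele", "vdw", "gb", "np")


def binding_energy(frames, start, end):
    """Average each component over frames[start:end] and combine them:
    gas = int + ele + vdw, sol = gb + np, bind = gas + sol."""
    window = frames[start:end]
    if not window:
        raise ValueError("empty snapshot window")
    avg = {c: mean(f[c] for f in window) for c in COMPONENTS}
    avg["gas"] = avg["int"] + avg["ele"] + avg["vdw"]
    avg["sol"] = avg["gb"] + avg["np"]
    avg["bind"] = avg["gas"] + avg["sol"]
    return avg


# Hypothetical snapshot energies, for illustration only.
frames = [
    {"int": 0.0, "ele": -5.0, "vdw": -10.0, "gb": 8.0, "np": -1.0},
    {"int": 0.0, "ele": -20.0, "vdw": -40.0, "gb": 30.0, "np": -4.0},
    {"int": 0.0, "ele": -24.0, "vdw": -44.0, "gb": 34.0, "np": -6.0},
]
result = binding_energy(frames, start=1, end=3)  # skip the early, unstable frame
print(result["gas"], result["sol"], result["bind"])
```

Choosing `start` and `end` to cover only the terminal snapshots mirrors the idea of excluding the less stable early part of the trajectory.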
in addition, we observed that electrostatic effects contributed most notably to the allosteric binding of zinc at the ntd region while van der waals contributions had the highest effect on the binding of zinc at the predicted site pocket. electrostatic contributions at site could be due to the high number of electropositive residues that constitute the pocket, as shown in figure , which may form high-affinity interactions with electronegative moieties of the compound. calculations further revealed that ∆e vdw and ∆e ele were more favorable in the gas phase for zinc while polar solvation energies were more favorable for zinc at the s region of the s-protein. this could imply that while the former was buried in the deep hydrophobic pocket of the ntd, the latter was surface exposed due to its trans-domain binding activity as earlier reported. to understand the mechanistic binding of the compounds at both predicted sites, we decomposed the binding free energies into individual contributions of the interacting residues. these were juxtaposed with structural analysis that showed the type and (π-alkyl) interactions. moreover, a π-π stacked interaction between y and a benzene ring (of the -tri-fluoromethyl- , '-biphenyl group) could be highly crucial for the stability of the compound. taken together, electrostatic energies favored the binding of zinc at site while vdw energies favored zinc binding at site , which, consequently, were able to perturb the s-protein rbd and allosterically disrupt hace interactions. the systemic entry of sars-cov- into the human host cell is a crucial process that underlies its virulence and pathogenicity in humans and other animals it infects. this mechanism is mediated by its interaction with the host ace (hace ) via attachment and fusion. potential intervention approaches in sars-cov- treatment include therapeutic strategies that could prevent sars-cov- s-protein binding to hace . 
in this study, we implemented an exhaustive approach to identify drug molecules that could potentially bind to sars-cov- s-protein at sites other than the rbd. pertinent to the allosteric targeting approach implemented herein was the identification of highly druggable sites inherent in the s-protein (s /s ), which was carried out using multiple pocket prediction algorithms for identification and validation of possible allosteric sites. predicted pockets were then characterized based on their attributes, after which two highly probable pockets were selected. these were then screened distinctly against a library of ~ fda approved drugs retrieved from the zinc database. amongst all, thermophoresis (mst) can be employed for further validation. these implementations will provide additional insights into the targetability and suitability of these pockets for novel covid- therapeutics. findings from this study pave the way for novelty in the structure-based design of high-affinity allosteric inhibitors or disruptors of sars-cov- association with host hace , thereby preventing viral entry. authors thank the college of health sciences, university of kwazulu-natal, south africa for providing infrastructural support and we also acknowledge the center for high performance computing (chpc), cape town, south africa, for providing computational resources. authors declare no conflict of interest. this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. 
) a novel coronavirus from patients with pneumonia in china ) coronavirus cases coronavirus replication and pathogenesis: implications for the recent outbreak of severe acute respiratory syndrome (sars), and the challenge for vaccine development february) a molecular arms race between host innate antiviral response and emerging human coronaviruses june) epidemiology, genetic recombination, and pathogenesis of coronaviruses genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan origin and evolution of pathogenic coronaviruses studies with human coronaviruses ii. some properties of strains e and oc clinical and molecular epidemiological features of coronavirus hku -associated community-acquired pneumonia zoonotic origins of human coronaviruses october) interspecies transmission and emergence of novel viruses: lessons from bats and birds a pneumonia outbreak associated with a new coronavirus of probable bat origin epidemiology, causes, clinical manifestation and diagnosis, prevention and control of coronavirus disease (covid- ) during the early outbreak period: a scoping review severe acute respiratory syndrome coronavirus (sars-cov- ): an overview of viral structure and host response coronaviruses: an overview of their replication and pathogenesis april) emerging coronaviruses: genome structure, replication, and pathogenesis the molecular biology of coronaviruses efficient assembly and release of sars coronavirus-like particles by a heterologous expression system mers-cov virus-like particles produced in insect cells induce specific humoural and cellular imminity in rhesus macaques molecular interactions in the assembly of coronaviruses a structural analysis of m protein in coronavirus assembly and morphology a severe acute respiratory syndrome coronavirus that lacks the e gene is attenuated in vitro and in vivo the small envelope protein e is not essential for absence of e protein 
arrests transmissible gastroenteritis coronavirus maturation in the secretory pathway generation of a replication-competent, propagation-deficient virus vector based on the transmissible gastroenteritis coronavirus genome coronavirus envelope (e) protein remains at the site of assembly coronaviruses-drug discovery and therapeutic options mechanisms of coronavirus cell entry mediated by the viral spike protein sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor cathepsin l functionally cleaves the severe acute respiratory syndrome coronavirus class i fusion protein upstream of rather than adjacent to the fusion peptide role of the spike glycoprotein of human middle east respiratory syndrome coronavirus (mers-cov) in virus entry and syncytia formation inhibitors of cathepsin l prevent severe acute respiratory syndrome coronavirus entry efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease tmprss protease is linked to the severe acute respiratory syndrome coronavirus receptor and activates virus entry human coronaviruses: a review of virus-host interactions structure, function, and evolution of coronavirus spike proteins function, and antigenicity of the sars-cov- spike glycoprotein tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion the spike glycoprotein of the new coronavirus -ncov contains a furin-like cleavage site absent in cov of the same clade cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains structural basis of receptor recognition by sars-cov- structural and functional basis of sars-cov- entry by using human ace . 
cell in silico study of the spike protein from sars-cov- interaction with ace : similarity with sars-cov, hot-spot analysis and effect of the receptor polymorphism li , , *, cheng zhao , , zhaohui li a noncompeting pair of human neutralizing antibodies block covid- virus binding to its receptor ace computational design of ace -based peptide inhibitors of sars-cov- august) bat-to-human: spike features determining "host jump" of coronaviruses sars-cov, mers-cov, and beyond a pan-coronavirus fusion inhibitor targeting the hr domain of human coronavirus spike mapping active allosteric loci sars-cov spike proteins by means of protein contact networks structure of the sars-cov- spike receptor-binding domain bound to the comparative protein structure modeling using modeller ucsf chimera, modeller, and imp: an integrated modeling system exploring the lapse in druggability: sequence analysis, structural dynamics and binding site characterization of k-rasg c variant, a feasible oncotherapeutics target potential ebola drug targets -filling the gap: a critical step forward towards the design and discovery of potential drugs possible allosteric binding site on gyrase b, a key target for novel anti-tb drugs: homology modelling and binding site identification using molecular dynamics simulation and binding free energy calculations can we rely on computational predictions to correctly identify ligand binding sites on novel protein drug targets? 
assessment of binding site prediction methods and a protocol for validation of predicted binding sites identifying and characterizing binding sites and assessing druggability fpocket: an open source platform for ligand pocket detection prankweb: a web server for ligand binding site prediction and visualization new method for fast and accurate binding-site identification and analysis therapeutic target-site variability in α -antitrypsin characterized at high resolution in silico assessment of potential druggable pockets on the surface of α -antitrypsin conformers autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading amber molecular dynamics with coupling to an external bath originpro . : scientific data analysis and graphing software-software review structural basis for the recognition of sars-cov- by full-length human ace mapping allosteric communications within individual proteins 
fao conceptualized, implemented, analyzed, interpreted and wrote the manuscript, kfo performed molecular dynamics simulation, while mes revised and approved the manuscript for submission. 
key: cord- - gsbtfag authors: kiemer, lars; lund, ole; brunak, søren; blom, nikolaj title: coronavirus cl(pro )proteinase cleavage sites: possible relevance to sars virus pathology date: - - journal: bmc bioinformatics doi: . / - - - sha: doc_id: cord_uid: gsbtfag background: despite the passing of more than a year since the first outbreak of severe acute respiratory syndrome (sars), efficient counter-measures are still few and many believe that reappearance of sars, or a similar disease caused by a coronavirus, is not unlikely. for other virus families like the picornaviruses it is known that pathology is related to proteolytic cleavage of host proteins by viral proteinases. furthermore, several studies indicate that virus proliferation can be arrested using specific proteinase inhibitors supporting the belief that proteinases are indeed important during infection. prompted by this, we set out to analyse and predict cleavage by the coronavirus main proteinase using computational methods. results: we retrieved sequence data on seven fully sequenced coronaviruses and identified the main cl proteinase cleavage sites in polyproteins using alignments. a neural network was trained to recognise the cleavage sites in the genomes obtaining a sensitivity of . % and a specificity of . %. several proteins known to be cleaved by other viruses were submitted to prediction as well as proteins suspected relevant in coronavirus pathology. cleavage sites were predicted in proteins such as the cystic fibrosis transmembrane conductance regulator (cftr), transcription factors creb-rp and oct- , and components of the ubiquitin pathway. 
conclusions: our prediction method netcorona predicts coronavirus cleavage sites with high specificity and several potential cleavage candidates were identified which might be important to elucidate coronavirus pathology. furthermore, the method might assist in design of proteinase inhibitors for treatment of sars and possible future diseases caused by coronaviruses. it is made available for public use at our website: . in the spring of , the severe acute respiratory syndrome (sars) caused numerous fatalities particularly in southeast asia and gravely affected the global economy. the causative agent was shown to be a human coronavirus [ ] , a virus type which normally causes mild cold symptoms in humans. the abrupt appearance raises concern of another break-out of an epidemic of sars virus or similar strains in the future. coronaviruses are found in different species ranging from chicken to cattle and humans. currently, seven coronavirus genomes, including sars coronavirus (cov), have been fully sequenced and cluster into four main groups, of which sars-cov occupies its own [ , ] . polyproteins encoded by the coronavirus rna are processed by viral proteinases yielding mature proteins. the main proteinase cl pro performs at least eleven proteolytic cleavages within a single viral polyprotein [ , ] . viral polyprotein processing is a common theme in viral molecular biology, e.g. as seen in picornaviruses and retroviruses like hiv. therefore, essential viral proteinases have been suggested as potential targets for specific therapeutic approaches, e.g. by development of specific proteinase inhibitors [ ] [ ] [ ] . in the case of picornaviruses, virus-encoded proteinases are able to cleave specific cellular targets and thereby severely inhibit the cellular translational machinery (the "host cell shut-off" response) while still allowing for high translational activity of viral mrna [ ] . 
earlier, we developed a computational approach for predicting potential cleavage sites of picornavirus proteinases a and c [ ] . badorff et al. successfully used this cleavage predictor to identify the cellular target dystrophin, which they experimentally showed to be cleaved both in vitro and in vivo [ ] . however, preliminary studies revealed that this model is not compatible with coronavirus cleavage sites. the general approach is still valid though, and we decided to apply this method to the problem of predicting the cl pro proteinase cleavage sites and identifying potential host cell target proteins. we propose that a deeper understanding of coronavirus proteinase function and substrate specificity may benefit further research by: i) increasing the understanding of substrate specificity determinants which may direct studies focusing on the development of specific proteinase inhibitors and ii) providing a method for screening cellular target proteins for potential coronavirus proteinase cleavage sites. in this paper, we describe the development of a computational prediction method using artificial neural networks for predicting coronavirus cl pro proteinase cleavage sites. the method is based on known cleavage sites in seven members of the coronavirus family as the cleavage sites are believed to be sufficiently conserved among family members. this notion is supported by the fact that the sars cl pro proteinase has recently been shown capable of catalysing the cleavage of peptide fragments from other coronaviruses at the expected cleavage sites [ ] . we discuss potential targets of cl pro proteinase, e.g. the cystic fibrosis transmembrane conductance regulator (cftr) and translational and transcriptional factors, which may be involved in the molecular pathology of coronaviruses in general and sars virus in particular. the annotated coronavirus polyprotein main proteinase cleavage sites were aligned without gaps by constraining the p position. 
every site had a glutamine (q) in position p (the position just before the cleavage site; the positions are named as suggested by berger and schechter [ ] with p , p , ... etc., n-terminal to the cleavage site and p ', p ', ... etc., c-terminal to the cleavage site). from the sequence logo ( figure ) a very strong consensus is evident around the cleavage site. as discussed by others [ , ] , the coronavirus c-like proteinase shares many traits with its picornavirus c proteinase counterpart, hence the name. this is reflected in the cleavage site logo although differences between the two are also apparent. positions p ', p , and p have similar amino acid distribution in the c and cl proteinase cleavage sites. on the other hand, the coronavirus proteinase has a strong preference for leucine at position p while this position is relatively non-conserved among picornavirus proteinase cleavage sites [ ] . a recently published study of the crystal structure of cl pro from the e strain of human coronaviruses indicates that residues at positions p to p form an anti-parallel β sheet with part of the proteinase, signifying their importance in cleavage site recognition [ ] . it is clear from the above that a simple, position specific consensus sequence is difficult to define. with the present data set from seven different coronaviruses it is possible to classify correctly ( %) of the cleavage sites by matching an 'lq' consensus pattern. however, an additional sites in the viral polyproteins are incorrectly classified as cleavage sites, being random occurrences of this pair of amino acids. classification is improved by using the consensus pattern 'lq [s/a]', meaning leu-gln-(ser or ala), but it is still far from being a useful classifier. the false positive rate is now down to wrong sites, but at the same time only ( %) of the correct cleavage sites are detected. 
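the trade-off between the 'lq' and 'lq [s/a]' consensus patterns can be illustrated with a small regular-expression scan. this is a minimal sketch, not the authors' procedure: the function names, the toy sequence and the toy site positions are hypothetical and carry no real coronavirus data.

```python
import re


def consensus_hits(sequence, pattern):
    """Return 1-based positions of the glutamine (P1) for every pattern match.

    The pattern is assumed to start with 'LQ', so the glutamine sits one
    residue after the start of each match.
    """
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() + 2 for m in re.finditer(f"(?=({pattern}))", sequence)]


def score_pattern(sequence, pattern, true_sites):
    """Count true positives, false positives and false negatives for a pattern."""
    predicted = set(consensus_hits(sequence, pattern))
    known = set(true_sites)
    return {
        "tp": len(predicted & known),
        "fp": len(predicted - known),
        "fn": len(known - predicted),
    }


# Hypothetical toy fragment; positions 4, 15 and 21 are declared "true" sites.
toy = "AVLQSGFKLQTMALQAENVLQGKF"
toy_sites = [4, 15, 21]

loose = score_pattern(toy, "LQ", toy_sites)       # finds every site, plus one random 'LQ'
strict = score_pattern(toy, "LQ[SA]", toy_sites)  # no false positive, but one site is missed
```

on this toy input the stricter pattern removes the false positive at the cost of one missed site, which is the behaviour the text describes for the real data set.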
as the pattern becomes more sophisticated, specificity increases (reducing the number of false positives) but at the same time sensitivity drops dramatically (i.e. fewer of the true sites are detected). to overcome the limitations of simple consensus patterns, we trained an artificial neural network to identify the cleavage sites. the best model was obtained using a three-layered neural network with two hidden neurons and a sequence window encompassing nine amino acids centered on the p position, thus encompassing p -p '. the network evaluates and assigns a score between and to every glutamine to which it is presented, where a score above . is considered a positive answer (i.e. a cleavage site is predicted). this model was able to classify correctly of known cleavage sites ( . %) and , of , ( . %) sites assumed not to be cleaved by the proteinase when testing on independent sites not included when training. the neural network method could thus identify many more of the positive sites with fewer false positives than simple consensus-type methods, thereby increasing the classification performance. to evaluate the predictive power of the neural network, we performed a basic bayesian analysis of the data set test results. the scoring range from to was divided into ten bins and the posterior probability of a positive prediction (a prediction indicating a cleavage) being true was calculated and plotted ( figure ). the posterior probability in the range . to . cannot be determined accurately since relatively few examples score in this interval: only % of the test set (both positive and negative examples) scores between . and . . however, results indicate that prediction scores can be classified into three categories: those that fall below . are most likely not cleaved, those that fall between . and . are possibly cleaved, and those above . are most likely cleaved if available to the proteinase. 
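the network input described above, a nine-residue window centered on each glutamine, can be sketched as follows. this is an illustration under stated assumptions rather than the authors' code: the padding symbol, the 20-inputs-per-residue layout and all function names are hypothetical, and the sparse (one-hot) encoding is only assumed to resemble the encoding named in the methods.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed placeholder for window positions beyond the sequence ends


def glutamine_windows(sequence, flank=4):
    """Yield (1-based position, window) for every Q, with `flank` residues per side.

    flank=4 gives the symmetric nine-residue window centered on the glutamine.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "Q":
            # Index i in `sequence` is index i + flank in `padded`.
            yield i + 1, padded[i : i + 2 * flank + 1]


def sparse_encode(window):
    """One-hot encode a window: 20 inputs per residue, all zeros for padding."""
    vector = []
    for residue in window:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


# Hypothetical toy peptide with a single glutamine at position 4.
windows = list(glutamine_windows("AVLQSGFKL"))
encoded = sparse_encode(windows[0][1])
```

each glutamine yields one fixed-length input vector (9 x 20 = 180 values in this sketch), which is what a feed-forward network with a fixed input layer requires.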
as mentioned above, there are several experimentally verified examples of host cell protein cleavage by virus proteinases. thus, both these and other non-coronavirus proteins from swiss-prot [ ] were examined for potential cleavage sites. in total three groups of proteins were examined: i) proteins known to be cleaved by other viruses, ii) proteins which could be targets when considering the pathology of coronaviruses, and iii) proteins related to the expected immune response to a viral infection. eukaryotic translation initiation factor gamma (if g_human) has a potential cleavage site after gln ( . ), but also at two other positions although with lower cleavage scores. cleavage of this protein may lead to host cell shut-off in a similar way to what has been described for picornavirus a proteinase [ ] . two subunits of the rna polymerase iii are predicted targets of the coronavirus proteinase cl pro . rna polymerase (rpc _human) has a predicted cleavage site after glnl with a score ( . ) well above the . cut-off. the protein is the second largest subunit of the rna polymerase iii complex and if this protein is indeed a cellular proteinase target it might cause disruption of the rna polymerase iii complex upon infection with a coronavirus. a similar disruption would be expected in case of a cleavage of the largest subunit of the complex (rpa _human) which also has a predicted cleavage site (at position , score . ). this agrees with findings that poliovirus disrupts rna polymerase iii function, although this occurs through cleavage of transcription factor iiic and not the polymerase subunits themselves [ ] [ ] [ ] . several well-known transcription factors contain potential cleavage sites. the highest scoring is creb-rp (at b_human) with a predicted cleavage site at gln ( . ) close to the dna binding leucine zipper motif. this is in agreement with findings from picornavirus c pro proteinase although at a different position in the sequence [ ] . 
oct- (po _human) is also predicted to be cleaved by the cl pro proteinase with high confidence ( . ) following gln again corresponding to experimental evidence from picornavirus [ ] . several subunits of the transcription initiation factor tfiid, which is a verified target in poliovirus infections [ ] , have predicted cleavage sites; the kda subunit (t d _human), the kda subunit (t d _human), and the kda subunit (t dt_human). the tumor-suppressor protein p is known to be cleaved by picornavirus c pro proteinase [ ] but this protein is not predicted to contain any coronavirus c pro proteinase cleavage sites. however, p -binding protein (p _human) and p -binding protein (p _human), which stimulate p -mediated transcriptional activation [ ] , have several potential cleavage sites. another known target for viral infections is the microtubule-associated protein (map- ) which is cleavable in hela cells by the poliovirus c pro proteinase [ , ] . map- (map _human) might also be cleavable by cl pro albeit with a low score (after gln with a score of . ) and furthermore microtubule-associated protein rp/eb member and (mae _human and mae _human) have sites which obtain scores above . . the position of the possible cleavage site in map- is different from that observed with poliovirus c pro reflecting the different specificity of this proteinase. lung related proteins were examined as early symptoms of sars could indicate a relation. the cystic fibrosis transmembrane conductance regulator (cftr_human) is an atp-dependent chloride channel. it has a predicted cleavage site with a high score ( . ) following gln in the human sequence. this part of the membrane protein is cytoplasmic and contains several phosphorylation sites (residues - ) indicating an accessible region. 
the epithelial sodium channels play an important role in lung liquid homeostasis [ ] and the amiloride-sensitive sodium channel δ-subunit (scad_human) has a predicted cleavage site in the cytoplasmic c-terminus (after pattern % more of these proteins may lead to reduced presentation of viral peptides to cytotoxic t lymphocytes thereby inhibiting the cellular immune response. irak- (ira _human) which is involved in il- induced activation of cells has a predicted cleavage site after gln scoring . . interferon-induced protein - precursor (ini _human) is a membrane protein and was predicted to possess a cleavage site following gln ( . ) which is located in the cytoplasmic part of the mature protein. protein - has been shown to enhance interferonα antiviral efficacy [ ] . interferon-α, -β, and -γ are known to be involved in antiviral defence and have been employed for treatment of sars [ ] , but the interferons themselves do not seem to possess cleavage sites. we have listed the human proteins analysed in this work in a table (table ). we have developed a neural network capable of identifying the cleavage site of the coronavirus proteinase cl pro and use this model to predict potential cleavage sites in host cell proteins. the predictor is highly specific which means that few false positives are expected, in fact on independent test sets we observed a false positive rate around %. the optimal network window size of nine residues agrees well with available structural information about the proteinase from human coronavirus e which indicates that the active site makes contact with at least four residues n-terminal to the glutamine [ ] . the ten sites known to be cleaved but failed to be recognised by the neural network are not dramatically different from the remainder of the sites ( table ) . 
we therefore do not suspect these to be sites of a different hitherto unknown proteinase, but it would be interesting to see if the lower prediction score reflects a lower cleavage efficiency in vivo. the fourteen negative examples wrongly predicted as cleavable (table ) bear some resemblance to real cleavage sites but also some resemblance to negative examples which are not predicted as cleavable. they may represent sites in-between which are cleavable to a certain extent but are shielded from cleavage due to conformational issues. predicted sites, even those with high scores, which are inaccessible to the proteinase (like extracellular domains, transmembrane domains, or buried domains in globular proteins) should be disregarded, as accessibility information is not available to the neural network. cleavage sites probably exist that are not cleaved because they are not exposed to the solvent sufficiently for the proteinase to work. others have attempted recognising the cleavage sites of the cl proteinase as a component of a coronavirus gene prediction server using different methods [ ] . as the goal was different, that predictor is not publicly available and no performance values have been published.
[figure: posterior probability / fraction plotted against scoring range]
table : selected potential cleavage sites in human proteins from the swiss-prot database examined in this work. columns represent swiss-prot identifier, predicted cleavage site position of p in the target protein, cleavage site score, and cellular localisation of target protein (cyt - cytoplasmic, nuc - nuclear, mem - membrane associated). the last column lists the cleavage site in the sequence; cleavage is predicted between the central glutamine residue (q) and the following amino acid residue. sorted by prediction score.
nc_ aibv kssvqsvag
nc_ mhv vsqiqsrlt
nc_ mhv nprlqcttn
nc_ tgv kiglqakpe
nc_ bcov etrvqcstn
nc_ pedv gvnlqggyv
nc_ pedv snnlqglen
nc_ sars gvtfqgkfk
nc_ sars eplmqsada
nc_ sars vatlqaenv
our method can be employed by researchers suspecting a possible viral proteinase cleavage but may also prove useful for researchers working with coronavirus function. finally, the method might facilitate proteinase blocking based drug discovery by providing hints about proteinase affinity to various non-cleavable peptide ligands, which is a possible strategy for drug development [ , ] . seven full-length coronavirus genomes were retrieved from the genbank database [ ] with the following accession numbers [...]. amino acid conservation in multiple sequence alignments may be visualised using sequence logos. the height of the amino acid one-letter abbreviations reflects the shannon information content [ ] in units of bits at that specific position in the multiple sequence alignment [ ] . the basic idea behind the visualisation technique is that the height of each letter in a given position reflects its probability $p_k(i)$. the total height of the column reflects the total information content $D(i)$ at that specific position in the alignment given by (for proteins): $D(i) = \log_2 20 + \sum_{k} p_k(i) \log_2 p_k(i)$. very conserved positions will then get tall columns with the height of individual residue symbols reflecting the amino acid distribution. the artificial neural networks used in this work were of the standard feed-forward type. sparse encoding was used for translating the amino acids to data input for the networks as has been described previously [ , , ] . training was done with three-fold cross-validation and matthews correlation coefficients [ ] were calculated by summing up true positives, false positives, true negatives, and false negatives in all combinations of training and test sets. using an architecture with two hidden neurons and a symmetric window of nine amino acids centered on the glutamine in the p position it was possible to obtain a correlation coefficient of . on cross-validated test sets. care was taken to ensure that all cleavage sites were equally distributed in every cross-validated set. 
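the pooled matthews correlation coefficient described above, where the four confusion-matrix counts are summed over the cross-validation test sets before the coefficient is computed, can be sketched as follows. the function name and the toy counts are hypothetical; the formula itself is the standard definition of the coefficient.

```python
import math


def pooled_mcc(fold_counts):
    """Matthews correlation coefficient from counts summed over test folds.

    fold_counts: iterable of (tp, fp, tn, fn) tuples, one per test fold.
    """
    folds = list(fold_counts)
    tp = sum(c[0] for c in folds)
    fp = sum(c[1] for c in folds)
    tn = sum(c[2] for c in folds)
    fn = sum(c[3] for c in folds)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for three test folds (not the paper's results).
toy_folds = [(10, 1, 100, 2), (9, 2, 99, 3), (11, 0, 101, 1)]
toy_mcc = pooled_mcc(toy_folds)
```

pooling the counts first gives a single coefficient for the whole cross-validation rather than an average of per-fold coefficients, which can behave badly when a fold contains very few positive sites.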
the validity of the statistics depends on the expected fraction of cleavage sites in a given data set, which we only know in the data set at hand. statistics were thus computed on the data set test results in order to create a histogram of prediction probabilities, using bayes' theorem: $P(C_{pos} \mid x_l) = \frac{P(x_l \mid C_{pos}) \, P(C_{pos})}{P(x_l)}$. the prediction outcome ( - ) was divided into bins ($x_l$) with increments of . . the posterior probability $P(C_{pos} \mid x_l)$ gives the probability of a positive prediction (that is, a cleavage) being true given the bin. this can be calculated from the prior probability $P(C_{pos})$, which is the fraction of positive examples in the data set, and the class-conditional probability $P(x_l \mid C_{pos})$ for positive examples, which is the fraction of positive examples in the bin $x_l$. $P(x_l)$ is the fraction of prediction outcomes in bin $x_l$. an averaged sum of the score of all three networks arising from the three-fold cross-validation was used for prediction. each network outputs a score in the range [ . - . ], where scores below . indicate non-cleavage and scores above . indicate potential cleavage. this method is also employed by the prediction web server mentioned below. the swiss-prot database [ ] release . (february ) was downloaded and proteins from this database were used as targets for the neural network predictions. 
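the binned posterior described above can be sketched in a few lines. this is a minimal illustration, assuming scores lie between 0 and 1 and that the data set contains at least one positive example; the function name, the ten-bin default and the toy scores are hypothetical and are not the paper's data.

```python
def posterior_by_bin(scores, labels, n_bins=10):
    """Return P(C_pos | x_l) for each score bin, or None for empty bins.

    scores: prediction scores, assumed to lie in [0, 1].
    labels: 1 for a true cleavage site, 0 otherwise.
    """
    n_total = len(scores)
    n_pos = sum(labels)
    prior = n_pos / n_total  # P(C_pos)

    def bin_of(score):
        # A score of exactly 1.0 is placed in the last bin.
        return min(int(score * n_bins), n_bins - 1)

    posteriors = []
    for b in range(n_bins):
        in_bin = [lab for s, lab in zip(scores, labels) if bin_of(s) == b]
        if not in_bin:
            posteriors.append(None)
            continue
        likelihood = sum(in_bin) / n_pos   # P(x_l | C_pos)
        evidence = len(in_bin) / n_total   # P(x_l)
        posteriors.append(likelihood * prior / evidence)
    return posteriors


# Hypothetical scores and labels for illustration only.
toy_scores = [0.05, 0.15, 0.55, 0.95, 0.92]
toy_labels = [0, 0, 1, 1, 0]
toy_posterior = posterior_by_bin(toy_scores, toy_labels)
```

the result for each bin reduces to the fraction of examples in that bin that are true sites, which is why bins holding very few examples give unreliable posteriors.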
our neural network based prediction method, netcorona, for prediction of potential cleavage sites of the sars- cl pro proteinase is publicly available by following the link 'cbs prediction servers' from http://www.cbs.dtu.dk or at this specific url: http://www.cbs.dtu.dk/services/netcorona/
aetiology: koch's postulates fulfilled for sars virus
characterization of a novel coronavirus associated with severe acute respiratory syndrome
identification of polypeptides encoded in open reading frame b of the putative polymerase gene of the murine coronavirus mouse hepatitis virus a
characterization of a human coronavirus (strain e) c-like proteinase activity
inhibition of coronavirus mhv-a replication by proteinase inhibitors
coronavirus main proteinase ( clpro) structure: basis for design of anti-sars drugs
the sars coronavirus: a postgenomic era
degradation of cellular proteins during poliovirus infection: studies by two-dimensional gel electrophoresis
cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks
enteroviral protease a directly cleaves dystrophin and is inhibited by a dystrophin-based substrate analogue
mechanisms and enzymes involved in sars coronavirus genome expression
mapping the active site of papain with the aid of peptide substrates and inhibitors
virus-encoded proteinases and proteolytic processing in the nidovirales
conservation of substrate specificities among coronavirus main proteases
the swiss-prot protein knowledgebase and its supplement trembl in
mapping of functional domains in eukaryotic protein synthesis initiation factor g (eif g) with picornaviral proteases. implications for cap-dependent and cap-independent translational initiation
a transcriptionally active form of tfiiic is modified in poliovirus-infected hela cells
poliovirus proteinase c converts an active form of transcription factor iiic to an inactive form: a mechanism for inhibition of host cell polymerase iii transcription by poliovirus
dna binding domain and subunit interactions of transcription factor iiic revealed by dissection with poliovirus c protease
inhibition of host cell transcription by poliovirus: cleavage of transcription factor creb by poliovirus-encoded protease cpro
cleavage of transcriptional activator oct- by poliovirus encoded protease cpro
an rna polymerase ii transcription factor inactivated in poliovirus-infected cells copurifies with transcription factor tfiid
poliovirus c protease-mediated degradation of transcriptional activator p requires a cellular activity
stimulation of p -mediated transcriptional activation by the p -binding proteins, bp and bp
poliovirus infection results in structural alteration of a microtubule-associated protein
poliovirus protease c mediates cleavage of microtubule-associated protein
alveolar epithelial type i cells contain transport proteins and transport sodium, supporting an active role for type i cells in regulation of lung liquid homeostasis
gene expression associated with interferon alfa antiviral activity in an hcv replicon cell line
treatment of sars with human interferons
prediction of proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing sars-cov genomes
binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against sars
genbank: update
a mathematical theory of communication. bell system tech j
sequence logos: a new way to display consensus sequences
identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites
sequence and structure-based prediction of eukaryotic protein phosphorylation sites
comparison of the predicted and observed secondary structure of t phage lysozyme
this work was supported by grants from the danish national research foundation, the danish natural science research council, and neurosearch a/s (to lk). lk carried out sequence retrieval, alignment, neural network training, prediction on potential proteins and drafted the manuscript. ol provided input on virus pathology and suggested human proteins for prediction. sb provided general inputs and improvements to the manuscript. finally, nb conceived of and supervised the study in addition to assisting with the drafting of the manuscript.
key: cord- -sls bsm authors: dean, natalie e.; pastore y piontti, ana; madewell, zachary j.; cummings, derek a.t; hitchings, matthew d.t.; joshi, keya; kahn, rebecca; vespignani, alessandro; elizabeth halloran, m.; longini, ira m. title: ensemble forecast modeling for the design of covid- vaccine efficacy trials date: - - journal: vaccine doi: . /j.vaccine. . . sha: doc_id: cord_uid: sls bsm to rapidly evaluate the safety and efficacy of covid- vaccine candidates, prioritizing vaccine trial sites in areas with high expected disease incidence can speed endpoint accrual and shorten trial duration. mathematical and statistical forecast models can inform the process of site selection, integrating available data sources and facilitating comparisons across locations. we recommend the use of ensemble forecast modeling – combining projections from independent modeling groups – to guide investigators identifying suitable sites for covid- vaccine efficacy trials. 
we describe an appropriate structure for this process, including minimum requirements, suggested output, and a user-friendly tool for displaying results. importantly, we advise that this process be repeated regularly throughout the trial, to inform decisions about enrolling new participants at existing sites with waning incidence versus adding entirely new sites. these types of data-driven models can support the implementation of flexible efficacy trials tailored to the outbreak setting. the covid- pandemic is a public health emergency, and there is an urgent need for effective vaccines to limit morbidity and mortality. efforts are underway to accelerate all steps in the vaccine development pathway ( ) . large randomized field trials are crucial for determining the safety and efficacy of candidates to inform regulatory decisions ( ) . in these trials, many thousands of eligible and consenting participants across multiple sites are enrolled and individually randomized to vaccine or control. these trials are event driven, where an expected primary endpoint is laboratory-confirmed symptomatic disease ( ) , with infection regardless of symptoms as a valuable secondary endpoint ( ) . selecting vaccine trial sites where disease incidence is highest during the study period can accelerate the accrual of endpoints. mathematical and statistical models are recognized as valuable tools for planning infectious disease clinical trials ( ) . they can be used to optimize design features such as cluster size or to examine the validity of the trial's statistical analysis ( ) . the use of spatially explicit forecast models to select vaccine trial sites was first explored during the - zika epidemic ( ) . these forecast models synthesize available data to make projections about which sites might have the highest future disease incidence. an important value of models is that they standardize projections across locations. 
trends in raw reported numbers of cases depend heavily on the sensitivity of the underlying surveillance system. case definitions and access to care and testing may vary over time and space. models that integrate many data sources, such as reported cases, test positivity, hospitalizations and deaths, can facilitate more meaningful comparisons across locations. forecasts provide estimates along with the uncertainty associated with those estimates to make best use of the available information. models can incorporate many features to capture the complex dynamics of infectious diseases. incidence is expected to vary widely over time and between locations, as a function of control measures in place, patterns of introduction, seasonality, and other sources of variability. mathematical models naturally account for prior circulation of the virus and any buildup of population-level immunity. areas that have already experienced substantial outbreaks may be less suitable for inclusion, and this would be reflected in projections. models can explicitly capture correlation due to movement between nearby sites or between sites and a common hub ( ) . models can also reflect relevant population-level features associated with expected incidence, such as density, race/ethnicity, age distribution, and educational status. we recommend the use of ensemble modeling, whereby multiple modeling groups prepare independent projections and these are combined to guide decision-making. individual models can be agent-based, compartmental, or statistical, can use different assumptions and data sources, but are all tasked with the same question of which sites are likely to have the highest disease incidence over a moderate time horizon. ensemble modeling has been shown to be more robust for complex systems than specialist models and better able to capture the complete range of possible outcomes ( ) . 
the strength of ensemble modeling has been shown for diseases like influenza ( ), dengue ( ) and ebola ( ) . ensemble modeling for covid- , like the covid- forecast hub, is similarly more robust ( ) . in addition to using forecast modeling for initial site selection, we propose that modeling be repeated at regular intervals throughout the trial. in the context of outbreaks, trials should be flexible to allow new sites to be added in response to evolving epidemiology ( ) . some sites will have lower than projected incidence during the trial period. for example, local policies or voluntary changes in behavior could effectively reduce transmission, meaning that the site is no longer a "hotspot." the modeling results can guide investigators deciding whether to continue to enroll new participants from existing sites, or to enroll new sites in emerging hotspots. in this paper, we describe a simplified framework for the use of ensemble modeling to guide the selection and continued evaluation of sites for a vaccine efficacy trial, with a focus on the covid- pandemic. individual modeling teams are welcome to contribute to the consensus model. we assume that models would already be built for general public health planning, so they would not be constructed only for this effort, though they may need to be modified. investigators can leverage existing groups or form new groups of modelers. participating modeling teams would be provided with a list of all candidate sites being explored. this list of sites may be based on previous engagement between the trial investigators and potential research partners. for a multi-country trial, this may include several sites per country from multiple countries. participating models must meet a minimum set of requirements. suggested guidelines are the ability to: (i) capture all geographic areas in the candidate list of sites, (ii) disaggregate to at least the first administrative level (e.g. 
state, province), though finer levels may be preferred for certain planning activities, (iii) project the covid- symptomatic cumulative incidence, i.e. the number of new symptomatic infections of any severity divided by the total population size during a pre-specified period (three months suggested), and (iv) produce a minimum of simulated epidemics. models must also be screened for internal consistency and basic plausibility when compared to historical trends. for each site, each model must generate a probabilistic predictive distribution for quantities of interest, such as the symptomatic cumulative incidence. each distribution is reported in bins of % width centered around whole numbers: [ . , . %), [ . , . %), [ . , . %) and so on. the bin that includes % is narrower: [ , . %). for each site, probabilistic predictive distributions are aggregated across models using stacking. figure describes a hypothetical model stacking procedure for a target site, per ray and reich ( ) . for simplicity and transparency, each model is assigned an equal weight, which is one over the number of models, as done by the covid- forecast hub ( ) . if a participating team has developed more than one model, they must specify which model is primary and will contribute to the aggregate. more complex weighting schemes exist that preferentially weight models that performed best in previous rounds after an appropriate burn-in period ( ) . for each site, we can use the combined predictive distribution to produce summary statistics. suggested summary statistics are: (i) median incidence value, (ii) th percentile incidence value, (iii) th percentile incidence value, and (iv) probability incidence value is [ , . %) (probability of a very small or no outbreak). to present this information in a way that is easy for trial investigators to explore, we recommend reporting stacked projections, summary statistics, and basic information about the sites in an interactive tool, like the r shiny platform ( ) . 
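the equal-weight stacking and summary statistics described above can be sketched as follows. this is a minimal illustration, not the supplemental code: the function names are hypothetical, each model is assumed to report probabilities over the same ordered bins, and the quantile levels are passed in as parameters because the specific percentiles are not reproduced here.

```python
def stack_equal_weight(model_pmfs):
    """Average binned predictive distributions, one list of bin probabilities per model."""
    n_models = len(model_pmfs)
    n_bins = len(model_pmfs[0])
    if any(len(pmf) != n_bins for pmf in model_pmfs):
        raise ValueError("all models must report the same bins")
    # Each model gets weight 1 / n_models.
    return [sum(pmf[b] for pmf in model_pmfs) / n_models for b in range(n_bins)]


def quantile_bin(pmf, q):
    """Index of the first bin whose cumulative probability reaches q."""
    cumulative = 0.0
    for b, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= q - 1e-12:  # small tolerance for floating-point rounding
            return b
    return len(pmf) - 1


def summarize(pmf, lower_q, upper_q):
    """Summary statistics for one site, expressed as bin indices."""
    return {
        "median_bin": quantile_bin(pmf, 0.5),
        "lower_bin": quantile_bin(pmf, lower_q),
        "upper_bin": quantile_bin(pmf, upper_q),
        "p_lowest_bin": pmf[0],  # probability of a very small or no outbreak
    }


# Three hypothetical models reporting over four bins for a single site.
toy_models = [
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.4, 0.4, 0.1],
    [0.3, 0.2, 0.2, 0.3],
]
stacked = stack_equal_weight(toy_models)
# 0.1 and 0.9 are arbitrary placeholder quantile levels for this example.
site_summary = summarize(stacked, lower_q=0.1, upper_q=0.9)
```

one such summary per candidate site gives the rows of a sortable table, which is the kind of output an interactive tool could display.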
this allows the end user to sort the table or select rows for closer examination. in this way, they could select a subset of "best rows" and view these together to approximate the formation of a trial. figure is a sample screenshot from such a program (code provided in supplemental materials). by generating a range of possible outcomes, models can capture the stochasticity of future transmission, including scenarios where incidence is much lower or much higher than the median projection. where incidence is highly variable with the potential to be very low, it may be preferable to include a larger number of sites to guard against the chance of accruing no efficacy data. the goal of ensemble modeling is to provide a simple and informative resource rather than a definitive recommendation. investigators will simultaneously consider many operational, political, and scientific factors. to provide context, we describe several other key considerations. to ensure a high-quality trial, sites should have adequate capacity for testing, safety monitoring, active surveillance, and high participant retention. nonetheless, sites with projected high incidence but poorer capacity should not be excluded if there is a potential role for mobile trial teams, as was used in the ebola ring vaccination trial in guinea ( ) . approval for the trial may be at the national or sub-national level, with flexibility to identify the particular target population when investigators are ready to start enrollment. for multi-country trials, investigators must weigh including multiple sites per country against including more countries. on one hand, given the complexity of country-specific procedures for approving clinical research, it may be easier to include multiple sites per country from fewer countries overall. on the other hand, the global community must ensure equitable access to potentially effective vaccines. 
broad representation also increases generalizability of the trial results, as it can best capture the effectiveness of vaccine candidates in diverse settings. these include variations in population age profile, race/ethnicities, climate, background presence of non-pharmaceutical interventions, and co-circulation of other coronaviruses. including many different geographic locations makes trials more robust to changes in the epidemic. while china was once the center of the covid- epidemic, several treatment trials initiated there were underpowered due to waning transmission ( ) . as other countries adopt more effective control strategies, incidence would likely decline, but it is less likely to wane in all areas, and new sites can also be added. experience with zika in the americas provides a useful counter-example, though, where trials were not possible because incidence dramatically declined everywhere ( ) . if that were to occur, the ensemble modeling process would be useful for assessing trial feasibility. finally, the ensemble modeling process should be evaluated by comparing model projections to subsequently observed data. an evaluation procedure could be conducted prior to each new round of modeling, before investigators want to make decisions about adding new trial sites. this process could assess how well model-projected rankings corresponded to observed rankings of hardest hit sites. where there is a lot of uncertainty in which sites will have highest incidence, as reflected in low correlation, investigators may feel more comfortable making future decisions based on logistical or political considerations rather than purely on model rankings. this evaluation procedure could also be conducted formally after the trial ends to compare model-projected and observed cumulative incidence and observed incidence during the target time periods. 
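the comparison of model-projected and observed site rankings can be sketched with a rank correlation. the text does not name a statistic, so the use of spearman's coefficient here is an assumption. the function names and incidence values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(projected, observed):
    """Spearman rank correlation between projected and observed incidence."""
    return pearson(average_ranks(projected), average_ranks(observed))


# invented cumulative incidences (%) for four candidate sites
projected = [3.1, 0.4, 1.8, 2.5]
observed = [2.0, 0.2, 2.6, 1.1]
rho = spearman(projected, observed)
```

a coefficient near 1 would indicate that the projected ordering of sites was largely borne out, whereas a value near 0 corresponds to the low-correlation situation described above.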
these types of reports are very useful for understanding the role of modeling as a tool for real-time decision-making in outbreaks ( , ) . we describe an ensemble modeling procedure to inform site selection for a vaccine efficacy trial planned during an ongoing epidemic. by prioritizing sites with highest projected disease incidence, investigators can accelerate the pace of endpoint accrual. mathematical and statistical models synthesize the best available evidence to guide this planning. we focus on covid- as a motivating example, but the general principles apply to other emerging infectious diseases. we present a highly simplified procedure to reduce the burden on modeling groups to prepare results and potentially enable more groups to participate. for example, models could, but not be required to, explicitly account for the impact of vaccination on transmission dynamics. the assumption is that population vaccine coverage will be relatively low even in large trials, and that the rank ordering of sites is similar in less complex models. this is a recommended minimum structure, but other relevant practical questions will likely emerge that can be explored as add-ons. for example, the modeling results can be used to answer questions about expected duration of the trial as a function of enrollment rates and expected incidence. nonetheless, it is important to remember that projections can be very uncertain, particularly as they depend upon rapidly changing policies and human behavior. thus, we focus on simple output for the purposes of prioritization, acknowledging that other important questions may be difficult to answer precisely. in addition to identifying geographic locations, models could also be used to explore targeted enrollment in sub-populations defined by age, occupation, or other covariates. models could also guide the design of post-licensure observational studies for continued evaluation of vaccine effectiveness. 
it is a top priority to rapidly evaluate the safety and efficacy of candidate covid- vaccines. data-driven models can help to optimize site selection and contribute to accelerating trials in a setting where every day counts.
developing covid- vaccines at pandemic speed
design of vaccine efficacy trials during public health emergencies
world health organization. an international randomised trial of candidate vaccines against covid-
antibody testing will enhance the power and accuracy of covid- -prevention trials
simulations for designing and interpreting intervention trials in infectious diseases
statistical power and validity of ebola vaccine trials in sierra leone: a simulation study of trial design and analysis
preliminary results of models to predict areas in the americas with increased likelihood of zika virus transmission in
spread of zika virus in the americas
prediction of infectious disease epidemics via weighted density ensembles
accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the u.s
an open challenge to advance probabilistic forecasting for dengue epidemics
the rapidd ebola forecasting challenge: synthesis and lessons learnt
ensemble forecasts of coronavirus disease (covid- ) in the u
creating a framework for conducting randomized clinical trials during disease outbreaks
ring vaccination with rvsv-zebov under expanded access in response to an outbreak of ebola virus disease in guinea, : an operational and vaccine safety report
a trial of lopinavir-ritonavir in adults hospitalized with severe covid-
demonstrating vaccine effectiveness during a waning epidemic: a who / nih meeting report on approaches to development and licensure of zika vaccine candidates
mathematical modeling of the west africa ebola epidemic
funding: this work was supported by the national institutes of health r -ai (ned, meh, iml, conceptualization, writing -review and editing.
the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
key: cord- - w de authors: sironi, manuela; cagliani, rachele; forni, diego; clerici, mario title: evolutionary insights into host–pathogen interactions from mammalian sequence data date: - - journal: nat rev genet doi: . /nrg sha: doc_id: cord_uid: w de
infections are one of the major selective pressures acting on humans, and host-pathogen interactions contribute to shaping the genetic diversity of both organisms. evolutionary genomic studies take advantage of experiments that natural selection has been performing over millennia. in particular, inter-species comparative genomic analyses can highlight the genetic determinants of infection susceptibility or severity. recent examples show how evolution-guided approaches can provide new insights into host–pathogen interactions, ultimately clarifying the basis of host range and explaining the emergence of different diseases. we describe the latest developments in comparative immunology and evolutionary genetics, showing their relevance for understanding the molecular determinants of infection susceptibility in mammals. supplementary information: the online version of this article (doi: . /nrg ) contains supplementary material, which is available to authorized users.
the way in which these analyses have helped to clarify the genetic determinants of species-specific infection and disease, as well as the reasons behind pathogen emergence. although arms races involve both the host and the pathogen, in this review we only focus on genetic diversity in mammalian hosts.
host-pathogen genetic conflicts are not confined to mammals (and their pathogens): they drive molecular evolution in most realms of life, including bacterial-bacteriophage systems , plants and their infectious agents , as well as invertebrates and their pests , . although we review studies and methods (boxes - ) that analyse genetic diversity at the inter-species level, the investigation of intra-species and intra-population signatures of pathogen-driven selection has also provided extremely valuable insight into infectious disease susceptibility, especially in our species. the interested reader is directed towards several recent reviews for more information [ ] [ ] [ ] [ ] [ ] .
comparisons among species take a snapshot of selective events that have been unfolding over long timescales. most of these approaches use extant genetic diversity and phylogenetic relationships among species to infer underlying evolutionary patterns. briefly, inter-species approaches rely on the alignment of orthologous coding sequences, analyse these alignments site-by-site, and at each site determine which, among all possible substitutions, would be non-synonymous (amino acid replacing) or synonymous (non-amino acid replacing) (see the figure) . the observed number of non-synonymous differences per non-synonymous site (dn) and the observed number of synonymous differences per synonymous site (ds) are then estimated. under neutral evolution, the rate at which amino acid replacements accumulate is expected to be comparable to the rate for silent changes and, therefore, dn/ds should be equal to 1 (green codons in the figure). nonetheless, most amino acid replacements are deleterious and, as a consequence, are eliminated by selection; this results in a large preponderance of sites with dn/ds < 1, a situation referred to as purifying (or negative) selection (shown in blue in the figure) . conversely, the selective pressure exerted, for instance, by a pathogen, may favour amino acid replacements (for example, changes that modify the sequence and structure of a cellular receptor): in this case, dn/ds may reach values greater than 1, a hallmark of positive (or diversifying) selection (red in the figure) . the figure shows a hypothetical example whereby a virus uses a cellular receptor to infect the host. to prevent viral binding and infection, selection favours variants that modify the sequence and structure of the host receptors; on the other side, the virus adapts to such changes by gaining mutations that keep re-establishing receptor binding. this process fuels a genetic conflict, which is evident at the interaction surfaces. some lineages may be under stronger selective pressure than others and may display lineage-specific selected sites (episodic selection; cyan). in this case the branch of the phylogeny leading to these species may show significant evidence of positive selection .
the dynamics of host-pathogen interactions
a central tenet of the red queen hypothesis is that organisms must continually adapt to survive and thrive in the face of continually evolving opposing organisms. nonetheless, evolution is not all about biotic interactions. at a macroevolutionary level, mixed models of evolution are likely to operate; biotic factors mainly shape species diversity locally and over short time spans, whereas shifts in the physical environment (for example, climate changes and oceanographic and tectonic events) drive evolution at a large scale, across much longer time periods . recently, a new interpretation of the red queen hypothesis was proposed ; the analysis of several phylogenies from different taxa indicated that speciation mostly occurs at a constant rate through rare stochastic events that cause reproductive isolation .
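the site-by-site reasoning behind dn and ds described above can be made concrete with a small python sketch. this is a simplified counting scheme in the spirit of nei and gojobori, not the maximum-likelihood codon models used in practice. it assumes the standard genetic code, applies no correction for multiple substitutions, treats changes to stop codons as non-synonymous and rejects codon pairs that differ at more than one position. all names and sequences are hypothetical.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# standard genetic code, codons enumerated in TCAG order
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon: for each position, the
    fraction of the three possible base changes that keep the amino acid."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1.0 / 3.0
    return total


def classify_difference(codon1, codon2):
    """Return None, 'synonymous' or 'non-synonymous' for an aligned codon pair."""
    n_diff = sum(a != b for a, b in zip(codon1, codon2))
    if n_diff == 0:
        return None
    if n_diff > 1:
        raise ValueError("sketch handles single-nucleotide differences only")
    same = CODON_TABLE[codon1] == CODON_TABLE[codon2]
    return "synonymous" if same else "non-synonymous"


def dn_ds_proportions(seq1, seq2):
    """Uncorrected (dn, ds) for two aligned, gap-free coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    pairs = list(zip(codons1, codons2))
    # site counts are averaged over the two sequences
    syn_sites = sum((synonymous_sites(a) + synonymous_sites(b)) / 2 for a, b in pairs)
    nonsyn_sites = 3 * len(pairs) - syn_sites
    kinds = [classify_difference(a, b) for a, b in pairs]
    dn = kinds.count("non-synonymous") / nonsyn_sites
    ds = kinds.count("synonymous") / syn_sites  # assumes syn_sites > 0
    return dn, ds


# hypothetical alignment: GCT->GCC is silent, AAA->GAA replaces lys with glu
dn, ds = dn_ds_proportions("ATGGCTAAA", "ATGGCCGAA")
ratio = dn / ds
```

in this toy alignment the ratio falls below 1, the pattern expected under purifying selection. real analyses estimate these quantities on a phylogeny and per site rather than from a single pairwise comparison.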
this view curtails the role of biotic interactions as major determinants of species diversity . despite these observations, the red queen hypothesis has proven to be an extremely useful framework for the study of host-pathogen interactions. in this context, red queen dynamics can be divided into different types (see ref. for a recent review). frequency-dependent selection, for example, determines allele frequency fluctuations in both host and pathogen populations. in this scenario, rare alleles are favoured by selection (the pathogen, for instance, may be adapted to the most common host genotype and may fail to infect hosts carrying a rare allele), and diversity within populations is maintained. escalatory arms races are another form of selection that usually apply to quantitative or polygenic traits and proceed through recurrent selective sweeps. selection results in an escalation in the phenotypes of both the host (for example, resistance) and the pathogen (for example, virulence). finally, in chase red queen scenarios the host is under pressure to reduce the strength of the interaction through de novo evolution of novelty, whereas the pathogen evolves to tighten the interaction by reducing phenotypic distance. chase scenarios occur when host-pathogen interactions have a complex genetic basis (polygenic); they determine selective sweeps and tend to reduce genetic diversity within populations. over the years, the red queen hypothesis has been supported by the description of rapid rates of evolution in genes involved in genetic conflicts and, in a few instances, by the temporal reconstruction of host-pathogen co-evolution in natural settings . more recently, the development of experimental evolution approaches has allowed its formal testing , . 
although extremely valuable, laboratory-based studies often use an isogenic host population that is infected by one or a few pathogen strains, and such studies only partially recapitulate the complex nature of host-pathogen interactions that occur in real life. for instance, phenotypic plasticity (an environmentally based change in the phenotype) and multiway host-pathogen interactions are common in nature. a remarkable example of phenotypic plasticity is the vertebrate adaptive immune system: through rearrangement and somatic hypermutation, the same genetic arsenal is used to combat a wide array of pathogens and to develop lifelong resistance to some infections. despite the relevance of adaptive immunity for host defence, its action does not preclude pathogen-driven selection at several genes involved in innate immunity or, more generally, in the interaction with pathogens (these represent the focus of this review). as for multiway interactions, these represent the norm: the same host can be infected by multiple pathogens (or even by multiple strains of the same infectious agent) during its lifetime, whereas pathogens differ in their ability to infect one or more host species. thus, multiple host-pathogen interactions might drive the evolution of the same or different molecular systems, blurring the expectations of the red queen hypothesis. finally, hosts with long generation times (such as mammals, which are the focus of this review), evolve at lower rates compared with most of their pathogens and also display smaller population sizes, resulting in an asymmetry of the arms race (although parasites with life cycles involving two or more species may be constrained in their ability to adapt (reviewed in ref. ) ). even in the presence of a strong selective pressure (for example, a fatal infection), several generations may be required before the molecular signatures of the genetic conflict can be detected in mammalian host genomes . 
nevertheless, natural selection signatures have been described at several mammalian genes that interact with recently emerged human infectious agents (for example, hiv- ), possibly as a result of the pressure imposed by extinct pathogens or because these agents have established long-lasting interactions with non-human hosts.
the 'site models' implemented in the phylogenetic analysis by maximum likelihood (paml) package are widely used to infer positive selection and to identify positively selected sites. these models allow dn/ds to vary from site to site, assuming a constant rate at synonymous sites. data (alignment and phylogenetic tree) are fitted to models that allow (selection models) or do not allow (neutral model) a class of codons to evolve with dn/ds > 1. likelihood ratio tests are then applied to determine whether the neutral model can be rejected in favour of the positive selection model. if so, the gene is declared to be positively selected. also, if (and only if) the null hypothesis of neutral evolution is rejected, a bayes empirical bayes (beb) approach can be used to detect specific sites targeted by selection (beb calculates the posterior probability that each site belongs to the class with dn/ds > 1) , . the paml approach implicitly assumes that the strength and direction of natural selection is uniform across all lineages. because this is often not the case, murrell and co-workers recently developed the mixed effects model of evolution (meme, hyphy package) . meme allows the distribution of dn/ds to vary from site to site and from branch to branch; thus, the method has greater power to detect episodic selection, especially if it is confined to a small subset of branches in the phylogeny. a major issue related to these approaches is their extreme sensitivity to errors in sequence (coverage), annotation and alignment.
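the likelihood ratio test used to compare a neutral model with a selection model can be sketched as follows. the sketch assumes that the selection model has two more free parameters than the neutral one, as in commonly used nested pairs of site models, so the test statistic is compared with a chi-square distribution with 2 degrees of freedom. the log-likelihood values and the function name are hypothetical. in practice the log-likelihoods come from fitting both models to the alignment and tree.

```python
import math


def likelihood_ratio_test_df2(lnl_neutral, lnl_selection):
    """Compare nested models that differ by two free parameters (assumed).

    Returns the test statistic and its chi-square p-value for df = 2.
    """
    # twice the gain in log-likelihood from allowing a class with dn/ds > 1
    statistic = max(0.0, 2.0 * (lnl_selection - lnl_neutral))
    # for 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# hypothetical maximized log-likelihoods for one gene
stat, p = likelihood_ratio_test_df2(lnl_neutral=-1000.0, lnl_selection=-990.0)
reject_neutral = p < 0.05  # only then would site-level posteriors be inspected
```

the closed form for 2 degrees of freedom avoids a dependency on a statistics library. for other parameter differences a general chi-square survival function would be needed.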
misalignments and incorrect sequence information may result in apparently fast evolutionary rates and thus inflate the false-positive rate [ ] [ ] [ ] . the use of specific alignment algorithms (for example, prank) and filtering procedures (for example, guidance) may partially overcome this problem . likewise, genetic variability that is generated by recombination can be mistaken for positive selection . thus, to limit false positives, alignments should be screened for recombination before running positive selection tests (and, if necessary, split on the basis of recombination breakpoints) or recombination should be incorporated into the model.
positive selection: the accumulation of favourable amino acid-replacing substitutions, which results in more non-synonymous changes than expected under neutrality (dn/ds > 1).
coronavirus (mers-cov) as a dangerous human pathogen. both ebov and mers-cov are thought to have originated in bats and spread to humans either directly or through an intermediate host. because eids are almost inevitably caused by an existing pathogen that adapts to infect a new host, comparative analyses of different species may help to unveil the genetic and immunological determinants underlying pathogen spillover and infection susceptibility. hiv- , for example, originated from the cross-species transmission of the simian immunodeficiency virus siv cpz , which naturally infects chimpanzees . old world monkeys are resistant to hiv- infection owing to a post-entry viral block operated by cellular restriction factors. this differential susceptibility to infection was exploited to isolate tripartite motif-containing protein (trim ; also known as trim α), a major retrovirus restriction factor, from a rhesus macaque cdna library . the protein product of trim binds directly to the incoming viral capsid and targets it for disassembly. whereas macaque trim is highly efficient against hiv- , the human protein is not .
most species-specific determinants of antiviral activity were mapped to a short amino acid stretch in the so-called b . (or spry) domain of trim (ref. ). in primates, this region has evolved under positive selection, and the human lineage shows some of the strongest selection signatures . why then is human trim so highly inefficient against hiv- ? possibly because the human gene evolved to fight another retrovirus. in a seminal paper, kaiser and co-workers resurrected an extinct pan troglodytes endogenous retrovirus (pterv ) and showed that the amino acid status of a single residue in the trim b . domain modulates its activity against pterv and hiv- , with the gain of restriction for one virus resulting in decreased control of the other one . human trim is very active against pterv , suggesting that our ancestors adapted to fight this virus or some related retrovirus, and this left them (us) unprepared against the hiv- epidemic. more recently, several genes identified as hiv- host factors were analysed in primates, and evidence emerged of positive selection at five of these (ankyrin repeat domain a (ankrd a), cd , microtubule-associated protein (map ), nucleoporin kda (nup ) and ran binding protein (ranbp )) . importantly, most of the positive selection targets in cd , map and nup are located in protein regions or domains that are responsible for direct interaction with the virus. the authors suggested that the selective pressure on these genes was exerted by ancient lentiviruses , . overall, a number of concepts can be taken from these studies: past infection events may leave a signature that affects the ability of extant species to fight emerging pathogens. evolution may act through trade-offs, whereby changes that are favourable in one specific environment (in this case, the presence of a specific pathogen) may be unfavourable when conditions change. protein regions at the host-pathogen interface are expected to be targeted by the strongest selective pressure. 
evolutionary studies based on inter-species comparisons allow the identification of molecular determinants of infection susceptibility at single amino acid resolution.
signatures of selection along specific branches can be detected through the so-called 'branch-site' models implemented in the phylogenetic analysis by maximum likelihood (paml) package . in analogy to the site models described in box , alignment errors result in high false-positive rates when branch-site models are applied ; this issue can be partially mitigated by the use of specific aligners . branch-site models require the phylogeny to be divided into 'foreground' and 'background' branches. a likelihood ratio test is then applied to compare a model that allows positive selection on a class of codons for the foreground branches with a model that does not allow such selection . designation of the foreground branches needs a priori information, possibly based on biological evidence. if no clues are available as to which branches are more likely to have undergone selection, it is still possible to run the analysis by designating each branch of the tree as 'foreground'; this generates a multiple-hypothesis testing problem that must be appropriately corrected . two alternative methods can detect selection at specific lineages without a priori branch partition. the branch site-random effects likelihood (bs-rel) method considers three different evolutionary scenarios (purifying, neutral and diversifying selection) for all branches in a given tree, and each branch is considered independently from the others; the algorithm applies sequential likelihood ratio tests to identify branches with significant evidence of positive selection . the second method, the covarion-like codon model (fitmodel) , allows each site to switch between selective regimes at any time on the phylogeny. thus, switches are not necessarily associated with tree nodes. recently, this approach was shown to be more powerful than the branch-site tests if a priori information is available . both fitmodel and the paml branch-site methods envisage a bayesian approach to identify sites evolving under episodic positive selection. however, extensive simulations revealed that the branch-site approach is accurate but has limited power at detecting sites . this problem has been referred to as the 'selection inference uncertainty principle' -that is, it is difficult to simultaneously infer both the site and the branch that are subject to positive selection .
mammals display different susceptibility to distinct pathogens, and infection with the same agent can have extremely different outcomes in diverse species (see ref. for a recent review). thus, domestic and wild mammalian (and non-mammalian) species represent natural reservoirs of human pathogens and/or may provide the adaptive environment for pathogen spillover. because host reservoir species and their pathogens often co-evolve for millions of years, evolutionary analyses may help to explain host adaptive events associated with low susceptibility and mild disease outcomes. the most extensive body of knowledge on host-pathogen specificity focuses on viral infections, as the example of trim mentioned above testifies, but recent work has also shed new light on bacterial diseases. leptospirosis, one of the most prevalent human bacterial zoonoses worldwide, is caused by bacteria of the leptospira genus. wild rodents are considered to be the main reservoirs for human leptospirosis, but a study of malagasy small mammals indicated that several endemic species of tenrecs and bats are also infected with leptospira species that are markedly specific to their hosts, suggesting long-term adaptation of the bacterium to different hosts .
a feature that pathogenic leptospira species share with other bacteria is complement evasion. indeed, these spirochetes have evolved different strategies to elude complement-mediated killing; thus, leptospiral immunoglobulin-like (lig) proteins can bind complement factor h (cfh) and c b-binding protein (c bp) to mediate complement inactivation at the bacterial surface. a genome-wide analysis of positive selection in six mammalian species indicated that the complement system has been the target of extremely intense selective pressure . similar results were obtained by analysing positively selected genes in the bat myotis brandtii . thus, selection-driven species-specific differences at complement genes might explain differential susceptibility to infections. in line with this view, human-specific pathogens such as neisseria gonorrhoeae and neisseria meningitidis bind cfh of human origin, but not cfh from other primates, and a single amino acid change (n r) in the chimpanzee molecule restores cfh binding to sialylated gonococci and bacterial killing . several sequenced mammalian genomes are now available; it will be important to study the detailed pattern of molecular evolution at complement genes, with the aim of gaining insight into the determinants of species-specific complement evasion. yersinia pestis provides another remarkable example of differential susceptibility to a bacterial infection. again, rodents act as a natural reservoir for this human pathogen. as with other gram-negative bacteria, lipid a, the biologically active component of y. pestis lipopolysaccharide (lps), is recognized by toll-like receptor (tlr ) and its co-receptor lymphocyte antigen (ly ; also known as md ) (see below). recent data showed that, compared with mouse cells, human cells respond less efficiently to hypoacylated lipid a; this effect is almost entirely due to differences in tlr and ly sequences, as assessed by the generation of humanized mice .
different responsiveness to variably acylated lps from other sources (for example, escherichia coli) had previously been described . starting from this premise, ohto and co-workers solved the crystal structure of the mouse tlr -ly -lps and tlr -ly -lipid iva (a synthetic tetra-acylated lipid a precursor) complexes and compared them to the human counterparts. structural differences were detected in the interaction of lipid iva with the two mammalian receptors, with some amino acid replacements in ly and tlr possibly being responsible for the observed differential binding . analysis of tlr in mammals revealed that the receptor has evolved adaptively . we mapped positively selected sites onto the structure of the human and mouse complexes and observed that some of these may indeed account for structural differences between humans and mice (fig. ) . rodents are the most established animal model for human disease, including for susceptibility to infection. in recent years, however, technological advances have made the sequencing of whole genomes a relatively quick and inexpensive process. the genome sequences of non-model mammals that serve as natural reservoirs of human infectious agents are now available, allowing the unprecedented opportunity to exploit these data for molecular evolution studies. bats, for example, are known to host a wide range of viruses that are highly pathogenic to humans . the genomes of six bat species have been sequenced so far, and three of these (m. brandtii, pteropus alecto and myotis davidii) were analysed in detail to unveil the evolutionary history of specific traits . results showed that different families of immune receptors -including killer cell immunoglobulin-like receptors (kirs), killer cell lectin-like receptors (klrs), sialic acid-binding immunoglobulin-like lectins (siglecs) and leukocyte immunoglobulin-like receptors (lilrs) -have expanded or contracted in distinct bat species. 
also, in these three bat species, as well as in the common ancestor of p. alecto and m. davidii, genes involved in immunity represented preferential targets of positive selection . this is not unexpected: immune-response genes have been shown to have evolved rapidly in most mammalian species analysed to date . thus, although these sequenced bat genomes have not yet provided an explanation as to why bats are tolerant to ebov, for instance, they pave the way for further analyses to test specific hypotheses and/or to address the molecular determinants of host-pathogen interactions. in a recent study, demogines and co-workers showed how this can be accomplished. the authors focused on angiotensin-converting enzyme (ace ), which serves as a receptor for severe acute respiratory syndrome coronavirus (sars-cov) cell entry. in particular, the receptor-binding domain of the viral spike protein is responsible for ace binding and is a major determinant of host range . although the human sars epidemic was suggested to have originated from the zoonotic transmission of sars-cov from bats to humans, possibly via an intermediate host (for example, palm civets) , , no ace -binding sars-cov-like virus had been identified in bats when demogines and collaborators started their work . the authors analysed ace genes in bat species, and results revealed that the gene evolved adaptively and that the positively selected residues of the bat genes map at the ace -sars-cov interaction surface (fig. ) . these data led to the conclusion that ace -binding coronaviruses originated in bats . this finding was confirmed in a subsequent study that isolated an ace -binding sars-like coronavirus from horseshoe bats in china , highlighting the power of evolutionary studies in predicting host range and disease emergence.
episodic positive selection: positive selection localized to a subset of sites or confined to a few species in a phylogeny.
similarly to sars-cov, mers-cov is thought to have originated in bats and to have spread to humans via an intermediate host, possibly dromedary camels . infection is initiated by binding of the mers-cov spike protein to human dipeptidyl peptidase (dpp ; also known as cd ) . recent data indicate that five amino acids in dpp that differ between humans (mers-cov susceptible) and hamsters (non-susceptible) are key determinants for host specificity (fig. ) . we extended a previous evolutionary analysis of mammalian dpp (ref. ): strong evidence of positive selection was found with episodic selection in the vespertilionidae bat family and the panda and ferret branches, as well as in the dog lineage (fig. ; see supplementary information s ,s (box, table)). as shown in fig. , most positively selected sites are located at the dpp -spike protein interaction surface , and one of these is among those described as binding determinants . thus, as observed for ace , mers-cov and related viruses (for example, coronavirus hku ) are likely to act as drivers of molecular evolution on mammalian dpp genes; it will be especially interesting to evaluate the contribution of positively selected sites in ferrets because these animals are resistant to mers-cov infection. immune responses in mammals are highly coordinated processes involving multiple systems that sense infection, activate antiviral and antimicrobial responses, and trigger adaptive immunity. the evolutionary history of several such systems has been analysed in detail, and below we describe the most recent findings. innate immune receptors. the mammalian immune system is endowed with a repertoire of molecular sensors called pattern-recognition receptors (prrs). these molecules detect pathogen-associated molecular patterns (pamps) and initiate a downstream signalling cascade that culminates in the production of cytokines and antimicrobial factors. 
the main families of prrs include tlrs, nod-like receptors (nlrs), rig-like receptors (rlrs) and aim -like receptors (alrs). in the host-pathogen arms race, these molecules represent one of the foremost detection-defence systems; consistently, several studies have reported adaptive evolution at genes encoding mammalian prrs. analyses in primates, rodents and representative mammalian species indicate that positive selection shaped nucleotide diversity at most tlrs, with the strongest pressure acting on tlr (refs , , ). similarly to tlr (fig. ), several positively selected sites in other tlrs are located in pamp-binding regions, raising questions as to whether species-specific host-pathogen co-evolution is occurring, and how these sequence changes translate into differential pamp recognition. in fact, as mentioned above for lps, species-specific differences in ligand binding by tlrs seem to be common and potentially affect the overall immune response to specific pathogens. integration of evolutionary, immunological and genetic studies will be instrumental in the future for medical applications, especially in light of the proposed use of tlr ligands as vaccine adjuvants, a step that may require tailoring to distinct species.

figure | a | sites that are positively selected in mammals are mapped onto the tlr structure (red): several of these flank or correspond (orange) to residues that differ between humans and mice and that surround the phosphate groups of lipid iva (yellow). if lys and arg are replaced with the human residues (glu and gln , respectively), the responsiveness of mouse tlr -ly to lipid iva is abolished. b | structures of human cd (white; transmembrane and juxtamembrane region) and mir (grey; encoded by kaposi sarcoma-associated herpesvirus). cd sites that are involved in the interaction and that are positively selected in mammals are shown in red. c | complex of transferrin receptor protein (tfr ) with the surface glycoprotein (gp ) of machupo virus (macv), a rodent arenavirus that can also infect humans through zoonotic transmission. tfr residues involved in the interaction with gp are in yellow, positively selected sites are in red and positively selected sites that directly interact with gp are in orange.

compared with tlrs, mammalian alrs are much less conserved and more dynamic, with distinct species carrying different sets of functional genes (ranging from in mice to none in some bats). as a consequence, analysis of several mammals indicated that, with the exception of absent in melanoma (aim ), which is non-functional in several species, no unequivocal orthologues can be inferred for the remaining alr genes. this prevents the application of standard codon-based tests across the entire mammalian phylogeny, although closely related species can be analysed. thus, interferon-γ-inducible protein (ifi ) and aim were shown to have evolved under positive selection in primates. positively selected sites were observed to localize mainly near regions or domains involved in dna binding and protein-protein interaction, suggesting modulation of substrate specificity or genetic conflicts with viral inhibitors. positive selection was also described for the three mammalian rlrs (retinoic acid-inducible gene i (rigi; also known as ddx ), melanoma differentiation-associated (mda ; also known as ifih ) and lgp (also known as dhx )), the primate nlr family apoptosis inhibitory protein (naip) and rodent naip genes. indeed, as is the case for alrs, rodents have multiple naip paralogues that show widespread evidence of inter-locus recombination. this led to the application of a dn/ds sliding window approach: the naip sites evolving with dn/ds > 1 were found to be located in the bacterial ligand domain.

studies on antiviral restriction factors have been extensive because these molecules represent obvious targets in host-pathogen arms races.
specifically, genetic conflicts between host restriction factors and viral components often play out in terms of binding-seeking dynamics (the host factor adapts to bind the viral component) and binding-avoidance dynamics (the virus counter-adapts to avoid binding and restriction by the host factors). the evolutionary history of antiviral restriction factors has been comprehensively reviewed elsewhere [ ] [ ] [ ], and we only highlight a few recent developments here.

glossary. orthologues: genes that evolved from a common ancestral gene through speciation. paralogues: homologous genes created by a duplication event within the same genome. dn: the observed number of non-synonymous substitutions per non-synonymous site. ds: the observed number of synonymous substitutions per synonymous site.

the first restriction factor to be identified was the product of the mouse gene friend virus susceptibility (fv ), a protein that protects against murine leukaemia virus (mlv) infection. the origin and evolution of fv is extremely interesting: early sequence analysis revealed that it derives from the gag gene of an ancient endogenous retrovirus that is not directly related to mlv. thus, fv exemplifies a paradoxical twist of the arms race scenario whereby a viral gene is co-opted by the host to serve an antiviral function (this is not the only instance, see ref. ). recent results showed that the fv gene was inserted into the mouse genome between million and million years ago, long before the appearance of mlv. thus, the selective pressure exerted by other viruses must have maintained fv function and driven its evolution. indeed, analysis of fv from wild-type mice indicates that different fv products can recognize and block multiple genera of retroviruses (for example, equine infectious anaemia virus and feline foamy virus), and a number of positively selected sites in the carboxy-terminal region of fv are directly involved in restriction specificity.
thus, in a similar way to trim , fv was identified for its ability to restrict an extant virus, but its evolution was driven by different waves of retroviral species, some of which are likely to be extinct.

other restriction factors that have been the topic of recent investigation are encoded by two paralogous genes, myxovirus resistance (mx ; also known as mxa) and mx (also known as mxb). the protein products of the two genes display high sequence identity but different antiviral specificity. mx has broad activity against rna and dna viruses. recently, mitchell and collaborators showed the potential of evolutionary analyses to generate experimentally testable hypotheses on the nature of genetic changes that affect species-specific susceptibility to infection. the authors applied an evolution-guided approach and identified a cluster of positively selected residues in an unstructured surface-exposed mx loop (loop ), which confers antiviral specificity; genetic variation in loop is a major determinant of mx antiviral activity against thogoto and avian influenza a viruses, and replacements at a single positively selected site alter the ability of mx to restrict these pathogens. more recently, the selection pattern at the mx gene, which encodes an antiretroviral effector, was shown to parallel that of mx , with most selected sites located in loop (ref. ). in mx , sites selected in the primate lineage were detected outside loop , and mx also showed evidence of selection in other domains; these sites are promising candidates for being additional determinants of antiviral activity.

antigen presentation and the ensuing t cell activation are central processes in the mammalian cell-mediated immune response (fig. ). therefore, a convenient strategy for pathogens to elude immune surveillance is to hijack the molecular pathways responsible for these processes.
in line with the arms race scenario, there is evidence of positive selection at several mammalian genes involved in antigen presentation and the regulation of t cell activation (fig. ). the pathogen-driven mechanisms underlying evolution at these genes are likely to be manifold. one mechanism is genetic conflict with a pathogen-encoded component, evidence of which can be seen in the protein cd . positively selected sites in the transmembrane and juxtamembrane region of cd interact with mir (fig. ), a kaposi sarcoma-associated herpesvirus (kshv) protein that downmodulates cd expression. a second mechanism is the use of cell-surface molecules as viral receptors: some adenovirus strains, for example, have been reported to exploit cd and cd for cellular attachment. a third mechanism is the broadening or tuning of the host's ability to process and present pathogen-derived components. for example, a positively selected site in the carbohydrate-recognition domain of cd (also known as langerin; a birbeck granule molecule) affects an amino acid position that is directly involved in the binding of pathogen-derived glycoconjugates. these mechanisms are not mutually exclusive. for example, a plethora of viral pathogens (such as herpes simplex virus , human papillomavirus, hiv- and kshv) interfere with cd d trafficking and recycling. as a consequence, the cytoplasmic and transmembrane regions of cd d display positively selected sites, one of which is within a primate-specific trafficking signal. additional positively selected sites are located in the cd d extracellular region and flank the t cell receptor interaction surface and the lipid-binding pocket, which suggests that they exert an effect on antigen-binding specificity.

finally, we draw attention to one of the few attempts at assessing the part that helminth infections have played as selective pressures for mammals and at integrating epidemiological information into molecular evolutionary approaches.
machado and co-workers found evidence of positive selection at the mammalian gene fc fragment of igg, low affinity iiib, receptor (fcgr b), which is expressed by eosinophils and is involved in the binding of immunoglobulin g (igg)-coated parasites. notably, the authors also tested a specific hypothesis whereby mammalian lineages hosting a wider range of helminth species should show stronger evidence of selection compared with other species (this was accomplished by running the phylogenetic analysis by maximum likelihood (paml) branch-site models with helminth-rich lineages as foreground branches). their hypothesis was verified, providing a plausible explanation for the evolutionary pattern at fcgr b and suggesting that similar approaches may be used to detect other mammalian genes involved in genetic conflicts with helminth parasites.

as exemplified by ace , host-pathogen interactions are not limited to immune system components. the reasons why genes with no specific defence function may be targeted by the selective pressure imposed by infectious agents are manifold. the best known instances probably refer to gene products that act as incidental receptors for pathogens, as is the case with ace . other host gene products that engage in genetic conflicts include those that participate in the coagulation cascade and the contact system, which are commonly hijacked by bacterial pathogens to promote tissue invasion or to elude detection by immune cells (see ref. for a review). an alternative possibility is that the host builds a line of defence based on the sequestration of essential micronutrients from the pathogen, a phenomenon known as 'nutritional immunity'.

housekeeping genes. incidental receptors are often represented by the products of housekeeping genes, which are typically expressed at high levels by different cell types.
among these, the transferrin receptor (tfrc) gene encodes a cell-surface molecule (transferrin receptor protein (tfr )) that is essential for iron uptake. tfr is used as a receptor by mouse mammary tumour virus, canine parvovirus and rodent new world arenaviruses. in line with the arms race scenario, tfrc evolved adaptively in rodents and caniforms, and positively selected sites are mainly located in the extracellular domain regions that interact with rodent-infecting arenaviruses (fig. ) and carnivore-infecting parvoviruses, respectively.

figure | genes involved in antigen processing and presentation and t cell regulation are common targets of positive selection in mammals. all pathway components are designated using official gene names (excluding the major histocompatibility complex (mhc) and t cell receptor (tcr)) and are highlighted in red if they are targets of positive selection in mammals or primates. the molecular components of different antigen processing and presentation pathways are shown (details from refs , ) to provide a general overview of the extent of positive selection and to highlight the function of positively selected genes, as most of their protein products directly interact with the antigen. thus, the figure is not meant to show all molecules involved in the process or to convey mechanistic insights. also, some genes may show tissue-specific expression or may be induced under specific circumstances: their products are nonetheless included for the sake of completeness. as for t cell regulatory molecules, the representation does not reflect the stoichiometry of binding (for example, cd functions as a dimer). notably, the same molecule may be expressed by different populations of t cells, although here each molecule is shown on one t cell type only (to avoid redundancy). the dashed arrows and '?' indicate steps that lack clear molecular definition or are only inferred. the orange circles, and red and blue shapes at the bottom of the figure represent proteolytic fragments. b m, β -microglobulin; blmh, bleomycin hydrolase; calr, calreticulin; cd lg, cd ligand; ctla , cytotoxic t lymphocyte protein ; cts, cathepsin; cyb, cytochrome b; erap, endoplasmic reticulum aminopeptidase; havcr , hepatitis a virus cellular receptor ; hla-dm, major histocompatibility complex, class ii, dm; icos, inducible t cell co-stimulator; icoslg, icos ligand; ifi , interferon-γ-inducible protein ; inkt, invariant natural killer t; itcr, invariant tcr; lgmn, legumain; lnpep, leucyl-cystinyl aminopeptidase; ncf, neutrophil cytosol factor; npepps, puromycin-sensitive aminopeptidase (also known as psa); nrd , nardilysin; pdcd , programmed cell death ; pdcd lg , programmed cell death ligand ; pdia , protein disulfide-isomerase a ; ros, reactive oxygen species; tap, antigen peptide transporter; tapbp, tap-binding protein (also known as tapasin); thop , thimet oligopeptidase ; tpp , tripeptidyl-peptidase .

glossary. purifying selection: the elimination of deleterious amino acid-replacing substitutions, which results in fewer non-synonymous changes than expected under neutrality (dn/ds < 1).

interestingly, positive selection at the primate transferrin (tf) gene, which encodes the tfr ligand, was also recently described; in this case, selection is driven by bacteria, not viruses. transferrin is the major circulating iron transporter in mammals and is also thought to participate in nutritional immunity by sequestering iron from bacteria. consistently, most positively selected sites were found to have evolved to counteract binding by bacterial transferrin surface receptors that scavenge host iron.
thus, different selective pressures exerted by distinct molecular mechanisms contributed to shaping the evolution of a central homeostatic process, in this case iron transport in mammals.

another housekeeping gene product that acts as a viral receptor is niemann-pick c protein (npc ), a sterol transporter located in the membrane of late endosomes and lysosomes. npc is expressed by most cell types and is used by filoviruses (such as ebov and marburg virus). evolutionary analysis of mammalian npc genes indicated that three positively selected residues are located in the amino-terminal portion of the second npc luminal loop; binding of this loop by the ebov glycoprotein (gp) is necessary and sufficient for the viral receptor activity of the sterol transporter (fig. ). the second luminal loop of npc is also bound with high affinity by the gp encoded by lloviu virus, a bat-derived, ebov-like filovirus. thus, npc may represent a universal receptor for filoviruses, and the constant selective pressure exerted by such viruses might have greatly contributed to shaping mammalian genetic diversity at loop . these data may have great and immediate practical value. in fact, small molecules that directly target npc and disrupt gp binding are regarded as possible therapeutic compounds against ebov [ ] [ ] [ ] (fig. ). because mammalian npc diversity at the interaction surface is driven by selection, future efforts in this direction are likely to benefit from the incorporation of evolutionary analysis; this would be especially important when testing therapeutic molecules on model organisms and non-human mammals. in humans, mutations in npc cause niemann-pick disease type c , a progressive neurodegenerative condition. this is in line with the central role of this transporter in housekeeping functions; thus, purifying selection is expected to constrain variation in the gene.
indeed, the human-mouse dn/ds calculated for the npc whole-gene region is definitely lower than 1, as is the case for most genes (fig. ). in fact, mammalian npc genes show a large preponderance of codons evolving with dn/ds < 1, and positive selection is extremely localized in loop (fig. ). this specific example illustrates a general concept, whereby molecules involved in central homeostatic processes may be engaged in genetic conflicts with pathogens, although in several instances the sequence space accessible for adaptive mutation without a high fitness cost is expected to be limited.

the coagulation cascade and contact system. as anticipated above, several components of the coagulation cascade and contact system evolved adaptively in mammals, most likely as a result of genetic conflicts with bacterial pathogens. for instance, staphylococcus aureus is endowed with an arsenal of proteins that target such systems, including two cysteine proteinases (scpa and sspb) that cleave plasma kininogen at each terminal side of the bradykinin domain to generate kinins, with a consequent increase of vascular leakage. these events are central for bacterial virulence and are linked to the pathogenesis of sepsis. in kininogen (kng ), positively selected sites are located in all domains, with the exception of the highly conserved bradykinin region. one of the positively selected sites defines the n-terminal cleavage site of scpa and sspb, suggesting that sites flanking the bradykinin sequence are evolving to avoid recognition and cleavage by bacterial-encoded proteases. in analogy to the strong purifying selection acting on the bradykinin region, analysis of coagulation cascade genes indicated that disease-causing mutations are more likely to occur at sites targeted by purifying selection and are rarer at positively selected sites.
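the contrast drawn above between sites under purifying selection and sites under positive selection rests on comparing dn/ds with the neutral expectation of 1. the following minimal sketch shows that comparison for a handful of per-codon estimates; the function names and example values are hypothetical, and real analyses rely on likelihood-based tests (for example, in paml or datamonkey) rather than on raw thresholds.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dn/ds ratio for one site or gene."""
    if ds <= 0:
        raise ValueError("ds must be positive to form the dn/ds ratio")
    return dn / ds


def selection_regime(dn: float, ds: float, tolerance: float = 0.0) -> str:
    """Label a dn/ds estimate relative to the neutral expectation of 1."""
    w = omega(dn, ds)
    if w > 1 + tolerance:
        return "positive"
    if w < 1 - tolerance:
        return "purifying"
    return "neutral"


def summarise(codon_estimates):
    """Count how many (dn, ds) pairs fall into each regime."""
    counts = {"purifying": 0, "neutral": 0, "positive": 0}
    for dn, ds in codon_estimates:
        counts[selection_regime(dn, ds)] += 1
    return counts


# Hypothetical per-codon (dn, ds) estimates, for illustration only.
example = [(0.1, 0.5), (0.05, 0.4), (0.2, 0.2), (0.6, 0.3)]
print(summarise(example))
```

a gene can therefore have a whole-gene dn/ds well below 1 while still containing a few codons above 1, which is the pattern described for loop above.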
again, these observations highlight the coexistence of distinct selective regimes at the same gene regions and exemplify the concept of evolutionary trade-offs.

the advent of high-throughput sequencing technologies has allowed for the generation of an unprecedented wealth of genetic data, including the whole-genome sequences of host reservoir species for human pathogens, as well as genetic information for multiple microbial and viral species and strains. moreover, large-scale approaches such as rna interference and mass spectrometry are providing detailed pictures of host-pathogen interactomes. finally, an increasing number of crystal structures of interacting host and pathogen proteins solved in complex are available, making it possible to determine the structural basis of these interactions and to identify regions or amino acids that lie at the host-pathogen contact surface. integration of these data with evolutionary analysis will allow the testing of specific hypotheses, including which species have responded to the pressure exerted by one or more pathogens (see the sars-cov example), which molecules and residues have participated in the arms race and which host-pathogen interacting partners are expected to have co-evolved. these advances are also expected to progressively change evolutionary genetics from a hypothesis-driven to a hypothesis-generating discipline. in this respect, we note that although the arms race scenarios we have described in this review imply some form of host-pathogen co-evolution over time, the nature of the interaction and its dynamics have often been inferred from the observed pattern of variation. indeed, the fact that the same residues that affect specific host-pathogen interactions are targeted by positive selection does not necessarily imply a causal link, and in many instances the specific selective agents underlying molecular adaptations remain to be determined.
as shown above, these may well be accounted for by extinct pathogens or by agents that had a major co-evolutionary role in the past but that are now fading away from the landscape of common infections. with a few exceptions, evolutionary studies only investigate extant genetic variation and modern pathogens, with little reconstruction of past events. nevertheless, we do not necessarily need to go back in time: evolutionary analyses can be used as predictive tools to pinpoint which genes and residues are more likely to contribute to present-day host-pathogen interaction and help explain species-specific susceptibility to infection. several studies mentioned above, including those investigating selection at mx (ref. ), tfrc (refs , ), tf and other protein-coding genes, used experimental analyses to show that evolutionary information can indeed be exploited to gain high-resolution insight into the molecular determinants of binding affinities at host-pathogen interfaces. the studies of iron transporters hold particular value because the authors analysed the genetic variability of both the host and the pathogen and showed that both parties evolved in response to mutually exerted pressures, in line with the red queen principles. so far, few attempts have been made at integrating evolutionary analyses of host and pathogen interacting partners into a common framework. however, efforts in this direction hold the promise of improving our understanding of the strategies used by both hosts and pathogens to adapt and counter-adapt. in turn, this knowledge has possible biomedical and therapeutic implications, given the ability of different pathogens or distinct strains of the same infectious agent to elude not only natural host defences but also drugs and vaccination strategies.

figure | a | the values for some of the genes discussed in this review are indicated. data were derived from the ensembl biomart database (see further information). b | natural selection acting on mammalian niemann-pick c (npc ) genes. npc is shown with its predicted membrane topology and protein regions coloured in hues of blue that represent the percentage of negatively selected sites (as detected by the single-likelihood ancestor counting method using datamonkey); the darker the blue, the higher the percentage. the location of three positively selected residues (red) is indicated on the left, and an alignment of the corresponding region is shown on the protein to the right (with red and blue representing positively and negatively selected sites, respectively). the interaction with the glycoprotein (gp; green) of filoviruses (such as ebola virus, marburg virus or lloviu virus) is shown. gp binds npc after processing by cellular proteases. ace , angiotensin-converting enzyme ; darc, duffy blood group, atypical chemokine receptor; mx , myxovirus resistance ; ssd, sterol-sensing domain; tfrc, transferrin receptor; tlr , toll-like receptor .

as a final note, we mention that we have exclusively focused on adaptive events involving coding gene regions. nevertheless, several recent studies (see ref. for a review) have highlighted the role of non-coding variants as important determinants of susceptibility to infection within species. thus, host-pathogen conflicts are more than likely to have contributed to adaptive evolution at regulatory elements during speciation. detection of these adaptive events will benefit from the availability of high-throughput techniques (for example, rna sequencing and chromatin immunoprecipitation followed by sequencing) and the development of methodological approaches for dissecting molecular evolution in non-coding regions; notably, recent data have shown the usefulness of a framework similar to dn/ds to analyse the evolutionary history of mammalian transcriptional enhancers.
application of this methodology (or extensions thereof) to the study of host-pathogen interactions will provide valuable information on which non-coding sequence changes have been targeted by selection and thus modulate susceptibility to infection or related phenotypes.

signatures of environmental genetic adaptation pinpoint pathogens as the main selective pressure through human evolution a new evolutionary law running with the red queen: the role of biotic conflicts in evolution the causes of evolution (longmans, green, & co, ) revenge of the phages: defeating bacterial defences genomic variability as a driver of plant-pathogen coevolution? mainstreaming caenorhabditis elegans in experimental evolution insights from natural host-parasite interactions: the drosophila model from evolutionary genetics to human immunology: how selection shapes host defence genes population genetic tools for dissecting innate immunity in humans the red queen's long race: human adaptation to pathogen pressure human genome variability, natural selection and infectious diseases natural selection and infectious disease in human populations the red queen and the court jester: species diversity and the role of biotic and abiotic factors through time phylogenies reveal new interpretation of speciation and the red queen host-parasite 'red queen' dynamics archived in pond sediment a matching-allele model explains host resistance to parasites antagonistic coevolution accelerates molecular evolution biological and biomedical implications of the co-evolution of pathogens and their hosts global trends in emerging infectious diseases origins of hiv and the aids pandemic.
cold spring harb the cytoplasmic body component trim α restricts hiv- infection in old world monkeys positive selection of primate trim α identifies a critical species-specific retroviral restriction domain restriction of an extinct retrovirus by the human trim α antiviral protein positive selection of primate genes that promote hiv- replication hiv- capsid-cyclophilin interactions determine nuclear import pathway, integration targeting and replication efficiency this is an excellent review highlighting the importance of non-model organisms in understanding zoonotic infections diversification of an emerging pathogen in a biodiversity hotspot: leptospira in endemic small mammals of madagascar patterns of positive selection in six mammalian genomes genome analysis reveals insights into physiology and longevity of the brandt's bat myotis brandtii this work helps to clarify the species specificity of n. gonorrhoeae infection by analysing the binding of sialylated gonococci to human and chimpanzee cfh humanized tlr /md- mice reveal lps recognition differentially impacts susceptibility to yersinia pestis and salmonella enterica lipid a modification systems in gram-negative bacteria structural basis of species-specific endotoxin sensing by innate immune receptor tlr /md- signatures of positive selection in toll-like receptor (tlr) genes in mammals mass extinctions, biodiversity and mitochondrial function: are bats 'special' as reservoirs for emerging viruses? 
an extremely interesting study providing an overview of the evolutionary history of three bat genomes, with possible implications for immunity-related evidence for ace -utilizing coronaviruses (covs) related to severe acute respiratory syndrome cov in bats a good example of how evolutionary studies can provide insight into host range and disease emergence retargeting of coronavirus by substitution of the spike glycoprotein ectodomain: crossing the host cell species barrier isolation and characterization of viruses related to the sars coronavirus from animals in southern china severe acute respiratory syndrome coronavirus-like virus in chinese horseshoe bats isolation and characterization of a bat sars-like coronavirus that uses the ace receptor full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus a central work showing that dpp of human and bat origin acts as a functional receptor for mers-cov host species restriction of middle east respiratory syndrome coronavirus through its receptor, dipeptidyl peptidase adaptive evolution of bat dipeptidyl peptidase (dpp ): implications for the origin and emergence of middle east respiratory syndrome coronavirus molecular basis of binding between novel human coronavirus mers-cov and its receptor cd adaptation and constraint at toll-like receptors in primates contrasted evolutionary histories of two toll-like receptors (tlr and tlr ) in wild rodents (murinae) variation matters: tlr structure and species-specific pathogen recognition extensive evolutionary and functional diversity among mammalian aim -like receptors ancient and recent selective pressures shaped genetic diversity at aim -like nucleic acid sensors rig-i-like receptors evolved adaptively in mammals, with parallel evolution at lgp and rig-i molecular basis for specific recognition of bacterial ligands by naip/nlrc inflammasomes rules of engagement: molecular insights from host-virus arms races evolutionary conflicts between viruses and 
the authors declare no competing interests. key: cord- - tbc x authors: rustmeier, nils h.; strebl, michael; stehle, thilo title: the symmetry of viral sialic acid binding sites—implications for antiviral strategies date: - - journal: viruses doi: . 
/v sha: doc_id: cord_uid: tbc x virus infections are initiated by the attachment of the viral particle to protein or carbohydrate receptors on the host cell. sialic acid-bearing glycan structures are prominently displayed at the cell surface, and, consequently, these structures can function as receptors for a large number of diverse viruses. structural biology research has helped to establish the molecular bases for many virus–sialic acid interactions. due to the icosahedral point group symmetry that underlies many viral capsids, the receptor binding sites are frequently arranged in a highly symmetric fashion and linked by five-fold, three-fold, or two-fold rotation axes. for the inhibition of viral attachment, one emerging strategy is based on developing multivalent sialic acid-based inhibitors that can simultaneously engage several of these binding sites, thus binding viral capsids with high avidity. in this review, we will evaluate the structures of non-enveloped virus capsid proteins bound to sialylated glycan receptors and discuss the potential of these structures for the development of potent antiviral attachment inhibitors. the cell membranes of eukaryotes are decorated with a large number of chemically and structurally diverse carbohydrates. these so-called glycans form a protective layer at the interface between a cell and its environment. components of this layer are synthesized and assembled by a large set of enzymes that differ among species, thus helping to provide host-specific glycan structures. despite some differences, the carbohydrate building blocks of the glycan layer (monosaccharides) are largely identical in all cases. commonly found monosaccharides in glycan structures are glucose (glc), galactose (gal), their n-acetylated forms n-acetylglucosamine (glcnac) and n-acetylgalactosamine (galnac), and mannose (man). these building blocks constitute the bulk of most glycans. 
however, two particularly important sugar classes are missing here: fucoses and sialic acids. while glc(-nac) and gal(-nac) are usually components of the scaffold, fucose (fuc) and sialic acids are often added as head groups to a glycan. while fucose is, for example, an important determinant in histo-blood group antigens (hbga), sialic acids are major components of the glycan portions of gangliosides and the glycan structures of many membrane proteins in eukaryotes. sialic acids are based on a nine-carbon acidic α-keto sugar framework ( figure ). due to their anionic nature, sialic acids contribute to the negative net charge of the cell surface. sialic acids are abundant in the animal kingdom and are found in different forms. their most common form is n-acetylneuraminic acid (neu5ac), which also constitutes the chemical basis for other sialic acids. the hydroxylation of the n-acetyl group of neu5ac gives rise to n-glycolylneuraminic acid (neu5gc), another commonly found sialic acid in animals. neu5gc cannot be synthesized in humans due to a gene defect. glycans terminating in sialic acid are prominently expressed at the cell surface. they are, therefore, often easily accessible and serve as the initial contact points for many viruses in different families [ , ] . while some of these glycans function as attachment receptors to simply tether a virus to the target cell membrane, others act as entry receptors and mediate binding of the virus, as well as the delivery of the viral genome into the cytoplasm. in the latter case, the attachment step is often followed by the recruitment of secondary (co-)receptors and endocytosis factors, eventually leading to cell entry of the virus particle or its components and infection of the cell. among the enveloped viruses that recognize sialic acid-containing receptors are members of the families coronaviridae, paramyxoviridae and orthomyxoviridae [ ] [ ] [ ] [ ] [ ] . 
in non-enveloped viruses, sialic acid-containing glycans serve as attachment receptors for members of the parvoviridae, picornaviridae, caliciviridae, polyomaviridae, reoviridae, and adenoviridae [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . structural biology has provided precise views of how these pathogens interact with sialylated glycans, and although the binding modes differ among the viruses listed above, several common principles have emerged. (i) the viral binding sites for sialylated glycans are typically surface-exposed and feature a small number of contacts. the affinities of the interactions are, therefore, quite low (in the millimolar range) [ ] [ ] [ ] [ ] . firm adhesion of the virus to the cell surface is achieved through the engagement of multiple receptors via identical binding sites, which is known as avidity. (ii) in all cases investigated to date, the sialic acid itself mediates the majority of contacts with the viral capsid, with a smaller number of additional contacts formed to neighboring monosaccharides. (iii) most viruses are highly specific for the context in which sialic acid is presented; that is, they only recognize sialylated glycans featuring, for example, α- , -linked sialic acid but do not engage sialylated glycans carrying α- , -linked or α- , -linked sialic acid. (iv) although the database remains small, some viruses can discriminate between the many different modifications of sialic acids, and, as some of these modifications are species-specific, this phenomenon can contribute to the ability of a virus to only infect species that express a particular sialic acid modification. the available structural information on virus-receptor interactions is crucial for the rational design of therapeutic compounds. due to the surface-exposed binding mode and the weak individual interactions between sialic acids and their cognate virus proteins, modifying sialic acid to achieve high-affinity binding is challenging. 
however, viruses possess many identical binding sites that are often linked by symmetry operators, and thus multivalent and symmetric ligands that target several binding sites could result in high-affinity interactions. the strategy of employing a carbohydrate-based, multivalent, and symmetric inhibitor that matches the symmetry of the binding sites in a multimeric target protein was first applied in the context of the bacterial shiga-like toxin (slt). slt consists of an enzymatic domain a and a pentameric, cell-binding domain b [ ] . the crystal structures revealed that the b domain pentamer recognizes the p k trisaccharide portion (αgal - βgal - βglc) of its physiologic ganglioside receptor, globotriaosylceramide (gb ) [ , ] . in order to achieve high-affinity binding, kitov et al. [ ] designed the starfish compound, a quasi-symmetric, pentavalent molecule with a central glucose motif carrying five linkers that terminate in dimeric p k trisaccharides ( figure ). x-ray crystallography of the toxin-inhibitor complex revealed a sandwich-like arrangement of two slt b-pentamers intercepted by one starfish molecule. all five b-pentamer binding sites were simultaneously occupied by the inhibitor. in line with this, affinity measurements showed an increase in the inhibition potency from a millimolar affinity for the monovalent receptor (p k trisaccharide) to a subnanomolar affinity for the starfish compound. this concept of targeting multiple, symmetric receptor binding sites by multivalent inhibitors is also applicable to many viruses, since viral capsids are often icosahedral and, therefore, highly symmetric structures. figure . an example of a tailored multivalent inhibitor. the globotriaosylceramide-binding b-subunit of shiga-like toxin (slt) forms pentamers and serves as the target for the pentavalent inhibitory compound starfish, which has been functionalized with the p k trisaccharide. 
the starfish compound exploits the symmetric structure of its target and binds to slt with a subnanomolar affinity [ ] . the slt pentamer is shown as a protein surface with single protomers colored in grey, yellow, pink, green, and light blue, respectively. the starfish compound is shown in stick representation with carbon, nitrogen and oxygen atoms colored in orange, dark blue and red, respectively. missing parts of the scaffold structure are schematically indicated as orange lines (pdb id qnu). all protein representations in the figures of this review were generated using pymol (schrödinger inc.). in this chapter, we will introduce some universal concepts of virus capsid geometry and architecture, focusing in particular on non-enveloped viruses that bind sialic acid-based receptors. we will highlight the local symmetries that relate the sialic acid binding sites in different viral attachment proteins to each other. these local symmetries can serve as a useful framework for the rational design of multivalent virus-targeting inhibitors, similar to the approach used to develop the starfish compound. this strategy has been successfully applied to several viruses, as we will show in chapter . however, in order to evaluate this approach, we first need to introduce the symmetry elements that guide viral capsid assembly. typically, small viral genomes can only encode a low number of structural capsid proteins. these capsid proteins often form multimers (capsomers), which, again, assemble into a stable virus capsid that can house the viral genome and associated components (e.g., nucleoproteins or enzymes). thus, the assembled virus particles comprise many copies of capsid proteins and display a high (quasi-) symmetry that is often based on an icosahedron [ ] . in icosahedral capsids, the particle architecture can be expressed in terms of the triangulation number. 
an icosahedron is a polyhedron consisting of 20 identical triangular faces that intersect at twelve vertices with five-fold rotational symmetry. three-fold rotational axes are located in the center of each triangular face. the edges between the faces are intersected by two-fold rotational axes. the combination of these symmetry operations gives rise to the point group of an icosahedron ( figure , lower right). in the context of an icosahedral virus particle, the triangulation number (t = h² + hk + k²) can be interpreted as a measure of capsid size. it is calculated from the numbers of inter-capsomeric steps (iterated in h and k) that one has to traverse via (quasi-) six-fold symmetric capsomers, from one five-fold vertex to another. five-fold vertices and the associated capsid proteins are highlighted in blue. a schematic view of an icosahedron in the same orientation is shown on the lower right, with two-fold, three-fold and five-fold axes indicated as ellipse, triangle and pentagon, respectively. pdb ids b t (adenovirus), cse (orthoreovirus), kz (rotavirus), sid (polyomavirus), and q w (coxsackievirus). the smallest and simplest virus capsids have an architecture corresponding to a triangulation number of t=1. in a t=1 capsid, each of the twelve five-fold symmetric vertices is occupied by a single capsid protein pentamer. neighboring capsomers are pentamers that are also associated with the five-fold icosahedral vertices. no additional capsomers are present, which gives rise to a total number of 12 × 5 = 60 capsid proteins in the particle. to our knowledge, the only example of a t=1 sialic acid binding capsid is found in the adeno-associated virus (aav) of the parvoviridae family. sialic acid binding in aav was reported to occur at the interface between two single capsid protein monomer chains, resulting in a total number of binding sites in the capsid. 
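as a small worked illustration of the triangulation number discussed above, the following python sketch evaluates t = h² + hk + k² for a few (h, k) pairs and lists the subunit and capsomer counts expected for an ideal quasi-equivalent capsid (60t subunits, 12 pentamers, 10(t - 1) hexamers). the function names are hypothetical and chosen only for this example, and the counts assume ideal caspar-klug geometry; capsids built exclusively from pentamers, such as those of polyomaviruses and papillomaviruses, deviate from these numbers.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def capsid_counts(h: int, k: int) -> dict:
    """Counts for an ideal quasi-equivalent icosahedral capsid (an assumption)."""
    t = triangulation_number(h, k)
    return {
        "T": t,
        "subunits": 60 * t,        # 60 copies per unit of T
        "pentamers": 12,           # one at each five-fold vertex
        "hexamers": 10 * (t - 1),  # all remaining capsomers
    }


if __name__ == "__main__":
    # Illustrative (h, k) pairs only; they are not tied to a specific virus.
    for h, k in [(1, 0), (1, 1), (2, 0), (2, 1)]:
        print((h, k), capsid_counts(h, k))
```

for (h, k) = (1, 0) the sketch gives t = 1 with 60 subunits and no hexamers, which corresponds to twelve pentamers of five capsid proteins each.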
the symmetry of the local binding site is defined by the shortest distance of a single binding site towards the rotational axis. the aav capsid proteins that are responsible for sialic acid binding form pentamers. however, the binding sites themselves locate closer to the icosahedral three-fold symmetry axis than the five-fold capsomer/vertex axis. this implies that the sialic acid binding sites of the aav capsid are arranged in twenty local three-fold rather than twelve local five-fold symmetries [ ] . virus particles comprising more than twelve capsomers must, by definition, have a higher triangulation number than t= . picornaviridae family members (such as coxsackieviruses, rhinoviruses, polioviruses, or enteroviruses) are viruses possessing three surface-exposed capsid proteins. the icosahedral five-fold penton positions are occupied by vp pentamers, which are bridged by one pseudo-hexon capsomer (vp -vp heterohexamer), giving rise to pseudo-t= geometry ( figure ). in the sialic acid-binding human coxsackievirus, a variant (cva v), the binding site is located at the interface of two vp protomers, close to the five-fold icosahedral axis. vp and vp do not bind sialic acid, which results in a total of binding sites, arranged in twelve local five-fold symmetries (figure a ) [ ] . the distance between an individual sialic acid binding site and the local five-fold rotation axis of the vp pentamer amounts to ca. . nm. in the members of both papillomaviridae and polyomaviridae, single capsid proteins named l and vp , respectively, constitute the outer capsid. both proteins exclusively form pentamers, which can occupy pentavalent and hexavalent positions in the mature particle, thus deviating from the quasi-equivalence principle proposed by caspar and klug in [ ] . here, a total number of pentameric capsomers are arranged in a t= d fashion, with a diameter of ca. nm in the mature particles ( figure ) [ ] [ ] [ ] . 
the formation of smaller, non-viable lower-symmetry t= virus-like particles (vlps) has also been described for both papillomaviruses and polyomaviruses [ ] [ ] [ ] . many members of the polyomavirus family bind sialic acid-based glycans using their vp proteins, so the binding sites on individual pentamers are always linked by local five-fold symmetry (figure a , tspyv). in all structures of polyomavirus-sialyl-oligosaccharide complexes, the majority of contacts between the protein and receptor can be attributed to sialic acid, with a small number of augmenting contacts to other saccharides providing specificity for a given glycan structure [ , [ ] [ ] [ ] [ ] [ ] [ ] [ ] . in contrast, papillomaviruses do not bind sialylated receptors, but instead interact with glycosaminoglycans to adhere to cells [ ] [ ] [ ] . reoviridae are a large family including, among others, the genera of orthoreoviruses and rotaviruses. members of the reoviridae family are double-shelled particles, in which the inner core layer consists of two core proteins arranged in a t= * order and the outer capsid possesses t= geometry [ ] [ ] [ ] . the diameter of mature virions is about - nm (figure ). in the case of mammalian reoviruses, the icosahedral vertex positions are occupied by the trimeric attachment protein sigma , which markedly protrudes as a thin fiber from the virion surface. the sigma protein is about nm long and has a head-and-tail morphology, with a globular head domain and an elongated tail that has flexible regions and partially inserts into the virion [ ] . the location of sialic acid binding is type dependent. type reoviruses bind sialylated receptors in the protruding head domain while type reoviruses use a binding site in a region near the mid-point of the tail (figure b , reov t d and t l) [ , ] . 
both sites primarily engage sialic acid with a small number of contacts, and in both cases, three-fold rotational symmetry is found between the binding sites within a single sigma trimer. thus, sigma trimers give rise to sialic acid binding sites in orthoreoviruses. rotaviruses also have protruding domains that recognize carbohydrate receptors, which are also dimers of the virus protein (vp ) [ ] . while some of the more pathogenic human viruses bind hbgas, animal rotaviruses primarily engage sialic acid-based receptors [ ] [ ] [ ] . each rotavirus vp subunit carries a single sialic acid binding site, which is related to the two-fold rotational symmetry in the dimer (figure a , rrv) [ ] . in further contrast to orthoreoviruses, these fibers do not coincide with the five-fold vertices but rather associate at the margins of the five capsomers around the five-fold vertices, resulting in a total number of vp dimers and sialic acid binding sites [ ] . adenoviridae members are large viruses with a diameter of about nm and a t= icosahedral capsid ( figure ) [ , ] . they display trimeric fibrous attachment proteins, known as the fiber, at their icosahedral vertices. similar to the reovirus sigma , the adenovirus fiber has a head-and-tail morphology and features a globular head domain (the knob) that projects from the virus surface, and a tail (the shaft) that inserts into the virus particle [ ] . the sialic acid binding sites of the different structurally characterized adenovirus types are located in the fiber knob domain [ , ] . while these sites differ in location in different adenoviruses, they are all linked by three-fold symmetry and lie in close proximity to each other (figure a, hadv and hadv ) . although this review focuses on non-enveloped viruses, it should still be mentioned that some of the concepts described above also apply to enveloped viruses, such as coronaviruses, paramyxoviruses, and orthomyxoviruses. 
while global symmetry measures for these viruses are elusive (since their attachment proteins are membrane-bound, somewhat mobile, and do not follow easily-appreciated assembly rules), the existing local binding site symmetries within multimeric attachment proteins can still be exploited for rational multivalent inhibitor design. a prominent example is the influenza a virus hemagglutinin, which is a homotrimeric protein bearing three individual sialic acid binding sites [ ] . these binding sites are related by local three-fold rotational symmetry, with a distance of . nm from the three-fold axis (figure a, iav) . the sialic acid moieties of glycosylated proteins and/or glycolipids are required for the attachment, entry, and productive infection of many viruses. in the majority of cases, the viral binding sites are surface-exposed and engage terminal sialic acid residues with a range of hydrophilic and hydrophobic contacts. the available structural data for receptor-ligand interactions at atomic resolution can inform the synthesis of high-affinity inhibitors via the chemical modifications of sialic acids. a prominent example of this approach is the inhibition of influenza virus neuraminidases. these enzymes recognize and hydrolyze terminal sialic acids of cell surface glycans and are vital for the release of viral progeny [ , ] . an early non-selective prototype inhibitor of neuraminidases was the sialic acid derivative 2-deoxy-2,3-didehydro-n-acetylneuraminic acid (neu5ac2en, dana) [ , ] . dana can engage and block the binding sites of neuraminidase of both ortho- and paramyxoviruses, thereby suppressing the activity of these enzymes. in the early 1990s, mark von itzstein and colleagues made substantial advances towards selective, high-affinity influenza a virus inhibitors. 
they discovered that the derivatization of the 4-o-hydroxyl group of dana with a guanidine moiety drastically increases affinity to the influenza a virus neuraminidase, which resulted in the anti-influenza drug zanamivir [ , ] . subsequently, the structures of the planar sialic acid transition state during the neuraminidase reaction became available. the analogous compounds of sialic acid's transition state based on benzoic and shikimic acids, and subsequently the drug oseltamivir [ ] [ ] [ ] [ ] , benefit from the increased affinity to neuraminidase compared to the native neu5ac structure [ , ] . unfortunately, the family of compounds related to zanamivir or oseltamivir does not act against viruses that do not possess neuraminidase activity but instead engage undistorted sialic acids via a hemagglutinin (e.g., many non-enveloped viruses discussed in chapter of this review). in cases where affinities between sialylated glycans and hemagglutinins have been measured, the interactions were shown to have dissociation constants in the millimolar range [ ] [ ] [ ] [ ] . the firm cell attachment of these viruses is usually driven by high avidity, relying on a large number of identical binding sites that can engage receptors. in terms of the inhibition of attachment, monovalent sialic acid analogues are generally poor choices, as they are not able to engage a single binding site in a viral attachment protein with high affinity. therefore, polyvalent sialic acid-based compounds can sometimes be more effective. in parallel to the development of monovalent sialic acid transition state analogues against the influenza virus neuraminidases, polymeric or nanoparticle-based sialic acid conjugates were developed to also inhibit influenza virus infection by extensive binding to the viral hemagglutinin, thereby shielding the virus particle. 
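to illustrate why millimolar dissociation constants make monovalent ligands poor attachment inhibitors, the sketch below evaluates the standard single-site binding isotherm, fraction bound = [l] / (kd + [l]). it assumes simple 1:1 binding with the ligand in large excess over the binding sites. the kd of 1 mm and the ligand concentrations are illustrative placeholder values, not measurements from the studies cited here, and the function name is hypothetical.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Occupied fraction of sites for simple 1:1 binding with ligand in excess."""
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (kd_molar + ligand_conc_molar)


if __name__ == "__main__":
    KD_MOLAR = 1e-3  # assumed illustrative Kd of 1 mM ("millimolar range")
    # Hypothetical free ligand concentrations from 1 uM to 10 mM.
    for conc in (1e-6, 1e-4, 1e-3, 1e-2):
        occupancy = fraction_bound(conc, KD_MOLAR)
        print(f"[L] = {conc:.0e} M -> fraction bound = {occupancy:.3f}")
```

under these assumptions, half of the sites are occupied only when the free ligand concentration equals kd, that is, at millimolar ligand levels, whereas micromolar ligand concentrations leave almost all sites free.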
the influenza virus hemagglutinin can be blocked by multivalent sialosides that vary in chemical composition, size, branching complexity, and ligand density [ ] [ ] [ ] [ ] [ ] . particularly the effects of the nature of the scaffold (or platform) and spatial sialic acid distribution in polyvalent viral attachment inhibitors are still being investigated [ ] . recently, excellent reviews covering the composition and biophysical properties of multivalent sialosides were published by bhatia et al. [ ] and lu et al. [ ] , so we will not discuss this topic further here. next to the design of monovalent inhibitors or highly complex polyvalent macromolecular structures, a third and rather minimalistic design strategy makes use of the internal symmetry of the addressed targets. as elucidated earlier in this review, many viral attachment proteins employ a local symmetry of their receptor binding sites that represent a convenient framework for the rational design of small, tailored inhibitors. in contrast to monovalent inhibitors, oligovalent compounds can benefit from cooperative binding or avidity. nevertheless, they are small enough for nephric clearance and display a higher bioavailability than most macromolecular polyvalent structures [ ] . the design of oligovalent inhibitors is usually based on available crystal structures, taking the symmetry and topology of the target into account. the accessibility of the individual ligand binding sites and their distances from an eventual symmetry axis should be considered. for binding sites in close proximity to the local symmetry axis, it can be feasible to start from a central molecule, which can be derivatized by ligand terminating linkers, resulting in an inhibitor with a radial structure. for binding sites that are far from the local symmetry axis or in a recessed part of the target protein, it may be viable to directly link ligand molecules to each other instead of using a central scaffold. 
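the geometric considerations above can be sketched numerically. for n binding sites related by an n-fold rotation axis and lying at a radial distance r from that axis, neighboring sites are separated by 2·r·sin(π/n); comparing r with this spacing gives a rough first indication of whether a radial scaffold or direct ligand-to-ligand linking would need the shorter spacers. the python sketch below assumes idealized, coplanar sites and ignores the shape of the protein surface; the function names and the example radius are hypothetical.

```python
import math


def site_positions(n_fold: int, radius: float) -> list:
    """(x, y) coordinates of n symmetry-related sites around an n-fold axis at the origin."""
    if n_fold < 2 or radius <= 0:
        raise ValueError("n_fold must be >= 2 and radius must be > 0")
    step = 2.0 * math.pi / n_fold
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(n_fold)]


def neighbor_distance(n_fold: int, radius: float) -> float:
    """Chord length between adjacent sites: 2 * r * sin(pi / n)."""
    if n_fold < 2 or radius <= 0:
        raise ValueError("n_fold must be >= 2 and radius must be > 0")
    return 2.0 * radius * math.sin(math.pi / n_fold)


if __name__ == "__main__":
    radius_nm = 1.0  # hypothetical distance of each site from the symmetry axis
    for n in (2, 3, 5):
        sites = site_positions(n, radius_nm)
        direct = math.dist(sites[0], sites[1])  # same value as neighbor_distance
        print(f"{n}-fold: site-to-axis = {radius_nm:.2f} nm, "
              f"site-to-neighbor = {direct:.2f} nm")
```

in this idealized picture the spacing between neighboring sites is larger than the site-to-axis distance for two-, three-, and five-fold arrangements, so any preference for direct linking has to come from accessibility of the sites rather than from distance alone.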
direct linking of ligand molecules may eventually result in the design of a circular oligovalent inhibitor. so far, structural information on attachment proteins bound by oligovalent inhibitors is scarce. x-ray crystallography analysis is usually hampered by the high flexibility of spacer groups and scaffolds. these non-binding parts of the inhibitors typically do not assume defined conformations, resulting in weak and uninterpretable electron density. this was the case in the study of the shiga-like toxin inhibitor starfish, described in the introduction of this review, where interpretable electron density was only present for the bound ligand moieties of the compound. still, the structural characterization of virus-inhibitor interactions in order to design or optimize anti-viral compounds has been the subject of extensive research. recently, a research paper by lu et al. [ ] reported on the trivalent design of sialic acid bearing inhibitors against influenza a virus hemagglutinin, showing a greater than -fold increase in affinity compared to the monovalent ligand. however, structural data of the interactions between the inhibitor and hemagglutinin were not reported in that study. one study of a central group-based anti-viral inhibitor, which contains structural data, can be found for adenoviruses. adenoviridae members carry trimeric fibers terminating in the knob domain, which engages sialic acid-based receptors in some adenoviruses. human adenovirus (hadv ), which causes epidemic keratoconjunctivitis (ekc), carries three individual binding sites for sialic acids in its fiber knob, which are located at a distance of Å from the local three-fold symmetric axis (figure a ) [ ] . the physiologic receptor of hadv is the glycan portion of ganglioside gd a. this glycan is a branched hexasaccharide with two terminal sialic acid moieties, both of which were shown to simultaneously occupy two of the three available sialic acid binding sites in the same fiber knob [ ] . 
based on this observation, spjut et al. ( ) [ ] designed and synthesized a symmetric, tridentate sialylated inhibitor, which is capable of occupying all three binding sites of the fiber knob at once. the first generation of these inhibitors was designed around a central tris( -aminoethyl) amine group. it utilized flexible spacers between the central group and the sialic acid ligand, in order to minimize the chance of steric hindrance of the inhibitor docking. the resulting compounds demonstrated a potency increase of four orders of magnitude compared to monomeric sialic acid [ ] . recently, in the second-generation inhibitors, the binding affinity could be improved even further by using a shorter, triazole-based linker structure, which was based on the results of the first-generation complex structures. the more compact and rigid design resulted in an additional -fold increase in potency [ ] . the crystal structures of the second-generation inhibitor molecules in a complex with adenovirus fiber knob proteins verify the trivalent binding mode by also displaying the electron density of the linkers (figure a,b, left) . an example of directly-linked ligand moieties was shown in a study of potential polyomavirus inhibitors, in which baier et al. [ ] synthesized so-called divalent sialylated glycooligopeptides. they solved the structures of two glycooligopeptide compounds in a complex with the major capsid protein (vp ) pentamer of the trichodysplasia spinulosa-associated polyomavirus (tspyv), which is associated with abnormal skin growth in immunocompromised patients. the vp pentamer carries five individual sialic acid binding sites at a distance of Å between neighboring sites [ , ] . 
the glycooligopeptide-vp complex structures displayed a ligand binding mode similar to that reported for sialic acid in an earlier study [ ] and showed, for the compounds, that the linker between the ligand and the scaffold occupies the space that is usually targeted by the natural glycan receptor moieties (figure a,b, right) . however, the interconnectivity of functional receptors by the scaffold remained undetermined (probably due to the flexibility of the scaffold), and the structures are, therefore, the average of several potential bridging modes on top of the pentamers [ ] . many viral lectins or attachment proteins rely on the recognition of sialic acids. due to the high symmetry of viral particles and the occurrence of local symmetry within commonly multimeric viral proteins, sialic acid binding often occurs in a symmetrical context, too. this symmetry is a convenient framework for the design of tailor-made inhibitory ligands competing with the high avidity of virus-cell interactions. structural biology techniques, such as x-ray crystallography and single-particle electron cryo-microscopy (cryo-em), can now tackle the visualization of viral attachment and carbohydrate interactions with unprecedented scope and detail. the resulting structural data can be used for the optimization of anti-viral compounds, which could be developed further into high-affinity drug candidates. however, challenges in compound design remain. for example, the higher rigidity of a multivalent ligand does not necessarily translate into improved binding. in the case of hadv , a rigid compound bound -fold less well to the fiber knob than a related compound that had higher flexibility [ ] . this suggests that the perfect positioning of all sialic acids in the binding site, especially for larger inhibitor molecules, is difficult to achieve, and a certain degree of flexibility might help with the high-affinity binding of the inhibitor. 
another limiting factor for oligovalent inhibitors is the positioning of the binding pockets. in the case of the trivalent adenovirus inhibitor, the binding sites are on the very top of the knob domain, so there is enough space for linkers and a central core (figure a ). in contrast, the sialic acid binding sites of the reovirus sigma fibers are located on the side of the protein (figure b) , which makes it challenging to design an appropriate oligovalent inhibitor. additional challenges are the long-term stability, convenient synthesis, and, for later application, reasonable bioavailability of multivalent compounds.
this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license
we thank the swiss institute of bioinformatics for the provision of the expasy viralzone and the scripps institute for the provision of the viper data bank-two very helpful resources of virus knowledge. we apologize to our many colleagues whose work could not be discussed and cited here due to space considerations. the authors declare no conflict of interest.
key: cord- - u njtt authors: scott, shannon d; osmond, martin h; o'leary, kathy a; graham, ian d; grimshaw, jeremy; klassen, terry title: barriers and supports to implementation of mdi/spacer use in nine canadian pediatric emergency departments: a qualitative study date: - - journal: implement sci doi: . / - - - sha: doc_id: cord_uid: u njtt background: despite recent research supporting the use of metered dose inhalers with spacer devices (mdi/spacers) in pediatric emergency departments (peds) for acute exacerbations of asthma, uptake of this practice has been slow. the objectives of this study were to determine the barriers and supports to implementing mdi/spacer research and to identify factors associated with early and late adoption of mdi/spacers in canadian peds. methods: using a comparative case study design, we classified nine tertiary care pediatric hospital peds based on their stage of implementation. data were collected using focus group interviews with physicians, registered nurses (rns), and respiratory therapists (rts), and individual interviews with both patient care and medical directors at each site. initial coding was based on the ottawa model of research use (omru) categories of elements known to influence the uptake of innovations. results: one hundred and fifty healthcare professionals from nine different healthcare institutions participated in this study.
lack of leadership in the form of a research champion, a lack of consensus about the benefits of mdi/spacers among staff, perceived resistance from patients/parents, and perceived increased cost and workload associated with mdi/spacer use were the most prevalent barriers to the adoption of the mdi/spacer. common strategies used by early-adopting sites included the active participation of all professional groups in the adoption process in addition to a well-planned and executed educational component for staff, patients, and families. early adopter sites were also more likely to have the mdi/spacer included in a clinical protocol/pathway. conclusion: potential barriers and supports to implementation have been identified that will help eds adopt mdi/spacer use. future interventions intended to increase mdi/spacer use in peds will need to be sensitive to the barriers identified in this study.
an acute asthma exacerbation is one of the most common reasons for children to present to an emergency department (ed). conventional treatment focuses on the delivery of beta- -agonists (bronchodilators) to relieve the bronchospasm. there are two main methods of delivering bronchodilators to children in the ed: nebulisation and metered-dose inhaler with spacer (mdi/spacer). for over a decade, the evidence has been well established that mdi/spacers are as effective as (and in many ways superior to) nebulisers for mild to moderate asthma treatment in the ed [ ]. despite the research evidence, the uptake of mdi/spacers in pediatric emergency departments (peds) has been slow both in canada and the united states [ ]. a small number of international investigations that center on the successes in implementing mdi/spacers in peds [ , ] have demonstrated no changes in admission rates to the ward and intensive care unit (icu), and have found that parent/child satisfaction is improved [ ].
in addition, when asked, children and parents stated that they preferred using mdi/spacers to nebulisers. in canada there is a unique 'natural experiment' opportunity to study the adoption of mdi/spacers, because various peds across the country are at different stages of adopting this method of treatment. focusing on the knowledge, attitudes, and practices of healthcare professionals regarding mdi/spacer use, osmond and colleagues [ ] conducted a survey in peds across canada and found that while most physicians and nurses believed that the evidence supported the use of mdi/spacers, few actually used this method of treatment in their personal practice. hurley and colleagues [ ] investigated the reasons for this discrepancy by comparing interview data from staff at a site that had adopted mdi/spacer use with data from a site that had not. with our study, we intend to further explore the issues raised in the study by osmond and colleagues [ ] (with more sites) and to explore the transferability of the findings from the hurley study across early adopters, adopting sites, and sites yet to adopt mdi/spacers. the overall objectives of this study were to determine the barriers and supports to implementing mdi/spacers into ped practice, and to identify factors associated with early and late adoption of mdi/spacers in peds in canada. a comparative case study design [ ] was used within the eds of nine canadian pediatric tertiary-care teaching hospitals. case studies are appropriate when the boundaries between the phenomenon of interest and the context in which it occurs are not clear [ ]. for this study, the phenomenon of interest was the adoption of mdi/spacers for the treatment of asthma, and while data collection occurred at the level of the individual practitioners, the unit of analysis was the individual ped. all of the hospitals belong to pediatric emergency research canada (perc), a collaborative, nationwide pediatric emergency medicine research network.
the eds were classified into one of three categories based on stage of mdi/spacer implementation: 'early adopters,' 'adopting,' and those 'yet to adopt' [ ]. level of adoption was based on results from osmond et al.'s study [ ], in which members of ten canadian peds were surveyed regarding mdi/spacer use, and was verified by a key informant. in the study by osmond et al., sites were categorized based on how individual emergency physicians at each site responded to a specific scenario given to them as part of the survey. for the purposes of the present study, 'early adopters' were those sites in which the mdi/spacer had already been incorporated into routine practice for the treatment of mild to moderate asthma. 'adopting' sites were those that were actively involved in switching from nebulised to mdi/spacer treatment, and 'yet to adopt' sites were those that exclusively used nebulised treatments and had not started a process to adopt mdi/spacers. we purposively sampled [ ] from among the ten eligible peds so that there was equal representation from each of the three categories (the single site that was not chosen from the study by osmond et al. had yet to adopt mdi/spacer use). eligible participants included all ped physicians, ed registered nurses (rns), respiratory therapists (rts) (in those eds where rts administered asthma treatment), as well as the medical and nursing directors in each of the departments. the health research ethics board of the university of alberta as well as the ethics review boards of the nine participating institutions approved the study. an information letter outlining the details of the study was sent to the perc-affiliated research nurse or physician in each site, who in turn shared the information with all ed staff using methods of communication that were typical for their site (e.g., email messages, posters, and memos). informants self-selected to participate in the study.
between march and march , a masters-prepared nurse (ko) with qualitative research experience and with no previous connection to any of the interviewees, collected data using both focus group interviews (n = ) and individual interviews (n = ). interview participants were assured that their responses would remain confidential and anonymous, and that responses would be linked to 'categories' only, not individual settings. two focus groups were conducted at each site: one with physicians, and one with rns and rts. individual interviews were conducted with the medical and patient care directors at each site, either in person, or later by telephone. the semistructured [ ] interviews, lasting between and minutes, centred on the perceived barriers and facilitators to mdi/spacer use. separate interview schedules [see additional file ] were developed for each level of adoption (early adopters, adopting, yet to adopt). probing questions were used to help illuminate statements given by participants that were incomplete, vague, or ambiguous [ ] . a court reporter, an individual trained in the verbatim recording of the spoken word, was used to record the focus group interviews, and both electronic and print copies of the transcribed interviews were produced [ ] . a digital recording device was used to record the individual interviews, which were later transcribed. all interview transcripts were checked for accuracy and completeness using line-by-line comparison between the audio recordings and written transcripts. at each site, interviews were conducted until data saturation occurred, that is, until no new themes emerged from the data [ ] . two investigators (ss and ko) analyzed the data using a constant comparative [ , ] approach. data collection and analysis proceeded concurrently. data were managed using nud*ist software (version n , qualitative solutions and research). 
categories from the ottawa model of research use [ , ] (omru) were used to guide the development of the interview questions and topics covered, as well as the initial coding of the data and the organization of the emergent barriers and facilitators. data were used to make cross-case (i.e., pediatric ed to pediatric ed) and cross-category (i.e., early adopter and yet to adopt) comparisons. narratives relating to several major themes were developed. trustworthiness of our research data and analysis was guided by guba and lincoln's [ ] criteria of credibility, confirmability, dependability, and transferability. we operationalized these criteria through a series of activities. first, we broadly sampled focus group participants to allow for multiple and diverse perspectives, as well as to ensure that we did not have an over-representation of data from particular professional groups (credibility criterion). throughout the study, we maintained a comprehensive audit trail that documented all methodological decisions, conclusions, interpretations, and recommendations arising from the data (confirmability criterion). furthermore, we logged a complete inventory of all data collection and analysis products, including detailed written field notes and theoretical and analytical memos documenting developing thoughts about the data (dependability criterion). finally, transferability was addressed by providing thick description of the eds studied (while maintaining anonymity) with sufficient detail and precision to allow readers to make judgements about applicability [ ] to their respective settings. there were nine tertiary level pediatric eds comprising participants. three sites were already routinely using mdi/spacers in the ed ('early adopters'), two sites were in the adoption process, and the remaining four sites had not yet adopted the innovation.
category membership was based on results from the study by osmond et al. [ ] and was verified by self-report from a key informant. table outlines the number of participants by profession and category of mdi/spacer adoption, as well as the numbers of interviews in each category. our findings are organized into thematic categories based on the omru [ ] elements thought to influence research use: the evidence-based innovation (mdi/spacers), the potential adopter, and the practice environment. a list of representative quotations from each thematic category has been provided [see additional file ]. through exploring the dissemination processes of the mdi/spacer, we acquired important information on several perceived features of the mdi/spacer that either hindered or supported its adoption by the participating peds, including cost, effectiveness, infection control, and impact on the patient. the site-specific barriers and facilitators identified by participants are summarized in table . the prevailing perception was that it was more expensive to deliver beta- -agonists using an mdi/spacer than using a nebuliser. in fact, cost was the most significant factor perceived to shape the adoption process. participants cited the extra time it would take to administer the medication and the cost of sterilization as factors contributing to the cost. while in several sites sterilizing and reusing the spacers (up to five times) was seen as a way to reduce costs, in others spacers were considered single-use only and were either sold to patients or given away. of note, participants in only one (early adopter) site thought that both treatment modalities took approximately the same amount of time, citing that the shorter time required for preparation and administration of the mdi/spacer made up for the time spent doing patient education.
regardless of who bore the cost of the intervention, all sites recognized the importance of resolving budgetary concerns prior to initiating adoption plans. despite being aware of the research evidence, many of the participants from the yet to adopt sites were sceptical that the mdi/spacer was as effective as the nebuliser. most participants from early adopting sites and those sites engaged in the adoption process felt that using mdi/spacers was 'as good as if not better' than using nebulisation. those who had already adopted mdi/spacers added that their personal experience validated findings from the research literature. proper technique in administering the mdi/spacer was important to the participants in ensuring its effectiveness. many participants in the 'early adopter' and 'adopting' sites felt that improper use of the mdi/spacer by patients contributed to the view that it was less effective in treating acute asthma than the nebuliser. regardless of their stage of adoption, most sites recognised that the use of the mdi/spacer was superior to the nebuliser in terms of infection control. the spread of disease via nebulisation was seen as a threat to patients and staff, but one that could be reduced significantly by the use of the mdi/spacer. the sars (severe acute respiratory syndrome) outbreak that occurred in central canada in was mentioned as a contributing factor to the adoption of mdi/spacer use in at least two sites. the mdi/spacer was generally seen to be a less intrusive treatment modality, and therefore less frightening to young children. specifically, it is the mist, noise, and confining nature of the nebuliser treatment that make it unpleasant for this age group. because mdi/spacer treatments take less time to administer (from the patient's perspective), and parents are often asked to actively participate in the treatment, parents perceive it to be a less frightening treatment.
the clinicians from the sites participating in this study exhibited a range of knowledge, attitudes, and skills regarding the mdi/spacer. the majority of participants were aware of the results of research concerning the effectiveness of mdi/spacers, with physicians and respiratory therapists in general being more knowledgeable than nurses about specific studies. even in adopting sites, there were individuals who believed that nebulisation was more effective, especially for those patients who were in the moderate to severe range. the perception was that the majority of individuals, specifically physicians, needed to have 'bought in' to the idea of using mdi/spacers for the adoption to take place. in order for this to occur, individuals needed to believe that the relative advantage gained by changing practice was 'worth' the energy and resources needed to make the change. interestingly, buy-in on an individual level was often present at non-adopting sites, but because of other barriers and/or a lack of consensus, adoption had not occurred. our findings on the characteristics of the adopters reflect the complexity of the mdi/spacer adoption process. while individual clinicians may be aware of the advantages of mdi/spacer use, the actual 'adoption' of mdi/spacers is an institutional or departmental decision because it requires support from different disciplines, as well as the allocation of different resources and products. participants expressed positive attitudes towards research in general regardless of their stage of adoption of mdi/spacers, yet some participants at early adopter and adopting sites were not entirely convinced by the literature supporting mdi/spacer use until they witnessed the outcomes for themselves. a number of participants at non-adopting sites expressed the view that research-based practice change that was done too quickly, and without adequate reflection, could lead to errors being made.
in some sites, mdi/spacer administration and/or patient education was performed by respiratory therapists, resulting in individual nurses being less comfortable performing these tasks. respiratory therapists were considered a valuable resource in the five locations where they were permanent members of the ed team. only two of the sites that either had adopted or were adopting mdi/spacer use did so without any rt support. because of the high number of patients presenting with respiratory issues, having rts assigned to these patients allowed nurses to focus their attention on other patients. teaching and follow-up with patients were valued rt roles. at the sites where asthma treatment was not the sole responsibility of rts, the nurses were more comfortable using mdi/spacers. while characteristics of the practice environment had the greatest potential to highlight existing differences between the nine sites, similarities among the sites were also found. the existence of a large number of part-time staff in the ed was mentioned as a potential barrier to maintaining consistent practices because it was difficult to communicate policy and practice changes with staff whose presence in the ed was infrequent. many physicians who worked on a part-time basis were family physicians or pediatricians without specialty training in emergency medicine, and were perceived as being less up-to-date on the latest research in that area. the part-time nature of their positions within the ed made remaining current on departmental policies and procedures difficult. often, they relied on advice or information from full-time staff to keep current. interestingly, clinical practice variation was tolerated to different degrees among the nine sites. high staff turnover in the peds resulted in difficulty in following up on practice change initiatives. resources, such as more frequent staff inservices, may be needed to ensure that new staff are aware of practice changes. 
staff shortages and the 'downloading' of tasks that were previously performed in inpatient units (due to bed shortages) contributed to a busy environment in the ed where there were many competing priorities. noteworthy is that in sites where the mdi/spacer was part of a written guideline, the practice was used more often and more consistently, with the exception of the one site where physicians were able to choose between nebulisers and mdi/spacers on the asthma guideline (regardless of severity). interestingly, mdi/spacers were never used in this site. organizational bureaucracy exerted an influence in the majority of sites and was seen as a significant barrier (in terms of magnitude) to practice change. participants expressed frustration at the amount of time and energy that was required to make modifications in clinical practice (policy changes). in order to conserve time, energy, and resources, staff recognized that only issues of the highest priority could be pursued and consequently, in some sites, it was recognized that adoption of mdi/spacers was not an urgent enough priority. all sites characterized collegial relationships as cooperative and based on mutual respect. furthermore, participants perceived relationships between different professional groups in the ed as more egalitarian than in other departments within the institution due to the way that work was organized in the ed. at all sites, nurses had the autonomy to assess patients and begin initial treatment. some sites, mainly early adopters, espoused a strong desire to be evidence-based and valued their reputation as a group that used 'cutting edge' practices. other sites, while they acknowledged the value of being aware of the most current practice trends, were more 'cautious' in their adoption of new practices, preferring to see 'how things worked out' in other sites before attempting the practice change themselves.
the willingness to 'take risks' was a characteristic of at least one of the early adopting sites. participants in adopter sites perceived that the presence of a professional (or professionals) within the department willing to champion the practice change was one of the most significant factors influencing the adoption of the mdi/spacer. most often, these individuals were respected clinicians by virtue of their experience and/or clinical expertise, had an interest in the specific clinical area (asthma), or had used mdi/spacers previously at another site. in two of the three early adopting sites, leadership at the clinical level was instrumental in achieving 'buy-in' by staff, and subsequently, adoption. buy-in was achieved by providing both credible evidence and persuasive arguments for adoption of the practice. in sites that had not yet adopted mdi/spacer use, clinicians willing to champion the mdi/spacer research were recognized as being important to the change process, but either no one had committed to the role, someone had in the past but had not been successful, or someone was in the process of garnering support for the practice change, but had not yet achieved their goal. the consensus among participants was that a physician should be in the championing role, and it should be a person who has considerable credibility as a clinician. all of the sites expected parents to initially be resistant to the use of mdi/spacers in the ed. participants from the early adopting sites acknowledged that parents were sceptical about the use of a treatment they had already used at home and that had presumably 'not worked'. in particular, parents and children who were frequent ed visitors were perceived to be the most fervent in their resistance. parents also associated the use of the nebuliser with the administration of oxygen, and consequently doubted that any treatment without oxygen would be helpful.
participants in early adopter sites said that while it took extra time to educate and reassure parents of the effectiveness of the new treatment, most were eventually convinced when they understood why it was being used, and witnessed the outcomes for themselves. in sites where adoption had not taken place, the resistance expected to come from parents was viewed more negatively and as more challenging to overcome. nurses saw themselves as having to 'take the brunt' of the complaints by parents, predicting that they would be put in the difficult position of advocating a return to the 'old' delivery system on behalf of the family. despite the broadly expressed concern that parents would 'over treat' their children at home having seen a greater number of puffs being administered in the ed, there was no evidence, experiential or otherwise, to support that concern, and some participants noted that the opposite situation actually occurred occasionally. early adopter/adopting sites perceived an increase in patient empowerment and confidence, and a decrease in parental anxiety, after parents had been shown how to manage their child's asthma effectively with the mdi/spacer. the severity of a patient's asthma had a bearing on which method of treatment was used, with 'mild' and 'moderate' patients receiving medication via mdi/spacer, and 'severe' patients receiving nebulisation. of note, in some sites, physicians believed that there was a tendency towards overestimating the severity of the patient's condition at triage, resulting in a higher than necessary use of nebulisation. regardless of their stage in the adoption process (early adopter, adopter or non-adopters/yet to adopt), many sites identified the same barriers and supports to mdi/spacer adoption. the difference between early and late adopting sites was that early adopter sites dedicated resources to overcome adoption barriers.
in order for resources to be directed towards the goal of adoption, consensus had to be reached among a majority of the medical staff. most often, this process was led by one or more individuals who championed the cause within the department, presenting the research evidence to the other staff members. a lack of consensus among staff and the lack of a research champion were the biggest barriers to adoption in late adopting sites. in all late adopting sites, there were significant numbers of staff, both physicians and nurses, who saw no clear advantage to adopting mdi/spacers. parental resistance was broadly cited as a potential barrier to adoption; however, in early adopting sites, steps were taken to ensure that this barrier was adequately addressed. staff attended organized educational sessions, educational materials were prepared for patients and their families, and in some cases, a campaign was launched to notify the broader community of the practice change. staff recognized that in order to convince parents of the effectiveness of the new practice, they themselves had to be confident in the practice change. the perception that administering medication via mdi/spacers took more time and effort was prevalent among all three groups. in early adopting sites, nurses were willing to invest time to educate patients about mdi/spacer use as it would 'pay off' later with better asthma control and self-administration of medication in the ed. staff at one early adopter site were convinced that mdi/spacer administration actually took less time than nebulisation. the final major barrier to mdi/spacer adoption was cost. because nebulisation would still be used for the more severe cases, some argued that 'doubling up' on equipment would be wasteful. also, some participants believed that there were other priorities within the department that were more deserving of a portion of the limited available resources.
sites that had successfully adopted the mdi/spacers had some common strategies, such as participation of all professional groups and a well-planned and well-executed educational component. furthermore, having continuity of practice within the facility (i.e., same protocol in ed and in-patient units), between facilities or between the facility and community physicians made the process easier because patients received consistent information that was reinforced. including mdi/spacer use as part of a guideline or protocol also facilitated successful adoption in these sites. essentially, the sites that had successfully adopted the mdi/spacer were able to recognize the unique characteristics of the innovation, adopters, and practice environment at their particular ed and successfully use these attributes to facilitate adoption. the findings from our case study suggest that barriers and facilitators interact with each other in complex ways ( table ) to produce different outcomes in each of the study eds. in fact, there was no clear pattern of barriers and facilitators by stage of mdi/spacer adoption. noteworthy is that the early adopter sites reported having many of the same barriers as sites that had yet to adopt the innovation; however, early adopter sites were motivated to find ways to overcome these barriers. the identified barriers and facilitators related to: attributes of the innovation (mdi/spacer), such as perceived ease of use, clear advantages of mdi/spacers and cost; attributes of the practice environment, including both structural (e.g., staffing issues, organizational bureaucracy) and social factors (e.g., presence of a research champion, autonomy); and attributes of the individual clinicians working within the eds, including elements such as entrenched ideas and scepticism. this complex array of factors at multiple levels shaping the adoption process mirrors the findings of denis and colleagues [ ] .
they suggested that adoption processes followed different paths and that factors that facilitated adoption at one site may hinder adoption at another site, given the complex interplay of individual, contextual, and historical factors. furthermore, our findings suggest that the adoption of mdi/spacers is not easily reduced to the decision by an individual ed physician. rather, the decision to adopt mdi/spacers requires support and decisions at several levels: the individual practitioner, parents, department, institution, and region. our findings illustrate that individual clinicians cannot simply decide to change their clinical practice decision making and prescribe the use of mdi/spacers. our findings suggest that this is an ed decision and that the 'adopter' per se is the ed, not the individual clinician. the decision is not straightforward and does not depend exclusively on practitioners becoming convinced of the strength and rigour of the scientific evidence for the efficacy of this innovation; rather, the organizational decision to adopt mdi/spacers is a complex decision that requires savvy and persuasion at multiple levels (clinician, department decision-makers) and the allocation of significant resources (human, equipment, and financial) in order to facilitate success. the complexity of this organizational decision, by default, suggests that planned change strategies are required to facilitate the adoption of mdi/spacers. our findings suggest that passive techniques, where there are no champions in place and the strength of the research evidence is the sole motivator for adoption, are not effective. in essence, knowledge is not enough to facilitate adoption. in particular, clarity over who is going to bear the cost of the mdi/spacer and harnessing the support of leadership and a champion were critical forces in garnering adoption in our study.
although the decision to 'adopt' mdi/spacers is a multilevel institutional or department decision, foundational to successful adoption is individual clinician trust in the research that outlines the benefits of mdi/spacer use. the findings from this study complement the work of osmond and colleagues [ ] who surveyed canadian multidisciplinary ed healthcare professionals regarding their practices, beliefs and barriers to metered-dose inhaler/spacer use. osmond's findings highlighted that professionals from sites that used and did not use the mdi/spacer had positive beliefs and knowledge about the innovation. furthermore, physicians from both adopting and non-adopting sites expected equal or enhanced clinical outcomes with mdi/spacer use; however % of physicians did not use mdi/spacers for treatment of pediatric asthma exacerbations. osmond's work highlights the fact that awareness of the benefits of mdi/spacers is not enough to shape the adoption process. the findings of our study provide rich detail about the complex array of individual, innovation and practice environment elements that influence this practice change. in addition to complementing osmond's work, our findings also build upon the work of hurley and colleagues [ ] . they explored canadian ed healthcare professionals' perceptions associated with the use or non-use of mdi/spacers for the delivery of beta-agonist respiratory medications in two teaching hospitals. they discovered the main impediments to be increased workload, increased equipment costs, myths about the superiority of nebulisation, and interprofessional conflict. while there are parallels in terms of some of the identified barriers to mdi/spacer adoption, our findings revealed a more dynamic picture: early adopter and yet-to-adopt sites shared some of the barriers, but early adopter sites generated support and leadership to overcome them.
through a more robust research design and a larger sample size, we were able to build upon hurley's findings and learn that clarity about cost implications and support from leadership and an mdi/spacer champion can be integral components of successful adoption. while this study sheds light on the factors that shaped the adoption process of the mdi/spacer, the results must be interpreted cautiously because individual responses were used to develop thick descriptions of a unit-level phenomenon (adoption of the mdi/spacer). although focus group interviews are an effective technique to acquire data on unit-level phenomena, and much effort was exerted to ensure a broad representation in the composition of the focus group participants, readers must be mindful that individual perspectives were acquired. we were able to identify the barriers and supports to implementation of the mdi/spacers, and we collected data on several important contextual factors, such as ed census and decision-making structure; however, in order to protect anonymity, specific data on these elements cannot be shared at the individual ed level. we did not interview parents to explore their perspective of mdi/spacer use. while they could have been considered adopters of the innovation because they must give consent for medical intervention, we limited our focus to how they influenced clinician behaviour. the final limitation of our study is a temporal one. it must be noted that we were not intervening and studying the effects of various strategies to facilitate mdi/spacer adoption; rather, we were retrospectively exploring the adoption processes for eds that had either already adopted or were adopting the mdi/spacer, or exploring potential factors for sites that had not yet adopted the mdi/spacers. thus, limitations arise when asking participants to reflect upon events that happened, in some cases, several years ago.
for example, participants in sites where adoption had taken place may have unintentionally justified their present practice by 'glossing over' the main barriers they had experienced in the past. having already adopted the practice, adopters could have been 'convincing themselves' that the path chosen was the correct one. however, we believe that the use of focus groups minimized this limitation by garnering multiple perspectives on the process. the dissemination of research evidence is rarely sufficient by itself to improve healthcare, and specific interventions are needed to address local barriers and facilitators [ ] . this is true of mdi/spacer adoption because, despite strong research evidence that mdi/spacers can produce results equivalent to nebulisers, adoption of mdi/spacers for the treatment of mild and moderate asthma has been slow. building upon the findings of key articles in the field [ , ] , we studied the natural evolution of diffusion patterns of mdi/spacers for the treatment of mild to moderate asthma in children in nine peds. through this research, we developed new knowledge about the social and political nature (i.e., in terms of resource allocation and presence of champions) of the adoption process, as well as its complexity in terms of the number of factors and levels that innovation adoption involves. clinical treatment and management of children's asthma exacerbations are engrained decisions and behaviours that are shaped by factors at the individual practitioner, department, and institutional levels. adding to this complexity, parental perceptions and expectations weighed heavily in healthcare professionals' decision-making processes and the overall adoption process. generally speaking, the findings make important contributions to understanding the complexity of innovation adoption processes on two fronts. first, awareness or knowledge of the innovation is not enough to change practitioner behaviour.
of note, most practitioners knew of the merit of mdi/spacer use; however, this knowledge was not enough to independently propel successful adoption because of the magnitude of other unit-level barriers. second, there were no 'magic bullets' or patterns of barriers and/or facilitators that consistently led to adoption success at each of the adopter/adopting sites. rather, the unit/departmental barriers and facilitators interacted with historical, contextual, and cultural values at each site. our findings represent important knowledge for eds preparing to implement mdi/spacers for asthma treatment, and potentially for other innovation adoption decisions that are based upon strong research evidence. this study demonstrates the value of assessing the unit-specific barriers and facilitators prior to planned implementation change initiatives in order to tailor implementation strategies.
holding chambers versus nebulisers for beta-agonist treatment of acute asthma (cochrane review)
metered-dose inhaler: the emergency department orphan
evaluating the effectiveness of evidence-based guidelines for the use of spacer devices in children with acute asthma
successful implementation of spacer treatment guideline for acute asthma
child and patient satisfaction with the use of spacer devices in acute asthma
perc spacer study group: barriers to metered-dose inhaler/spacer use in canadian pediatric emergency departments: a national survey
perceptual reasons for resistance to change in the emergency department use of holding chambers for children with asthma
case study research: design and methods
diffusion of innovations
qualitative research methods for health professionals thousand oaks
focus groups: theory and practice
court reporters: a viable solution for the challenges of focus group data collection?
basics of qualitative research. grounded theory procedures and techniques
the discovery of grounded theory: strategies for qualitative research
toward a comprehensive interdisciplinary model of health care research use
innovations in knowledge transfer and continuity of care
competing paradigms in qualitative research
alternate arguments for generalizing from data as applied to qualitative research
explaining diffusion patterns for complex health care innovations. health care manage rev
lost in knowledge translation: time for a map?
the authors declare that they have no competing interests.
sds provided leadership and coordination in the design and conduct of the study, participated in data analysis and interpretation, drafted and edited the final manuscript, and approved the final submitted manuscript. mho conceived the study, contributed to the overall study design, participated in data analysis, and assisted in drafting and editing the manuscript. kaol carried out data collection, participated in analysis, and helped to draft and edit the manuscript. idg, jg, and tk contributed to study conception, and participated in critically appraising and revising the intellectual content of the manuscript. all authors read and approved the final manuscript.
key: cord- -rw byz authors: stahl, guillaume; mccarty, gregory p; farabaugh, philip j title: ribosome structure: revisiting the connection between translational accuracy and unconventional decoding date: - - journal: trends biochem sci doi: . /s - ( ) - sha: doc_id: cord_uid: rw byz the ribosome is a molecular machine that converts genetic information in the form of rna into protein. recent structural studies reveal a complex set of interactions between the ribosome and its ligands, mrna and trna, that indicate ways in which the ribosome could avoid costly translational errors. ribosomes must decode each successive codon accurately, and structural data provide a clear indication of how ribosomes limit recruitment of the wrong trna (sense errors).
in a triplet-based genetic code there are three potential forward reading frames, only one of which encodes the correct protein. errors in which the ribosome reads a codon out of the normal reading frame (frameshift errors) occur less frequently than sense errors, although it is not clear from structural data how these errors are avoided. some mrna sequences, termed programmed-frameshift sites, cause the ribosome to change reading frame. based on recent work on these sites, this article proposes that the ribosome uses the structure of the codon–anticodon complex formed by the peptidyl-trna, especially its wobble interaction, to constrain the incoming aminoacyl-trna to the correct reading frame. arguably the most important feature of protein synthesis is the ability to maintain the correct reading frame. when reading an mrna, a ribosome must correctly interpret each successive tri-nucleotide codon as a particular amino acid. the ribosome must also decode only adjacent, nonoverlapping codons, those lying in a single reading frame. however, mrna lacks punctuation, that is, internal signals that identify which nucleotide triplets constitute codons (first noted by crick et al. [ ] ). therefore, when a ribosome loses track of the correct reading frame it has no way to re-establish it. although ribosomes do make frameshift errors, these occur at a very low rate, probably much less than × − per codon, or at least an order of magnitude less frequently than ribosomes incorporate an incorrect amino acid (termed sense errors) [ ] . although we lack an explicit, accepted model for frame maintenance, we do have a more complete understanding of the correction of sense errors. the error-correction machinery distinguishes between correct (cognate) and incorrect (noncognate) aminoacyl-trnas (aa-trnas) by the structures they form in the decoding sites. ribosomes increase the accuracy of trna recruitment and recognition by a process called kinetic proofreading [ ] [ ] [ ] .
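a toy calculation can illustrate why a proofreading scheme with more than one selection step improves accuracy. the sketch below is not taken from the article: it assumes that each step is an independent choice between moving forward and dissociating, that the same made-up rate constants apply at every step, and that noncognate trna dissociates faster than cognate trna. the function names and all numbers are hypothetical.

```python
def pass_probability(k_forward, k_off):
    """Probability that a bound tRNA proceeds rather than dissociates in one step."""
    return k_forward / (k_forward + k_off)


def error_ratio(k_forward, k_off_cognate, k_off_noncognate, n_steps=2):
    """Acceptance of noncognate relative to cognate tRNA after n independent steps."""
    p_cognate = pass_probability(k_forward, k_off_cognate)
    p_noncognate = pass_probability(k_forward, k_off_noncognate)
    return (p_noncognate / p_cognate) ** n_steps


# Hypothetical rate constants in arbitrary units, chosen only for illustration.
one_step = error_ratio(10.0, 1.0, 1000.0, n_steps=1)
two_step = error_ratio(10.0, 1.0, 1000.0, n_steps=2)
print(f"one selection step:  {one_step:.2e}")
print(f"two selection steps: {two_step:.2e}")
```

under these assumptions the relative acceptance of noncognate trna after two steps is the square of the one-step value, which is the sense in which a second step amplifies discrimination.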
to amplify discrimination, the process of trna selection is divided into two steps, one before and one after gtp hydrolysis by elongation factor tu (ef-tu), which deposits aa-trna onto the ribosome. during each step, noncognate trna is much more likely to dissociate from the ribosome than is cognate trna. moreover, recent observations show that when bound to the ribosome, cognate complexes formed between aa-trna and ef-tu manipulate the ribosome and improve discrimination [ ] [ ] [ ] . the past two years have witnessed an incredible burst of information about the structure of the ribosome and its interactions with ligands. for our purposes, the precise nature of the interaction between the mrna, trnas and the s ribosome is most exciting [ ] [ ] [ ] . the data give a glimpse of the workings of this amazing molecular machine; in particular, a clearer picture of the nature of the error-correction process. ribosomes have three trna-binding sites, termed aminoacyl (a), peptidyl (p) and exit (e) sites. during translation, aa-trnas enter the ribosome and bind to a codon in the a site. after accepting transfer of the growing peptide from the preceding trna, they translocate to the p site, donate the peptide to the succeeding trna and move to the e site before dissociating from the ribosome. the newly available structures confirm that trna base-pairs with the mrna in the a- and p sites, and show that nucleotides and amino acids in the ribosome directly contact the codon-anticodon complex in each site ( fig. ) . a description of the interactions in the a site comes from ogle et al. [ ] , who solved a structure of the s subunit complexed with models of the mrna and a-site trna to < . Å. the structure reveals a complex set of direct and indirect interactions between each of the three pairs of bases in the a site and residues of both the s rrna and ribosomal protein s (rps ) (fig. ) . the contacts between the a site and the first and second base pairs ( - and - in fig.
) effectively measure the distance between the phosphoribose backbones by bridging the ′ oh groups of each ribose. this precludes formation of a non-watson-crick pair. the third base pair, or wobble position, does not form a bridging set of contacts. instead, the ribose ′ oh of the codon nucleotide makes two contacts, a hydrogen bond and a metal-mediated interaction. a packing interaction between c and the ribose of the anticodon further stabilizes the wobble pair without constraining its geometry as strongly as the geometries of other base pairs are constrained. as a result, a wider variety of interactions are possible, consistent with the expanded pairing observed at the wobble position. the nature of the interactions at the p site can be deduced from structures, solved to either . Å or Å resolution, of the s ribosome complexed with a trna bound to the p site [ , ] , and from a structure, solved to . Å, of the s subunit containing a trna-mimic helix in the p site [ ] . these structures show evidence of a similar nexus of interactions. whereas the a-site complex is relatively accessible to solvent on the inner surface of the ribosome, the p-site complex is buried in a cleft created by rrna and ribosomal proteins. an important feature of the structure is the presence of g at a position that blocks further extension of the p-site codon helix and forces the mrna to adopt a kinked conformation. because of this kink, the two trna anticodon loops in the a- and p sites come no nearer than Å. therefore, g effectively defines the end of the codon in the p site and, by forcing the kink, the beginning of the codon in the a site (fig. ) . these results explain the role of some of the residues that have been shown genetically to be required to maintain accuracy [ , ] . some of the essential residues of the s subunit directly contact the codon-anticodon helices in the a- and p sites: c and a of s rrna in the a site [ ] , and c in the p site [ , ] .
all other residues implicated in accuracy are located close to, but do not touch, the decoding sites. these include four residues in helix , which also contains residues c and g of the decoding site; seven residues in helix , which lies beneath the p site and undergoes a conformational change that regulates the accuracy of the a site [ ] ; five residues in helix , which also includes c ; six nucleotides in helix , a structure that includes elements of both the a- and p sites [ , ] ; and multiple residues of rps that are near amino acids pro and ser , both of which are elements of the a site. mutations in the s rrna of the s ribosome can also affect accuracy. some of the required residues lie in helix of the s rrna, an area that directly contacts the s subunit at helix , which is adjacent to the decoding sites. all these mutational changes might affect accuracy by either stabilizing or destabilizing contacts between the ribosome and the codon-anticodon complexes, either directly or indirectly. altering the stability of these contacts would affect the efficiency of discrimination during kinetic proofreading: increasing stability would tend to retain noncognate trnas in the a site, thus decreasing accuracy, whereas decreasing stability would increase accuracy. a complete list of rrna mutations, including those that affect accuracy, is available at the website of the ribosomal mutation database project (http://ribosome.fandm.edu). although interactions between the ribosome and codon-anticodon helix of the a site indicate a model for trna discrimination, they do not explain how the reading frame is maintained. one approach to explaining this is to study special circumstances in which the frequency of frameshift errors can be increased by up to %. frame-maintenance can be reduced either by mutations in rrna or by changes to various other components of the translational apparatus, such as ef-tu, trnas and mrna sequences.
understanding how the incidence of frameshifting can be increased might provide insights into how it is prevented normally. theoretically, there are four ways to disrupt reading-frame maintenance:
• translation assumes that each trna recognizes exactly three nucleotides. occasional expansion or contraction of the codon size could shift the reading frame in the + (forward) or − (backward) direction, respectively.
• although recognizing a three-nucleotide codon, the incoming aa-trna could bind to three nucleotides that are not in the normal frame.
• after a trna pairs with the mrna, the ribosome must translocate exactly three nucleotides to display the succeeding codon in the a site; translocation of either four or two nucleotides would cause + or − frameshifting.
• after translocation and before recognition of the next trna, only the peptidyl-trna (pep-trna) is base-paired to the mrna; at this stage, any slippage of the trna on the mrna could cause a frameshift in either the forward or backward direction.
the extremely low frequency of frameshift errors indicates that errors in the size of the repeating three-nucleotide step are rare. however, how this step size is maintained remains controversial. in one model it is suggested that the size of trna anticodons defines the step size by defining both codon length and translocation distance. this view is derived from studies of mutant trnas in which expansion of the anticodon to four nucleotides causes a + frameshift. in the first characterized example of this, the suppressor sufd alters a trna gly by expanding its anticodon from ccc to cccc. this allows trna gly to suppress frameshift mutations in structural genes that result from the expansion of ggn gly codons to gggn (where n is any nucleotide) [ ] . the structures indicate a simple, elegant model, termed the quadruplet translocation model [ ] .
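the effect of a single abnormal translocation step on the reading frame, as in the route listed above where the ribosome moves four or two nucleotides instead of three, can be sketched as follows. this is only an illustration: the function name, the message and the step lists are made up, and the frame is reported simply as the start position modulo three (0 for the normal frame, 1 for a + shift, 2 for a − shift).

```python
def a_site_codons(mrna, steps):
    """List (frame, codon) pairs displayed in the A site.

    `steps` gives the number of nucleotides moved after each codon is read:
    3 is a normal translocation, 4 or 2 is an abnormal one.
    """
    position = 0
    shown = []
    for step in steps:
        codon = mrna[position:position + 3]
        if len(codon) < 3:
            break  # ran off the end of the message
        shown.append((position % 3, codon))
        position += step
    return shown


mrna = "AUGGCUGGGAUUCGAUAA"  # made-up message, not from the article
normal = a_site_codons(mrna, [3, 3, 3, 3, 3])
plus_one = a_site_codons(mrna, [3, 3, 4, 3, 3])   # one four-nucleotide step
minus_one = a_site_codons(mrna, [3, 3, 2, 3, 3])  # one two-nucleotide step

for label, result in [("normal", normal), ("+1", plus_one), ("-1", minus_one)]:
    print(label, result)
```

in this sketch every codon after the abnormal step is read in the shifted frame, which reflects the point that the mrna carries no internal signal for returning to the original frame.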
the model proposes that by expanding the anticodon to four nucleotides, the trna could base-pair to a four-nucleotide codon in the mrna and that, during translocation, the trna would move four nucleotides into the p site, thus displaying a + shifted codon in the a site [ ] . it is important to remember that this model is hypothetical and that, for example, there is no solved structure of a frameshift-suppressor trna paired with an mrna. the quadruplet translocation model assumes that the ribosome has no direct role in defining step size. however, the recent structure of an aa-trna-mrna- s ribosome complex seems to contradict this assumption [ ] . the decoding site explicitly contacts each of the three base-pairs that form in the a- and p sites [ , ] , which indicates that the structure of the ribosome implicitly defines the codon. in fact, the ribosome appears to constrain the mrna so that only three nucleotides of the trna can interact with the a-site codon (fig. ) . a ° kink in the mrna on the ′ side of the a-site codon defines the first nucleotide in the a site. at the other end of the interaction, the ribosome places nucleotide c immediately below the wobble nucleotide of the trna. the distance between the kink and c is sufficient to allow codon-anticodon interaction between three pairs of bases, but not four. the model was further called into question by results showing that the nucleotide at the ′ end of the putative four-nucleotide anticodon is modified to -methylguanosine in a major class of frameshift suppressor trnas, trna pro suppressors, in salmonella typhimurium [ ] . because methylation blocks base pairing to cytosine, these trnas are incapable of a four base-pair interaction with mrna. importantly, in the quadruplet translocation model, the modified nucleotide would recognize the first nucleotide in the expanded anticodon. by blocking base pairing, this modification should render the trna incapable of distinguishing suppressible sites (e.g.
cccc) from nonsuppressible ones (accc, gccc or uccc). the fact that the trna can distinguish these sites demonstrates that the quadruplet translocation model is invalid for these suppressors and requires that they cause frameshifts by another mechanism, presumably involving triplet recognition. together with the recent structural data, these results greatly weaken, if not invalidate, the concept that translational step-size is defined explicitly by the trna anticodon. although the abnormal structure of an expanded anticodon loop appears to prevent the ribosome reading an abnormal four-nucleotide codon, it does cause the ribosome to make frameshift errors. an explanation of this comes from what was thought initially to be an unrelated phenomenon, programmed frameshifting. this is a ubiquitous, although rare, event in which ribosomes are forced to shift reading frame at special sites in mrnas. the process is stochastic; programmed-frameshift sites increase the probability of a frameshift occurring from the normally low level of random errors (< × per codon) to as much as %. according to the current view, programmed frameshifts occur at mrna sequences that pause the ribosome with the a- and p sites located over special frameshift-stimulating signals. the mechanism of frame disruption varies between sites. in the most common form, termed − simultaneous-slippage frameshifting [ ] , the frameshift signal is a heptanucleotide of the form x-xxy-yyz, grouped in codons of the upstream normal frame, where xxx is a triplet of any repeating nucleotide, yyy is a triplet repeat of either a or u, and z varies between species. for example, the heptanucleotide in the coronavirus infectious bronchitis virus is u-uua-aac [ ] . in most cases, a downstream secondary structure (a pseudoknot and a hairpin loop in eukaryotes and prokaryotes, respectively) causes the pause over this sequence (reviewed in ref. [ ] ).
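the x-xxy-yyz heptanucleotide described above lends itself to a simple pattern search. the following is a minimal sketch, not the method of any published program: the function name and the test sequence are hypothetical, z is allowed to be any base (an assumption, since the text says only that z varies between species), and index 0 of the input is assumed to be the first base of a zero-frame codon. downstream secondary structure is not considered.

```python
import re

# X-XXY-YYZ: XXX is any one nucleotide repeated, YYY is AAA or UUU, Z is any base.
# The zero-width lookahead lets overlapping candidates be reported.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACGU]))")


def find_slippery_sites(mrna, in_frame_only=True):
    """Return (index, heptamer) for each X-XXY-YYZ match in `mrna`.

    With in_frame_only=True a match is kept only if XXY and YYZ are
    zero-frame codons, i.e. the leading X is the third base of the
    preceding zero-frame codon.
    """
    hits = []
    for match in SLIPPERY.finditer(mrna.upper().replace("T", "U")):
        start = match.start()
        if in_frame_only and (start + 1) % 3 != 0:
            continue
        hits.append((start, match.group(1)))
    return hits


# Made-up message containing the u-uua-aac heptamer cited in the text.
print(find_slippery_sites("AUGGCUUUAAACGGG"))
```

the frame filter matters because the same seven bases placed in another register would not be read as x-xxy-yyz by a ribosome in the zero frame; setting `in_frame_only=False` reports every occurrence regardless of register.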
despite the number and phylogenetic ubiquity of such sites, we do not yet understand how these signals stimulate frameshift errors. recent work on a far less common form of programmed frameshifting provides mechanistic explanations of this process. programmed + frameshifting in saccharomyces cerevisiae, first identified in the ty family of retrotransposons, also occurs at heptameric sequences, although of a different structure from those discussed above (reviewed in ref. [ ] ). the sites consist of two codons of the normal frame plus a seventh base at the ′ end. the second codon is recognized very slowly in the ribosomal a site because of the low availability of its cognate trna. this pauses the ribosome with a pep-trna bound to the first codon in the ribosomal p site. recent evidence indicates that 'unconventional decoding' by this pep-trna can directly stimulate frameshift errors by out-of-frame binding of an incoming aa-trna [ ] . because of the redundancy of the genetic code, during translation each amino acid can be decoded by one of several trnas, termed isoacceptors. conventional decoding uses an isoacceptor that can fully pair with the codon. for example, although the codons gcg and gcu both encode alanine, the former is read by a cognate trna with the anticodon cgc, whereas an isoacceptor with an igc anticodon decodes the latter (both codons and anticodons are represented in ′- ′ polarity). the term unconventional decoding refers to reading by trnas that have a less than optimal interaction with the mrna. reading of gcg by the igc-trna would be an unconventional decoding event because a g-i wobble interaction does not allow a normal watson-crick base-pair. in fact, the juxtaposition of the two purines at the wobble position would introduce a clash that destabilizes rna-rna helices. although conventional, cognate decoding should predominate, the lack of some cognate trnas in s. cerevisiae allows unconventional decoding by more abundant isoacceptors.
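the distinction between conventional and unconventional decoding at the wobble position can be made concrete with a small classifier. this sketch is an assumption-laden simplification, not part of the article: it uses crick's classical wobble rules as the definition of a conventional pair, ignores modified bases other than inosine, and looks only at the wobble pair (third codon base against first anticodon base, with both sequences written in the same polarity as in the text). the function name and category labels are hypothetical.

```python
PURINES = {"A", "G", "I"}  # inosine (I) is a purine
PYRIMIDINES = {"C", "U"}

# Crick's wobble rules: anticodon first base -> codon third bases it can read.
WOBBLE_ALLOWED = {
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "G": {"C", "U"},
    "I": {"U", "C", "A"},
}


def classify_wobble(codon, anticodon):
    """Classify the wobble pair of a codon and an anticodon of three bases each."""
    codon_base = codon[2].upper()
    anticodon_base = anticodon[0].upper()
    if codon_base in WOBBLE_ALLOWED.get(anticodon_base, set()):
        return "conventional"
    if codon_base in PURINES and anticodon_base in PURINES:
        return "purine-purine clash"
    if codon_base in PYRIMIDINES and anticodon_base in PYRIMIDINES:
        return "pyrimidine-pyrimidine (weak)"
    return "other mismatch"


# The alanine examples from the text: gcg/cgc, gcu/igc and gcg read by igc.
for codon, anticodon in [("GCG", "CGC"), ("GCU", "IGC"), ("GCG", "IGC")]:
    print(codon, anticodon, classify_wobble(codon, anticodon))
```

with these rules the first two pairings from the text are classified as conventional, and reading of gcg by the igc anticodon is classified as a purine-purine clash.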
because their structural genes have been deleted, some cognate trnas are either present in low concentration or absent [ ] . this unconventional decoding can result in weak pyrimidine-pyrimidine pairs or purine-purine clashes in the wobble position. the presence of such unusual p-site wobble pairs causes frequent, erroneous out-of-frame decoding in the a site and results in frameshifts [ ] . because out-of-frame decoding competes with normal, in-frame decoding [ , ] , the occurrence of frameshifts is enhanced by the slow recognition of the next in-frame codon. how might an unconventional wobble pair stimulate frameshift errors? an interaction between the p- and a-site trnas had been postulated to stabilize reading of adjacent codons [ ] . however, this model is now thought to be unlikely because the kink between the a- and p sites allows the two trnas to come no closer than Å [ , ], which makes direct contact difficult and an indirect effect probable. although it is possible that the effect could propagate through ribosomal components, we propose a simple model in which it propagates through the mrna (fig. ) . the kink is the key to this model and it is compatible with simultaneously forming normal cognate pairs in the a- and p sites (fig. a) . in this model, out-of-frame recognition in the a site requires the + frame codon to move into the a site, thereby displacing the nucleotide to be skipped (fig. b) . this disruption might be incompatible with the normal structure of the wobble pair in the p site (shown in fig. as forcing breakage of the pair, although, alternatively, this might disrupt ribosomal contacts with the codon-anticodon pair). the high energy cost required to form this structure would reduce the efficiency of out-of-frame recognition. by the same argument, because of the strain of the kink between the two sites, a non-canonical wobble-pair interaction in the p site might disturb formation of a normal codon-anticodon interaction in the a site (fig.
c) and reduce the efficiency of in-frame decoding. such an effect on nonsense suppression has been observed [ , ] . in this case, out-of-frame recognition still requires displacement of the skipped nucleotide (fig. d) . however, if the abnormal structure of the p-site wobble pair itself forces disruption of ribosomal interactions, it would eliminate the extra energy cost relative to in-frame decoding because both in-frame and out-of-frame aa-trnas would be precluded from forming their maximum-energy structure. this would reduce the energetic advantage of in-frame decoding and so should indirectly increase the probability of out-of-frame binding. experiments using drugs that induce errors demonstrate that minor disruptions to the structure of the ribosome-mrna-trna complex can have profound effects on translational fidelity. nucleotides a and a of the s rrna undergo an energetically costly rearrangement when cognate trna enters the a site, which allows them to interact with the codon-anticodon complex. paromomycin stabilizes this conformation in the absence of trna and so reduces the energy difference between cognate and noncognate trna binding; by paying the cost of the rearrangement, paromomycin promotes both sense and frameshift errors [ ] . we argue for an opposite effect in which an aberrant wobble-pair interaction in the p site reduces the energetic advantage of in-frame decoding, which indirectly increases the probability of out-of-frame binding. in this model, formation of the correct wobble-pair interaction in the p site is crucial to continued in-frame decoding. this is consistent with the effect of unusual wobble interactions in programmed frameshifting [ ] , and with the fact that wobble in the p site is constrained by a pincer interaction by two rrna nucleotides: c and m g [ ] . 
it is probably significant that of the nucleotides that contact the codon-anticodon complex within the p site, only mutation of c increases the occurrence of frameshifts and other errors [ ] . similarity between the effects of a c mutation and an unusual wobble pair implies that they might increase the probability of out-of-frame recognition in the a site in similar ways. in-frame recognition in the a site could be ensured by holding the p-site wobble-pair tightly between c and m g . although we lack details of the way these two residues interact with the codon-anticodon complex in the p site, their inability to interact with abnormal, non-watson-crick base-pairs could explain why an unusual wobble pair disrupts normal, in-frame decoding. a possible explanation of the ability of trnas with expanded anticodon loops to stimulate frameshifting might be that they, too, interfere with the ability of the p site to restrict recognition of in-frame codons at the a site. recent data also show that frameshifts induced by + suppressor trnas occur in competition with normal in-frame decoding of the succeeding codon [ ] , as occurs in programmed + frameshifting. an additional nucleotide in the anticodon loop might disrupt the correct interaction of the p-site rrna with the codon-anticodon complex, which could increase the probability of out-of-frame recognition despite the fact that the suppressor continues to make three base-pairs with the mrna. this is plausible because we know that the additional nucleotide in the sufd suppressor form of trna pro slows recognition in the a site, and allows another isoacceptor to read its cognate codon [ ] . this misreading does not happen with wild-type trna. genetic tools for dissecting the translational accuracy system? the genetic dissection of programmed + frameshifting appears to have revealed an important feature of ribosome-mediated error correction. 
direct biochemical tests are now necessary to determine whether the inferences drawn from the genetics are correct; that is, that maintaining the reading frame depends crucially on the stability of the pep-trna wobble-pair binding in the p site. of the different forms of programmed frameshifting, it is easiest to hypothesize that programmed + frameshifting in yeast results from direct disruption of the translational-accuracy mechanism. for example, an mrna sequence found at one programmed + frameshift site in yeast appears to stimulate errors by base pairing with, and inactivating, helix of the small ribosomal subunit rrna, which is an essential element of the ribosomal-accuracy mechanism [ , ] . it remains to be seen whether other forms of programmed, unconventional decoding also disrupt error correction by the ribosome. given the diversity of these decoding events, the ways in which they could undermine accuracy could also be diverse. the diversity of these programmed events is consistent with the results of the structural studies, which show that the ribosome uses an elaborate system to maintain translational accuracy. therefore, by studying unconventional decoding systems we can dissect the mechanisms of translational accuracy, a process that is crucial to the cell. recoding systems should continue to provide tools to dissect this still incompletely understood system. general nature of the genetic code for proteins translational accuracy and the fitness of bacteria kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity kinetic amplification of enzyme discrimination eftu provides an internal kinetic standard for translational accuracy the loop of s rrna: a signal to ef-tu? 
induced fit in initial selection and proofreading of aminoacyl-trna on the ribosome conformational switch in the decoding region of s rrna during aminoacyl-trna selection on the ribosome recognition of cognate transfer rna by the s ribosomal subunit crystal structure of the ribosome at . Å resolution the path of messenger rna through the ribosome functional insights from the structure of the s ribosomal subunit and its interactions with antibiotics genetic probes of ribosomal rna function limitations of translational accuracy a conformational switch in escherichia coli s ribosomal rna during decoding of messenger rna structure of functionally activated small ribosomal subunit at . Å resolution structure of the s ribosomal subunit frameshift suppression: a nucleotide addition in the anticodon of a glycine trna frameshift suppression a new model for phenotypic suppression of frameshift mutations by mutant trnas signals for ribosomal frameshifting in the rous sarcoma virus gag-pol region characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an rna pseudoknot programmed translational frameshifting near-cognate peptidyl-trnas promote + programmed translational frameshifting in yeast transfer rna gene redundancy and translational selection in saccharomyces cerevisiae ribosomal frameshifting in the yeast retrotransposon ty: trnas induce slippage on a nucleotide minimal site a novel programmed frameshift expresses the pol gene of retrotransposon ty of yeast: frameshifting without trna slippage trna-trna interactions within cellular ribosomes codon-anticodon interaction at the ribosomal p site improves the accuracy of the decoding process decoding with the a:i wobble pair is inefficient decoding fidelity at the ribosomal a and p sites: influence of mutations in three different regions of the decoding domain in s rrna programmed + frameshifting stimulated by complementarity between a downstream mrna sequence and an 
errorcorrecting region of rrna rasmol: biomolecular graphics for all this work was supported by grants from the national institute of general medical sciences (gm ), the fogarty international center of the national institutes of health (tw ) and the swedish cancer foundation ( -b - vaa). the transfer of an enveloped virus genome into the interior of its target cell requires fusion between the viral and endosomal or plasma membranes. the energy barriers associated with this process are overcome with the aid of specific viral envelope proteins [ ] . most viral fusion proteins are integral membrane proteins that form higher-order oligomers [ ] . in paramyxoviridae, a family that includes major human pathogens [ ] , these proteins are trimers synthesized as inactive precursors that must be cleaved by host proteases for activity. the resulting activated fusion protein comprises an n-terminal and a c-terminal subunit. the latter is anchored to the viral membrane by a transmembrane segment proximal to its c terminus, whereas the n terminus of the c-terminal subunit hosts a region of ~ - , mainly apolar, residues, termed the fusion peptide. this peptide is thought to play a crucial role during the fusion process [ ] ( fig. a ). after protease activation, but before attachment of the virus to its target cell, fusion proteins are thought to be in a metastable state, also called a pre-fusion conformation. stimuli derived from the host cell then trigger a cascade of conformational changes that includes the insertion of the n-terminal fusion peptide into the target cell membrane, ultimately leading to the merging of the cell and viral bilayers. 
based on the x-ray-determined three-dimensional structure of fragments from several viral fusion proteins [ ] , it has been postulated that, at the end of the process, some regions of these proteins adopt a stable post-fusion conformation in which a heptad repeat segment, located downstream from the n-terminal fusion peptide, forms a trimeric central coiled coil (fig. a,b) . packed against the grooves of the coiled coil are three helices that are made up of heptad repeat regions located close to the transmembrane domain. the antiparallel nature of this trimeric helical hairpin suggests that, at the end of the fusion process, both the n-terminal fusion peptide and the transmembrane domain are embedded within the same membrane. despite the present structural knowledge, the actual mechanism of viral-induced infection by enveloped viruses requires fusion between the viral and cellular membranes, a process mediated by specific viral envelope glycoproteins. information from studies with whole viruses, as well as protein dissection, has suggested that the fusion glycoprotein (f) from paramyxoviridae, a family that includes major human pathogens, has two hydrophobic segments, termed fusion peptides. these peptides are directly responsible for the membrane fusion event. the recently determined three-dimensional structure of the pre-fusion conformation of the f protein supported these predictions and enabled the formulation of: ( ) a detailed model for the initial interaction between f and the target membrane, ( ) a new model for paramyxovirusinduced membrane fusion that can be extended to other viral families, and ( ) a novel strategy for developing better inhibitors of paramyxovirus infection. key: cord- -uc i h m authors: izaguirre, gonzalo title: the proteolytic regulation of virus cell entry by furin and other proprotein convertases date: - - journal: viruses doi: . 
/v sha: doc_id: cord_uid: uc i h m a wide variety of viruses exploit furin and other proprotein convertases (pcs) of the constitutive protein secretion pathway in order to regulate their cell entry mechanism and infectivity. surface proteins of enveloped, as well as non-enveloped, viruses become processed by these proteases intracellularly during morphogenesis or extracellularly after egress and during entry in order to produce mature virions activated for infection. although viruses also take advantage of other proteases, it is when some viruses become reactive with pcs that they may develop high pathogenicity. besides reacting with furin, some viruses may also react with the pcs of the other specificity group constituted by pc /pc /pace /pc . the targeting of pcs for inhibition may result in a useful strategy to treat infections with some highly pathogenic viruses. a wide variety of pc inhibitors have been developed and tested for their antiviral activity in cell-based assays. the regulation of viral cell entry by proteases is a control mechanism common among viruses ( table ). the proteolytic processing of viral proteins is often required for virus maturation and infectivity. a critical group of host-cell proteases exploited by a variety of viruses is the family of proprotein convertases (pcs), which includes furin, pc , pc , pace , and pc [ , ] . although other types of proteases besides pcs can also perform the proteolytic maturation of viruses, it has been observed that when pcs process viral proteins, some viruses become comparatively more infective and pathogenic. most of the research done on the maturation of viruses by pcs has focused on furin. however, there is evidence of the involvement of other pcs in the regulation of virus maturation [ ] . 
the scattered information about the role of pcs in the life cycle of a wide variety of viruses [ , ] , in addition to the new developments on pc activity regulation and reaction specificity [ ] [ ] [ ] [ ] , calls for an effort to integrate this knowledge, analyze the relevance of pcs in the pathogenicity of viruses, and evaluate the feasibility of inhibiting pcs as a sound strategy for antiviral therapy. this review will discuss the importance of differences of pc reactivity and selectivity, and the pc gene expression profile of infected cells, in determining virus infectivity and tropism. the proteolytic maturation of viruses by pcs generally involves the processing of proteins localized on the surface of viral particles, either of non-enveloped or enveloped viruses [ ] . the cleavage of the surface viral proteins mostly occurs inside the host cells during virus morphogenesis and before egress, although cleavage by the target-cell pcs can occur extracellularly or during cell entry with some viruses. the proteolytic processing by pcs promotes binding and fusion of viral particles to target cells. pcs are eukaryotic serine proteases classified in the merops peptidase database within the s b family. furin, pc , pc , pace , and pc are part of the kexin-like subfamily of pcs and localize to the organelles of the constitutive protein secretion pathway [ ] . these pcs perform the proteolytic post-translational modification of a large variety of peptides and proteins in the trans-golgi network, endosomes, and pericellular environment, and are critical regulators of central cellular processes, such as growth, proliferation, and differentiation [ ] [ ] [ ] . the gene expression profile of the kexin-like pcs is cell-type dependent, but most cells express some or all of them, except for pc whose expression is restricted to cell types in the testes, ovaries and the placenta. 
pcs are large-size multidomain proteins composed of conserved catalytic and regulatory p domains that share %- % amino acid sequence homology ( figure ). furin, pc , and pc are type i membrane-bound proteins, and furin and pc can be shed extracellularly; in contrast, pace is a secreted protein. kexin-like pcs cleave their substrates at sites specified by a motif composed of p arg-p x-p x-p arg-p x, where x is any amino acid residue, and cleavage occurs between the p arg and p x residues. this sequence motif is found in many viral surface proteins and determines cleavage by pcs [ ] . the kexin-like pcs are divided into two specificity groups, one represented by furin, and the other by pc /pc /pace /pc [ ] . furin is more reactive than the other pcs, and the differences in reaction specificity between the two groups are based on active-site and exosite determinants of reactivity. 
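the cleavage site motif described above (arg, two arbitrary residues, arg, then the residue after the scissile bond) lends itself to a simple sequence scan. the following python sketch is illustrative only: the function name `find_pc_sites`, the regular expression and the test sequences are hypothetical, and a sequence match says nothing about whether a site is actually cleaved, because reactivity also depends on substrate conformation and exosite interactions.

```python
import re

# arg-x-x-arg followed by one residue; the lookahead lets overlapping
# motifs (for example in polybasic stretches) all be reported
PC_MOTIF = re.compile(r"(?=(R..R)(.))")


def find_pc_sites(protein):
    """Return (position, motif, next_residue) for each candidate site.

    position is the 1-based index of the second arg, i.e. the residue
    immediately before the predicted scissile bond.
    """
    seq = protein.upper()
    sites = []
    for match in PC_MOTIF.finditer(seq):
        second_arg_position = match.start() + 4
        sites.append((second_arg_position, match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    # hypothetical sequences, not taken from any viral protein
    print(find_pc_sites("MARAARSV"))
    print(find_pc_sites("RRRRRA"))
```

a motif at the very end of a sequence is not reported, since the sketch requires a residue on the far side of the scissile bond.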
kexin-like pcs are considered potential pharmacological targets for the treatment of viral infections by blocking virus maturation and infectivity. other uses for the targeting of these pcs include the inhibition of the activation of bacterial toxins such as shiga, anthrax, clostridium, pseudomonas, and diphtheria; and also, for the treatment of degenerative diseases such as metastatic cancer, alzheimer's, and osteoarthritis [ , , ] . the only known natural inhibitors of pcs are serpins, which are slow-binding type inhibitors that form covalently-linked inhibitory complexes with their target proteases [ ] . serpin b , currently the only pc inhibitory serpin identified in vertebrates, has higher specificity for furin than for pc /pc /pace /pc [ , , ] . more pc inhibitory serpins have been characterized in other organisms as well [ ] [ ] [ ] . a variety of synthetic pc inhibitors have been developed based on small molecules, peptides and their mimetic derivatives, and larger proteins [ , ] . however, the main obstacle for their therapeutic use has been their toxicity, and their lack of pc selectivity. an important research tool is the pc inhibitor α pdx, which is a derivative of the serpin α -antitrypsin with an engineered pc cleavage site motif at its reactive loop [ ] . this engineered serpin inhibits all the kexin-like pcs with the same specificity. more recently, we developed two α pdx-serpin b chimeras that selectively target each of the two pc specificity groups. one is α ord that specifically inhibits furin, and the other is α mdw that specifically inhibits pc /pc /pace /pc [ , ] . the literature on the proteolytic processing of viral surface proteins by pcs and the role that pcs play on the maturation of viruses will be reviewed, and finally, the development of pc inhibitors and their antiviral properties will be discussed. 
the human papillomavirus (hpv) infects the basal cells of stratified epithelium, and virion replication depends on the infected basal cells progressing into differentiated squamous cells. hpv infects by reaching the lower layers of the stratified epithelium through micro-wounds in the tissue. there, the viral particles bind to heparin sulfate proteoglycan receptors localized either on the extracellular matrix of the basement membrane or the cell surface [ , ] . hpv particles are constituted by a naked nucleocapsid, and work done with pseudovirion particles has suggested that conformational changes in the nucleocapsid proteins l and l , that are induced upon binding of the virus to cell-surface proteoglycans, prime l for cleavage by extracellular or pericellular pcs at the arg residue [ ] . the cleavage of l modifies the conformation of the coat proteins and allows the virion to engage another receptor, and that leads to cell internalization and infection [ , ] . the inhibition of the target-cell pcs blocks hpv infection, but the treatment of the pseudovirion particles with furin beforehand bypasses the inhibition. in contrast, the cleavage of live native hpv virions by pcs occurs during virion morphogenesis, so infectivity becomes independent from the target-cell pcs [ ] [ ] [ ] . also, the proteolytic processing seems to be hpv-type dependent, as evidenced by the native hpv virions being poorly processed during morphogenesis, and their infectivity being mostly dependent on the pcs of the target cells [ ] . l and l in hpv and hpv contain more pc cleavage site motifs besides the commonly studied l -arg . two pc cleavage site motifs, one at l -arg and the other at l -arg , are conserved in many hpv types ( table ). mutagenesis of the l -arg site has been reported to affect pseudovirion morphogenesis [ ] . 
the l -arg site is located in a region of l involved in the regulation of retrograde trafficking of the l -viral genome complex from the trans-golgi network and into the nucleus [ , ] . surprisingly, low-risk hpv types have lys at the l - position instead of the arg of high-risk types ( table ). if cleavage at the l - site is required for virus morphogenesis, low- and high-risk types may use different proteases, unless their morphogenesis is different. if the cleavage at the l -arg and l -arg sites takes place during cell entry, the cleavage sites may be hidden inside the folded protein and protected from being accessed by the pcs in the intact virion. however, the virions undergo controlled unfolding during entry and trafficking, so that the pc cleavage sites may become exposed and cleaved along their cell internalization route. the potentially diverse expression of the pc genes in keratinocytes at different anatomical sites of hpv infection may contribute to the restricted cell tropism of hpv types, which is especially different between the skin and mucosal types [ ] . hpv is commonly found in the stratified epithelium of the ectocervix and tonsillar crypts, and hpv is mainly found in the glandular epithelium of the endocervix. their differences in pc reactivity may play a role in determining their particular tropism.
table . pc cleavage site motifs in the coat proteins l and l of hpv types. the cleavage site numbers correspond to those in the hpv sequences. the p arg and p arg residues are denoted in red.
the envelope glycoprotein b (gb) is the most conserved protein among all herpes viruses, and its function is to regulate virus-to-cell membrane fusion. gb is synthesized as a precursor protein, and pcs cleave it at a loop, which is located in domain ii of the ectodomain and at a distance from the fusion loop at domain i. the cleavage site loop is highly variable in length and amino acid sequence among herpes viruses ( table ). 
the cleavage of gb by pcs has been demonstrated [ ] . most herpes viruses have at least one pc cleavage site motif in the cleavage loop, although, hhsv is an exception by having no pc motifs at all. in contrast, other viruses have more than one pc cleavage site, which may be cleaved sequentially [ ] . the experimental inactivation of the pc cleavage site of several herpes viruses did not severely affect viral cell entry into cells growing in vitro; however, the lack of pc cleavage reduced virus spread and replication in vivo [ , ] . the cleavage of gb promotes virus-to-cell and cell-to-cell fusion [ , ] . although much is still needed to consolidate our knowledge of the cleavage of gb by pcs, there is no doubt that the presence of pc cleavage site motifs in gb is the result of selective evolutionary pressure [ ] . two proteins predominate in the envelope of flaviviruses, prm, and glycoprotein e [ ] . the association between the two proteins (prm-e) in the immature virus changes upon cleavage of prm by pcs during egress. the pr segment is removed to render the mature virions (m-e) (m-e) [ ] . all flaviviruses contain a pc cleavage site motif at the pr-m junction (table ) . a peculiar case is the maturation of the dengue virus (denv). its proteolytic processing is known to be very inefficient, and virions are produced in the prm-e form in high proportion. it was initially suspected that maturation might not be necessary for infectivity but later demonstrated that it is indeed needed [ ] . the inefficient maturation of the denv agrees with studies that show that anti-prm antibodies represent a significant proportion of the immune response to denv and that these antibodies are responsible for the development of antibody-dependent enhancement (ade) of infection in individuals suffering from recurrent denv infections [ ] . these observations suggest that the denv pc reactivity is weaker compared to that of other flaviviruses, which seem to mature more efficiently. 
the pc site sequence alignment presented in table shows that the four denv types have asp or glu residues at the p position of the cleavage site, compared to ser or thr in most other flavivirus sequences, including that of the zika virus (zikv). acidic residues at this position in the substrate sequence are detrimental to reactivity with pcs [ ] . based on these differences, it is expected that denv reacts with the pcs far less readily than other flaviviruses and that higher rates of pc reactivity align with the strong virulence and broad cell tropism observed with other flaviviruses such as zikv [ ] . therefore, it is not surprising that zikv can even reach the fetus and remain in bodily fluids of asymptomatic patients for more extended periods when compared to denv [ ] . variations of the pc gene expression profile may be a key factor determining the difference of tropism to testes between zikv and denv. pc is the primary pc expressed in testes [ ] . a vigorous reactivity of zikv with pc would explain both why the testes suffer the highest loads of zikv compared to other organs and why the virus is sexually transmitted. the viral pc reactivity and the cell pc gene expression profile both probably play a role in determining the cell tropism differences observed with flaviviruses [ ] [ ] [ ] . viruses of the genus alphavirus, such as chikungunya (chikv), semliki forest (sfv), sindbis, and ross river, are all arboviruses structurally related to flaviviruses [ ] . their glycoprotein precursor e e is cleaved by pcs in order to regulate its interaction with the glycoprotein e , which promotes virus to cell fusion and infection [ , ] . the information available about the processing of togavirus proteins by pcs is scant, but it reveals the existence of amino acid sequence variability in the pc cleavage sites between the chikv asian and african strains, and that this variability probably determines the observed differences of pc selectivity [ ] . 
the family of coronaviruses includes viruses of relevance to human and veterinary health. like other enveloped viruses that rely on surface glycoproteins for binding and fusion, coronaviruses have the spike (s) protein, which is cleaved by proteases during virion biosynthesis, as well as during entry into target cells [ ] . the proteolytic regulation of coronaviruses is probably one of the best-studied systems, and a complete picture of the regulatory system mechanism has been developed compared to other families of viruses that are less well-studied. the general principles of the proteolytic regulatory mechanism of coronaviruses based on the accumulated evidence include: ( ) these viruses are regulated by a variety of proteases, ( ) the protein s is cleaved sequentially at two cleavage sites, ( ) viruses can quickly adapt to the proteolytic environment of the infected cells, and ( ) the compatibility between the cleavage site-specificity and cell protease expression profile determines the cell and tissue tropism and pathogenicity of the virus. furin is not the only protease that regulates the function of the coronavirus fusion protein. other proteases, such as the membrane-bound tmprss, the lysosomal cathepsins, elastase, and coagulation factor xa have also been implicated [ , ] . protein s is cleaved at the s -s junction during biosynthesis to separate the two major domains of the protein. the s domain is involved in receptor binding, and the s domain mediates the fusion step of the cell entry mechanism. during cell entry, the cleavage at s -s primes s for the second cleavage at the s site [ ] [ ] [ ] [ ] . in many coronaviruses, the s -s cleavage seems to be dispensable; however, the cleavage at s is not. the cleavage at s has been suggested to serve as a virulence marker [ ] . 
predictions of the furin/pc reactivity, based on the amino acid sequence surrounding the cleavage site, have been made based on computer algorithms [ ] ; however, the dependency of furin/pc reactivity on the conformation of the substrate and exosites lends uncertainty to those predictions. the highly virulent mers-cov (middle east respiratory syndrome coronavirus) is the only natural virus known to have pc cleavage site motifs at both the s -s and s sites. other viruses with two pc sites are the result of laboratory selection by their serial passage in cell lines in vitro, one such virus being the infectious bronchitis virus ibv-beaudette strain [ ] . mers-cov has an expanded tropism compared to other coronaviruses, so it is considered polytropic [ ] . only the s site in sars-cov (severe acute respiratory syndrome coronavirus) has a pc cleavage site motif [ , ] . the fact that mers-cov and sars-cov are highly pathogenic, and that ibv-beaudette is apathogenic is in line with these viruses reacting with proteases other than the pcs [ , ] . tmprss promotes sars-cov and mers-cov infection in vivo [ ] . the engineering of pc specificity at the cleavage sites of coronavirus s proteins can modify the virus tropism and virulence [ , ] . the conversion of a monobasic cleavage site into a polybasic site not only makes the virus susceptible to pc cleavage but also increases the chance of cleavage by other proteases that target single arginine residues, so it is not surprising that mers-cov is so pathogenic. because coronaviruses are adapted to the different proteolytic environments of the many cell types they infect, each virus may be activated by a specific set of proteases. it is crucial to define the protease cleavage specificity of viruses that impact human or animal health. the use of the pc inhibitor, dec-rvkr-cmk, has created some controversy as sometimes the inhibitor is used in excessive concentrations. 
the inhibitor binds pcs with a very high affinity, at low nm concentrations; it slowly forms covalent complexes with the enzymes, so it inhibits pcs in a stoichiometric manner. in our hands, treating cells with this inhibitor at a concentration of µm is enough to block hpv cell entry completely. concentrations up to µm reported in some studies should not be considered pc-specific; such high inhibitor concentrations most probably inhibit other proteases besides pcs [ , ] . medically relevant retroviruses of the retroviridae family have also been studied concerning their proteolytic regulation. the most studied viruses are the bovine and murine leukemia viruses, which are related to the human t-lymphotropic viruses. their envelope glycoproteins are cleaved by furin and other pcs [ ] . like the coronaviruses, leukemia virus glycoproteins are cleaved twice. pcs perform the first cleavage, which induces conformational changes and disulfide isomerizations that prime the protein for further proteolysis [ , ] . the second proteolytic event is performed by a viral protease that fully activates the glycoprotein [ , ] . the human immunodeficiency virus (hiv) env glycoprotein gp precursor is cleaved by furin during biosynthesis into gp and gp in the trans-golgi network (figure ). gp is further processed by furin into gp and gp after leaving the tgn [ , ] . the env glycoprotein is the only antigenic hiv protein, and furin cleavage-independent forms stabilized in the native form have been produced for vaccine development purposes [ ] . interestingly, a polybasic region located upstream from the pc cleavage site at the gp /gp junction was shown to bind heparin and promote cleavage [ ] . medically relevant retroviruses of the retroviridae family have also been studied concerning their proteolytic regulation. the most studied viruses are the bovine and murine leukemia viruses, which are related to the human t-lymphotropic viruses. 
their envelope glycoproteins are cleaved by furin and other pcs [ ] . like the coronaviruses, leukemia virus glycoproteins are cleaved twice. pcs perform the first cleavage, which induces conformational changes and disulfide isomerizations that prime the protein for further proteolysis [ , ] . the second proteolytic event is performed by a viral protease that fully activates the glycoprotein [ , ] . the human immunodeficiency virus (hiv) env glycoprotein gp precursor is cleaved by furin during biosynthesis into gp and gp in the trans-golgi network (figure ). gp is further processed by furin into gp and gp after leaving the tgn [ , ] . the env glycoprotein is the only antigenic hiv protein, and furin cleavage-independent forms stabilized in the native form have been produced for vaccine development purposes [ ] . interestingly, a polybasic region located upstream from the pc cleavage site at the gp /gp junction was shown to bind heparin and promote cleavage [ ] . figure . x-ray crystal structure of the hiv- envelope pg glycoprotein monomer. the fusion machine is composed of three gp monomers which are divided into n-terminal gp (green) and c-terminal pg (gray). residue in red denotes the end of gp and residue in blue the beginning of gp after pc cleavage. pdb id code mtj. the duck hepatitis b virus (dhbv) has been used as a model to study the hepatitis b virus (hbv). the proteolytic events that regulate the cell entry mechanism of this hepadnavirus have not attracted much attention, but there is evidence of the cleavage of the envelope proteins by pcs [ ] . single-stranded negative-sense rna viruses of the filoviridae and arenaviridae families and the new-order bunyavirales are the causative agents of lethal hemorrhagic fever diseases. despite the seriousness of the health threat these viruses represent, the information about the proteolytic regulation of their entry mechanism is scarce. 
the envelope glycoproteins of the ebola (ebov) and marburg (mbgv) viruses are processed by furin into two disulfide-linked subunits [ ] [ ] [ ] . except for the reston strain, which has no pc cleavage site motif, all ebov strains have one; the reston strain is less pathogenic than the other ebov strains [ ] . the glycoprotein of mbgv has two pc cleavage site motifs that differ in amino acid sequence and position from the single pc site in the ebov protein [ ] . the cleavage by furin seems to be dispensable because the elimination of the pc site in the ebov protein does not affect the virus replication in cultured cells or the disease progression in experimental animals [ ] [ ] [ ] .
ebov requires further proteolytic processing of the glycoprotein binding domain by endosomal cathepsins in order to gain binding activity [ ] [ ] [ ] . filoviruses are different from other viruses in that they require additional factors or modifications of the glycoprotein in order to gain infectivity [ , ] . the crimean-congo hemorrhagic fever bunyavirus (cchfv) glycoprotein is processed by furin and the proprotein convertase ski- , a pc of the pyrolysin-like type and also known as s p, which has a cleavage site specificity different from the polybasic specificity of the kexin-like pcs [ , ] . the cleavage by furin is not essential, but inactivating the cleavage site slows down virus replication [ , ] . the lymphocytic choriomeningitis (lcmv) and the lassa (lasv) arenaviruses are known to also require ski- activity for the cleavage of their envelope glycoproteins [ , ] . the paramyxoviridae is a diverse family of viruses, and a variety of proteases activate their fusion proteins. some paramyxoviruses are highly pathogenic. single proteolytic processing of the fusion protein occurs for most of these viruses. pcs perform the cleavage in the parainfluenza and the measles (mv) viruses [ ] [ ] [ ] . there are several serotypes of the avian paramyxoviruses (apmv). the glycoprotein of the highly pathogenic apmv- , or newcastle disease virus (ndv), is cleaved by furin, and the proteins of other serotypes are cleaved by undetermined trypsin-like proteases [ , ] . the mutation of the trypsin-like sites into pc site motifs made the viruses replicate faster in cell culture, but they remained non-virulent in vivo [ ] [ ] [ ] . conversely, the transformation of the pc cleavage site of the virulent apmv- strain into a trypsin-like site induced the virus to become highly attenuated [ ] . the pathogenic respiratory syncytial virus (rsv) is unique among paramyxoviruses in that its glycoprotein is cleaved at two sites by pcs [ ] . 
the first cleavage takes place before the virus enters the target cells, and the second occurs after entry into the endosomes [ ] . furin does not activate the lethal nipah (niv) and hendra (hev) viruses for entry; instead, the viruses depend on endosomal cathepsins [ , ] . these viruses produce systemic infections in several different hosts. the glycoprotein of the sendai virus (sev) requires the participation of the homologous attachment protein hemagglutinin-neuraminidase (hn), which binds the cell surface sialic acid receptors. the sev glycoprotein has only one trypsin-like site, but replacing it with the two rsv pc sites reduces the dependency on hn for infection [ ] . the influenza viruses cause respiratory disease and occasional pandemics. the virus envelope contains two glycoproteins, hemagglutinin (ha) and neuraminidase (na). both proteins contribute to virus pathogenicity, and the cleavage of the ha precursor into ha and ha by the host cell pcs is a significant contributor to virulence for avian influenza (figure ). the extent and diversity of the cellular proteolytic activity is also an essential factor determining pathogenicity, spread, and tropism of the influenza virus [ , ] . there are ha types, but h and h are the types most commonly present in seasonal human infections; other types are found in birds. the pandemics of and were caused by the h n and h n strains, respectively. the proteolytic cleavage of ha occurs in a loop that varies in length and amino acid sequence depending on the strain (table ). the loop usually contains one arg residue that determines cleavage by trypsin-like proteases. the cleavage can occur during synthesis, after release, or before entry, and may depend on different proteases. the highly pathogenic virus strain responsible for the spanish influenza pandemic was of the h n type with only one arg residue in its cleavage loop.
two proteases highly expressed in the respiratory tract, especially in the lungs, tmprss and tmprss , were shown to cleave the influenza ha [ ] . hat is a protease expressed in the airways, mostly in the larynx but not in the lungs. it is also capable of activating influenza viruses [ ] . multibasic cleavage sites in ha arise by single substitution mutations, as in the case of some h n types, or by insertions that result in longer loops, as observed with the highly pathogenic h and h types. viruses that acquire multibasic cleavage sites become independent of trypsin-like proteases. in low-pathogenic h n strains carrying the cleavage site motif r-s-k-r, cleavage is not performed by pcs but by matriptase, which recognizes the same cleavage site motif as pcs and determines the nephrotropism of the virus [ ] . however, h n can become reactive with pcs by the removal of a glycosylation site near the cleavage site [ ] . the long and multibasic loops in some h and h strains are highly reactive with furin [ ] [ ] [ ] [ ] ; this reactivity leads to high pathogenicity that causes systemic infections [ , ] . an outbreak of a highly pathogenic avian h n strain that infected humans occurred in in hong kong. in some highly pathogenic h and h types that have the k-k-k-r motif, cleavage is carried out by the ubiquitous protease mspl and its splice variant tmprss , which are also capable of cleaving at the pc cleavage site motif [ ] .
table . cleavage loop in the ha protein of influenza viruses. the variable region is blue with the arrow denoting the cleavage site. the p arg and p arg residues are colored red.
figure . x-ray crystal structure of the ha trimer from the influenza virus a type h . each ha monomer is divided into ha (red, blue, and gray) and ha (orange, cyan, and white, respectively) subunits after the pc cleavage denoted by residues at the end of ha (yellow) and the beginning of ha (purple). pdb id code ibx.
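the distinction drawn above between the r-s-k-r and k-k-k-r motifs can be made concrete with a short sequence scan. the following python sketch is illustrative only. it assumes the commonly cited minimal furin-type consensus arg-x-(lys/arg)-arg, which this section does not state explicitly. the flanking residues in the two loop fragments are hypothetical and are not taken from any real strain. the function names are likewise invented for the example.

```python
import re

# Assumed consensus (not stated in the text): Arg-X-(Lys/Arg)-Arg,
# with cleavage after the final Arg. The lookahead allows overlapping hits.
PC_MOTIF = re.compile(r"(?=(R.[KR]R))")


def find_pc_motifs(seq):
    """Return (0-based start, motif) for each match of the assumed consensus."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PC_MOTIF.finditer(seq)]


def is_multibasic(motif, min_basic=3):
    """Call a motif multibasic if it has at least `min_basic` Lys/Arg residues.

    The threshold of 3 is an arbitrary choice for this sketch.
    """
    return sum(res in "KR" for res in motif) >= min_basic


# Hypothetical loop fragments built around the two motifs named in the text.
loops = {
    "r-s-k-r loop": "PEIPRSKRGLF",
    "k-k-k-r loop": "PQKKKRGLF",
}

for name, seq in loops.items():
    hits = find_pc_motifs(seq)
    print(name, "consensus hits:", hits)
```

under these assumptions the r-s-k-r fragment matches the consensus, while the k-k-k-r fragment is multibasic but does not match it because its first position is lys rather than arg. this is consistent with the text assigning that motif to mspl and its splice variant. a sequence match says nothing about accessibility or glycosylation, which the text notes can decide whether pcs actually cleave.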
the search for effective pc inhibitors centers on finding the inhibitor with the best combination of specificity, stability, and bioavailability [ ] . most pc inhibitors reported have been developed against furin. although these inhibitors are of high pc specificity, many of them still lack proper characterization of their pc selectivity. knowing the pc selectivity of an inhibitor is critical because pcs differ in substrate specificity, and viruses can be pc-selective. synthetic pc inhibitors come in several forms: small molecules identified by high-throughput screening [ ] [ ] [ ] [ ] ; peptide substrates [ , ] or viral cleavage sites [ ] ; peptide mimetic derivatives that add unnatural amino acids [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] ; cyclic peptides [ ] ; polyarginine [ ] [ ] [ ] ; and larger engineered proteins like the leech eglin c [ ] , turkey ovomucoid [ ] , α -macroglobulin [ ] , and the engineered serpin α -pdx [ ] . peptide derivatives seem more efficient at producing high-affinity pc inhibitors compared to small molecules [ ] . due to the high density of negative charges at the pc active site, highly basic peptides show strong specificity and bioavailability, but may also be highly toxic [ ] .
larger proteins are poised to become the most effective pc inhibitors as they offer better opportunities to build specificity and selectivity compared to small molecules. they can also be made bioavailable through a variety of routes. several of these pc inhibitors have been confirmed to have antiviral activity in cell-based assays of viral propagation (table ).
table . inhibitors of pcs tested for their antiviral activity in cell-based assays of viral propagation.
ski- inhibitors: lcmv, lasv [ , ]
peptides and peptidomimetics: chikv, sfv; wnv, denv; h and h influenza [ , ]
the ubiquitous presence of furin and related pcs throughout the cells of the body makes these proteases vulnerable to being exploited by viruses. the location of furin and related pcs in the vesicles of the constitutive protein secretion pathway, where viruses are assembled during morphogenesis or disassembled during cell entry, explains why a diversity of virus types have evolutionarily converged to depend on pcs. viruses also use other types of proteases for the proteolytic regulation of the binding and fusion functions; however, these proteases are restricted to specific cell types, which limits the range of the viral infection. when some viruses mutate and acquire pc reactivity, they may therefore expand their cell tropism and become more pathogenic. the targeting of pcs for inhibition as an antiviral strategy is a sound possibility. probably the major advantage of this approach is that, by not targeting a viral component or function, it reduces the chance of producing resistance. the main drawback is the ubiquitous distribution of pcs and the potential toxicity and secondary effects that their inhibition may cause. consequently, it is essential to know the virus pc selectivity and to have pc inhibitors that are selective for one of the two pc specificity groups, furin or pc /pc /pace /pc [ ] . funding: this research received no external funding. the author declares no conflict of interest.
the activation and physiological functions of the proprotein convertases the biology and therapeutic targeting of the proprotein convertases implication of the proprotein convertases furin, pc and pc in the cleavage of surface glycoproteins of hong kong, ebola and respiratory syncytial viruses: a comparative analysis with fluorogenic peptides proprotein convertases in health and disease on the cutting edge of proprotein convertase pharmacology: from molecular concepts to clinical applications ebola virus envelope grycoprotein derived peptide in human furin-bound state: computational studies structure of the unliganded form of the proprotein convertase furin suggests activation by a substrate-induced mechanism identification of serpin determinants of specificity and selectivity for furin inhibition through studies of α -pdx (α -proteinase inhibitor portland)-serpin b and furin active-site loop chimeras specific and selective inhibitors of proprotein convertases engineered by transferring serpin b reactive-site and exosite determinants of reactivity to the serpin α pdx processing of viral glycoproteins by the subtilisin-like endoprotease furin and its inhibition by specific peptidylchloroalkylketones substrate cleavage analysis of furin and related proprotein convertases. a comparative study high-resolution analysis and functional mapping of cleavage sites and substrate proteins of furin in the human proteome a residues motif delineates the furin cleavage site and its physical properties may influence viral fusion the serpins are an expanding superfamily of structurally similar but functionally diverse proteins. evolution, mechanism of inhibition, novel functions, and a revised nomenclature inhibition of soluble recombinant furin by human proteinase inhibitor the serpin proteinase inhibitor . 
an endogenous furin inhibitor released from human platelets drosophila serpin functions as a neuroserpin-like inhibitor of subtilisin-like proprotein convertases the spn gene of drosophila encodes a potent furin-directed secretory pathway serpin a proprotein convertase-inhibiting serpin with an endoplasmic reticulum targeting signal from branchiostoma lanceolatum, a close relative of vertebrates cutting back on pro-protein convertases: the latest approaches to pharmacological inhibition inhibition of hiv- gp -dependent membrane fusion by a furin-directed a -antitrypsin variant current understanding of the mechanism of hpv infection the role of furin in papillomavirus infection the initial steps leading to papillomavirus infection occur on the basement membrane prior to cell surface binding cleavage of the papillomavirus minor capsid protein, l , at a furin consensus site is necessary for infection furin cleavage of l during papillomavirus infection: minimal dependence on cyclophilins cleavage of the hpv minor capsid protein l during virion morphogenesis ablates the requirement for cellular furin during de novo infection efficient intracellular assembly of papillomaviral vectors maturation of papillomavirus capsids kallikrein- proteolytically processes human papillomaviruses in the extracellular space to facilitate entry into host cells the nuclear retention signal of hpv l protein is essential for incoming viral genome to transverse the trans-golgi network a central region in the minor capsid protein of papillomaviruses facilitates viral genome tethering and membrane penetration for mitotic nuclear entry microarray analysis of human keratinocytes from different anatomic sites reveals site-specific immune signaling and responses to human papillomavirus type transfection proteolytic processing of human cytomegalovirus glycoprotein b (gpul ) is mediated by the human endoprotease furin glycoprotein b of equine herpesvirus type has two recognition sites for 
subtilisin-like proteases that are cleaved by furin glycoprotein b cleavage is important for murid herpesvirus to infect myeloid cells mutagenesis of varicella-zoster virus glycoprotein b: putative fusion loop residues are essential for viral replication, and the furin cleavage motif contributes to pathogenesis in skin tissue in vivo cleavage of epstein-barr virus glycoprotein b is required for full function in cell-cell fusion with both epithelial and b cells positively selected sites at hcmv gb furin processing region and their effects in cleavage efficiency the . Å resolution cryo-em structure of zika virus a structural perspective of the flavivirus life cycle functional importance of dengue virus maturation: infectious properties of immature virions antibodies against the envelope glycoprotein promote infectivity of immature dengue virus serotype zika virus pathogenesis and tissue tropism persistence of zika virus in body fluids-preliminary report testicular expression of pc in the rat: molecular diversity of a novel germ cell-specific kex /subtilisin-like proprotein convertase changing the protease specificity for activation of a flavivirus, tick-borne encephalitis virus comparative analysis between flaviviruses reveals specific neural stem cell tropism for zika virus in the mouse developing neocortex zika virus impairs growth in human neurospheres and brain organoids glycoprotein organization of chikungunya virus particles revealed by x-ray crystallography furin processing and proteolytic activation of semliki forest virus inhibition of chikungunya virus infection in cultured human muscle cells by furin inhibitors: impairment of the maturation of the e surface glycoprotein host cell proteases: critical determinants of coronavirus tropism and pathogenesis cathepsin l functionally cleaves the severe acute respiratory syndrome coronavirus class i fusion protein upstream of rather than adjacent to the fusion peptide different host cell proteases activate the 
sars-coronavirus spike-protein for cell-cell and virus-cell fusion proteolytic activation of the spike protein at a novel rrrr/s motif is implicated in furin-dependent entry, syncytium formation, and infectivity of coronavirus infectious bronchitis virus in cultured cells activation of the sars coronavirus spike protein via sequential proteolytic cleavage at two distinct sites host cell entry of middle east respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein proteolytic processing of middle east respiratory syndrome coronavirus spikes expands virus tropism characterization of cellular furin content as a potential factor determining the susceptibility of cultured human and animal cells to coronavirus infectious bronchitis virus infection multiple organ infection and the pathogenesis of sars middle east respiratory syndrome coronavirus spike protein is not activated directly by cellular furin during viral entry into target cells tmprss contributes to virus spread and immunopathology in the airways of murine models after coronavirus infection cleavage of a neuroinvasive human respiratory virus spike glycoprotein by proprotein convertases modulates neurovirulence and virus spread within the central nervous system a single point mutation creating a furin cleavage site in the spike protein renders porcine epidemic diarrhea coronavirus trypsin independent for cell entry and fusion comparative processing of bovine leukemia virus envelope glycoprotein gp by subtilisin/kexin-like mammalian convertases furin cleavage potentiates the membrane fusion-controlling intersubunit disulfide bond isomerization activity of leukemia virus env furin cleavage of the moloney murine leukemia virus env precursor reorganizes the spike structure cooperative cleavage of the r peptide in the env trimer of moloney murine leukemia virus facilitates its maturation for fusion competence maturation cleavage of the murine leukemia virus env precursor 
separates the transmembrane subunits to prime it for receptor triggering comparative cellular processing of the human immunodeficiency virus (hiv- ) envelope glycoprotein gp by the mammalian subtilisin/kexin-like convertases the role of eukaryotic subtilisin-like endoproteases for the activation of human immunodeficiency virus glycoproteins in natural host cells structure of a cleavage-independent hiv env recapitulates the glycoprotein architecture of the native cleaved trimer heparin enhances the furin cleavage of hiv- gp peptides initiation of duck hepatitis b virus infection requires cleavage by a furin-like protease processing of the ebola virus glycoprotein by the proprotein convertase furin biochemical analysis of the secreted and virion glycoproteins of ebola virus proteolytic processing of marburg virus glycoprotein endoproteolytic processing of the ebola virus envelope glycoprotein: cleavage is not required for function reverse genetics demonstrates that proteolytic processing of the ebola virus glycoprotein is not essential for replication in cell culture proteolytic processing of the ebola virus glycoprotein is not critical for ebola virus replication in nonhuman primates endosomal proteolysis of the ebola virus glycoprotein is necessary for infection role of endosomal cathepsins in entry mediated by the ebola virus glycoprotein filoviruses require endosomal cysteine proteases for entry but exhibit distinct protease preferences proteolysis of the ebola virus glycoproteins enhances virus binding and infectivity ebola virus glycoprotein needs an additional trigger, beyond proteolytic priming for membrane fusion crimean-congo hemorrhagic fever virus glycoprotein precursor is cleaved by furin-like and ski- proteases to generate a novel -kilodalton glycoprotein recovery of recombinant crimean-congo hemorrhagic fever virus reveals a function for non-structural glycoproteins cleavage by furin antiviral activity of a small-molecule inhibitor of arenavirus 
glycoprotein processing by the cellular site protease targeting the proteolytic processing of the viral glycoprotein precursor is a promising novel antiviral strategy against arenaviruses proteolytic cleavage of wild type and mutants of the f protein of human parainfluenza virus type by two subtilisin-like endoproteases, furin and kex engineered serine protease inhibitor prevents furin-catalyzed activation of the fusion glycoprotein and production of infectious measles virus the role of subtilisin-like proprotein convertases for cleavage of the measles virus fusion glycoprotein in different cell types molecular characterization and complete genome sequence of avian paramyxovirus type prototype strain duck/hong kong/d / complete sequence of the genome of avian paramyxovirus type and comparison with other paramyxoviruses mutations in the fusion protein cleavage site of avian paramyxovirus serotype increase cleavability and syncytium formation but do not increase viral virulence in chickens mutation of the f-protein cleavage site of avian paramyxovirus type results in furin cleavage, fusion promotion, and increased replication in vitro but not increased replication, tissue tropism, or virulence in chickens mutations in the fusion protein cleavage site of avian paramyxovirus serotype confer increased replication and syncytium formation in vitro but not increased replication and pathogenicity in chickens and ducks effect of fusion protein cleavage site sequence on generation of a genotype vii newcastle disease virus vaccine cleavage of the human respiratory syncytial virus fusion protein at two distinct sites is required for activation of membrane fusion host cell entry of respiratory syncytial virus involves macropinocytosis followed by proteolytic activation of the f protein ubiquitous activation of the nipah virus fusion protein does not require a basic amino acid at the cleavage site cathepsin l is involved in proteolytic processing of the hendra virus fusion 
protein recombinant sendai viruses expressing fusion proteins with two furin cleavage sites mimic the syncytial and receptor-independent infection properties of respiratory syncytial virus influenza virus activating host proteases: identification, localization and inhibitors as potential therapeutics role of host cellular proteases in the pathogenesis of influenza and influenza-induced multiple organ failure proteolytic activation of the influenza virus hemagglutinin proteolytic activation of influenza viruses by serine proteases tmprss and hat from human airway epithelium hat, and tmprss activate the hemagglutinin of h n influenza a viruses a novel activation mechanism of avian influenza virus h n by furin influenza virus hemagglutinin with multibasic cleavage site is activated by furin, a subtilisin-like endoprotease proprotein-processing endoproteases pc and furin both activate hemagglutinin of virulent avian influenza viruses sequence specificity of furin, a proprotein-processing endoprotease, for the hemagglutinin of a virulent avian influenza virus activation of an influenza virus a/turkey/oregon/ ha insertion variant by the subtilisin-like endoprotease furin targeted infection of endothelial cells by avian influenza virus a/fpv/rostock/ (h n ) in chicken embryos molecular basis for high virulence of hong kong h n influenza a viruses novel type ii transmembrane serine proteases, mspl and tmprss , proteolytically activate membrane fusion activity of the hemagglutinin of highly pathogenic avian influenza viruses and induce their multicycle replication furin inhibitors: importance of the positive formal charge and beyond a small-molecule furin inhibitor inhibits cancer cell motility and invasiveness inhibition of furin/proprotein convertase-catalyzed surface and intracellular processing by small molecules synthetic small molecule furin inhibitors derived from , -dideoxystreptamine identification of potent and compartment-selective small molecule furin inhibitors 
using cell-based assays potent inhibitors of furin and furin-like proprotein convertases containing decarboxylated p arginine mimetics the multi-leu peptide inhibitor discriminates between pace and furin and exhibits antiproliferative effects on prostate cancer cells targeting host cell furin proprotein convertases as a therapeutic strategy against bacterial toxins and viral pathogens a novel enediynyl peptide inhibitor of furin that blocks processing of propdgf-a selective and potent furin inhibitors protect cells from anthrax without significant toxicity highly potent inhibitors of proprotein convertase furin as potential drugs for treatment of infectious diseases optimization of furin inhibitors to protect against the activation of influenza hemagglutinin h and shiga toxin design, synthesis, and structure−activity relationship studies of a potent pace inhibitor peptidomimetic furin inhibitor mi- in combination with oseltamivir and ribavirin efficiently blocks propagation of highly pathogenic avian influenza viruses and delays high level oseltamivir resistance in mdck cells novel furin inhibitors with potent anti-infectious activity optimization of substrate-analogue furin inhibitors elongated and shortened peptidomimetic inhibitors of the proprotein convertase furin effects of ns b-ns protease and furin inhibition on west nile and dengue virus replication design, synthesis, and characterization of macrocyclic inhibitors of the proprotein convertase furin polyarginines are potent furin inhibitors short polybasic peptide sequences are potent inhibitors of pc / and pc : use of positional scanning-synthetic peptide combinatorial libraries as a tool for the optimization of inhibitory sequences lindberg, i. 
cationic cell-penetrating peptides are potent furin inhibitors engineered eglin c variants inhibit yeast and human proprotein processing proteases, kex and furin arg -lys -arg turkey ovomucoid third domain inhibits human furin inhibition of intracellular proteolytic processing of soluble proproteins by an engineered α -macroglobulin containing a furin recognition sequence in the bait region polyarginine inhibits gp processing by furin and suppresses productive human immunodeficiency virus type infection blockage of filoviral glycoprotein processing by use of a protein-based inhibitor a protein-based therapeutic for human cytomegalovirus infection key: cord- -vjlisnui authors: driscoll, amanda j.; karron, ruth a.; morpeth, susan c.; bhat, niranjan; levine, orin s.; baggett, henry c.; brooks, w. abdullah; feikin, daniel r.; hammitt, laura l.; howie, stephen r. c.; knoll, maria deloria; kotloff, karen l.; madhi, shabir a.; scott, j. anthony g.; thea, donald m.; adrian, peter v.; ahmed, dilruba; alam, muntasir; anderson, trevor p.; antonio, martin; baillie, vicky l.; dione, michel; endtz, hubert p.; gitahi, caroline; karani, angela; kwenda, geoffrey; maiga, abdoul aziz; mcclellan, jessica; mitchell, joanne l.; morailane, palesa; mugo, daisy; mwaba, john; mwansa, james; mwarumba, salim; nyongesa, sammy; panchalingam, sandra; rahman, mustafizur; sawatwong, pongpun; tamboura, boubou; toure, aliou; whistler, toni; o’brien, katherine l.; murdoch, david r. title: standardization of laboratory methods for the perch study date: - - journal: clin infect dis doi: . /cid/cix sha: doc_id: cord_uid: vjlisnui the pneumonia etiology research for child health study was conducted across diverse research sites and relied on standardized clinical and laboratory methods for the accurate and meaningful interpretation of pneumonia etiology data. 
blood, respiratory specimens, and urine were collected from children aged – months hospitalized with severe or very severe pneumonia and community controls of the same age without severe pneumonia and were tested with an extensive array of laboratory diagnostic tests. a standardized testing algorithm and standard operating procedures were applied across all study sites. site laboratories received uniform training, equipment, and reagents for core testing methods. standardization was further assured by routine teleconferences, in-person meetings, site monitoring visits, and internal and external quality assurance testing. targeted confirmatory testing and testing by specialized assays were done at a central reference laboratory. the primary aim of the pneumonia etiology research for child health (perch) study was to provide a contemporary picture of the microbial etiology of severe pneumonia in young children from developing countries [ ] . one of the defining characteristics of the study was the use of a standard case definition and a rigorous training program to achieve standardization of case assessments and specimen collection [ ] . the standardization of laboratory methods in the perch study was equally important to ensure comparability across study sites and for accurate and meaningful interpretation of pneumonia etiology results. we have previously described the process leading to the perch diagnostic testing strategy [ ] [ ] [ ] [ ] . here we describe the laboratory methods used in perch and the procedures to ensure standardization and quality. sites. 
to build capacity at the sites, and in alignment with the priorities of the bill & melinda gates foundation, all perch testing was done locally, with the exception of quality assurance testing and a select subset of specialized assays, which were performed at the study reference laboratory (canterbury health laboratories, christchurch, new zealand), which also served as the study specimen and isolate biorepository. the perch site laboratories were located in kilifi, kenya; basse and banjul, the gambia; bamako, mali; lusaka, zambia; soweto, south africa; nakhon phanom, sa kaeo, and nonthaburi, thailand; and matlab and dhaka, bangladesh. study sites were selected through an open, global site solicitation and selection process. as there was little support in the budget to expand existing infrastructure, preference was given to sites with well-established research laboratories and with experience in pediatric pneumonia studies. blood, nasopharyngeal and oropharyngeal (np/op) swabs, induced sputum (is), and urine were collected from all perch cases at enrollment. cases who were intubated had endotracheal aspirate collected in lieu of induced sputum. gastric aspirate and pleural fluid specimens were collected when clinically indicated. at select sites, lung aspirates (bangladesh, the gambia, mali, south africa) and postmortem specimens (thailand, south africa) were also collected. among perch controls, blood, np/op swabs, and urine were collected at enrollment at all sites. standardized recommendations for specimen storage prior to laboratory evaluation were provided in standard operating procedures (sops) ( table ). 
all specimens were collected within walking distance or within a -hour drive of the study laboratory, with the exception of matlab, bangladesh, where specimens were transported once or twice a day from the field hospital to the laboratory in dhaka, and basse, the gambia, where specimens were transported to times weekly to the fajara laboratory for molecular testing. the transportation of all specimens was done under controlled temperature conditions, as stipulated in the study sop. standardized testing algorithms were developed for each body fluid type among cases and controls ( figure ). following testing, residual volumes of body fluid specimens were stored at - °c to facilitate future research, with particular attention to bioethical considerations [ ] . bacterial isolates cultured in pure growth were stored at - °c unless classified as contaminants. barcode labels were used on specimen collection containers, data collection forms, and laboratory requisition forms [ ] . in the laboratory, specimen aliquots and isolates were archived in -ml sterile cryovials with silicone o-rings, labeled with thermostable labels, and inventoried using freezer management software [ ] . laboratory data were entered into an electronic data capture system to allow for real-time study monitoring [ ] . to standardize molecular testing, an automated nucleic acid extraction platform and standardized polymerase chain reaction (pcr) assays, described below, were deployed across all study sites. the perch laboratory director reviewed all pcr results files prior to their inclusion in the study database. for pcr evaluation of respiratory specimens, we used the fasttrack diagnostics respiratory pathogens multiplex pcr kit (ftd resp- kit) (fast-track diagnostics, sliema, malta). 
np/op specimens were collected in viral transport medium (universal transport medium [utm], copan diagnostics, brescia, italy) and refrigerated at °c- °c for a maximum of hours, or frozen at - °c prior to nucleic acid extraction. induced sputum, pleural fluid, and lung aspirate specimens were collected in saline in universal containers and either refrigerated at °c- °c for a maximum of hours, or frozen at - °c prior to nucleic acid extraction. total nucleic acid extraction was performed on respiratory specimens using the nuclisens easymag platform (biomérieux, marcy l'etoile, france). four hundred microliters of each respiratory specimen (np specimen in utm, induced sputum aliquot in normal saline, pleural fluid aliquot, or lung aspirate aliquot) was extracted and eluted to a final volume of - μl nucleic acid. prior to extraction, induced sputum specimens were digested with : dithiothreitol and incubated at ambient temperature until any mucus was broken down. the ftd resp- kit is a real-time pcr arranged in multiplex groups for the detection of the following viruses, bacteria, and fungi: influenza a, b, and c; parainfluenza viruses , , , and ; coronaviruses nl , e and haemophilus influenzae species. the k. pneumoniae target was not used in any of the final analyses because of difficulties with assay specificity, as has been found elsewhere [ ]. positive, negative, and internal extraction controls were included in each run. quantitative pcr (qpcr) data were generated through the creation of standard curves using -fold serial dilutions of plasmid standards provided by ftd on an approximately quarterly basis at each study site, with calculation of pathogen density (copies/milliliter) from the sample cycle threshold (ct) values.
because the results for the known standards were highly consistent across laboratories, standard curve data from all sites were pooled to create "standardized" standard curves for each pathogen target; data points beyond standard deviations of the mean were excluded. quantitative pcr was performed at each site using an applied biosystems (abi- ) platform (applied biosystems, foster city, california). cycling conditions were °c for minutes, °c for minutes, and cycles of °c for seconds followed by °c for seconds. whole blood samples were collected into a dedicated edta (ethylenediaminetetraacetic acid) tube and either refrigerated at °c- °c for a maximum of days, or frozen at - °c prior to nucleic acid extraction. total nucleic acid extraction was performed in batches with μl of whole blood extracted and eluted to a final volume of μl nucleic acid using the nuclisens "specific b" protocol. extracted dna was frozen at - °c until undergoing pcr for detection of the autolysin (lyta) gene. streptococcus pneumoniae nucleic acid was detected in whole blood using a qpcr assay based on a method from the us centers for disease control and prevention [ ] . mastermix containing . μl of gene expression mastermix (applied biosystems, life technologies, california), . μl of each of the μm forward and reverse primers and probe, μl of molecular-grade water, and μl of template dna was used per reaction. quantification standards consisting of lyta plasmids (fast-track diagnostics, sliema, malta) diluted : from copies/ml to copies/ml were run in triplicate on every plate. a no-template control, consisting of molecular-grade water, was likewise run in triplicate. cycling conditions of °c for minutes followed by cycles of °c for seconds and °c for minute were applied on an abi- instrument. exponential amplification curves with a ct value of < cycles were considered positive and quantified using the standard curve. 
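the standard-curve quantification described above can be sketched as follows. this is a minimal illustration, not the study's analysis code: it assumes the usual linear relation between ct and log10(copies/ml), and the function names, the example dilution series, and the cutoff `n_sd` are hypothetical placeholders (the actual dilution factor, ct cutoff, and standard deviation cutoff are not given in the text above).

```python
import statistics


def drop_outliers(values, n_sd):
    """Keep replicate Ct values within n_sd sample SDs of their mean.

    n_sd is a hypothetical parameter; the cutoff used in the study is not
    stated in the surrounding text.
    """
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies/ml) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_per_ml(ct, slope, intercept):
    """Invert the fitted curve to estimate pathogen density from a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical plasmid standards (log10 copies/ml) and their measured Ct values.
    standards = [2, 3, 4, 5]
    cts = [34.0, 31.0, 28.0, 25.0]
    slope, intercept = fit_standard_curve(standards, cts)
    print(slope, intercept, copies_per_ml(28.0, slope, intercept))
```

pooling across sites, as described in the text, would amount to concatenating each site's (log10 copies, ct) pairs for a given target, filtering outlying replicates, and fitting a single curve per target.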
all samples positive for bordetella pertussis were tested for bordetella holmesii [ ] . a high proportion of samples positive for h. influenzae type b, especially from countries with hib vaccine, called into question the specificity of that pathogen target. consequently, we retested these positive samples with an established hib assay [ ] and used the results from this second assay in our analyses. due to intermittent contamination of the nucleic acid extraction lysis buffer with legionella species, samples positive for legionella were retested at the reference laboratory [ ] . because few whole blood samples from thailand and bangladesh were positive for lyta pcr, we retested a random sample of documented pneumococcal carriers (np/op specimens positive for s. pneumoniae) from both sites using the perch lyta pcr assay at the reference laboratory. no false negative blood results were found. np/op swabs from all cases with clinical signs or history of measles were tested for measles virus by pcr using the nucleoprotein gene target [ ] at the reference laboratory. suspected measles was defined as a history of measles in the past months, measles rash at admission, or measles diagnosis at admission or discharge. organism identification was done according to standard microbiological methods that were documented in sops and clarified at each site at the outset; antimicrobial susceptibility testing followed the clinical and laboratory standards institute (clsi) guidelines [ ] . antimicrobial susceptibility was tested using overnight growth of pure isolates, using the disk diffusion methodology when possible. streptococcus pneumoniae isolates that had reduced susceptibility to penicillin by the oxacillin screen were tested by etest (biomérieux, low-dose strips) or a commercially available broth mic method (trek diagnostic systems) to measure minimum inhibitory concentrations (mic) to penicillin. 
enterobacteriaceae were screened for extended spectrum β-lactamase production using a cefotaxime ( µg) disk and a ceftazidime ( µg) disk. zone sizes ≤ mm for cefotaxime or ≤ mm for ceftazidime were confirmed by the double disk diffusion test, following the clsi guidelines. organism identification was confirmed for each specimen type and is described below. antibiotic susceptibility testing results were confirmed in a sample of % of isolates from all sites with % concordance of results. blood cultures were incubated per manufacturer instructions using automated systems (bactec [becton dickinson, sparks, maryland] in kenya, south africa, the gambia, mali, and zambia; bact/alert [biomérieux, marcy l'etoile, france] in bangladesh and thailand). culture bottles were incubated within hours of specimen collection. alarm-positive culture specimens were plated on to % sheep or horse blood, chocolate, and macconkey agar. specimens were incubated for days and then discarded. specimens that were alarm-positive but subculture negative were tested using the binaxnow streptococcus pneumoniae antigen card (alere, scarborough, maine) if the gram stain was either negative or revealed gram-positive cocci. organisms were identified and antimicrobial susceptibility testing was performed according to clsi methods. any isolate classified as a contaminant was not stored; all other organisms were identified then stored at - °c. organisms were defined, a priori, as contaminants (table ) . of stored blood culture isolates, ( %) were shipped for confirmatory identification at the reference laboratory; of these, ( %) were concordant with the original result. in the absence of strong evidence otherwise, organisms meeting the a priori contaminant definition were considered as contaminants. candida species was also considered a blood culture contaminant if the patient recovered without antifungal treatment. 
np swabs in skim milk tryptone-glucose-glycerin (stgg) medium were frozen at - ˚c overnight, then thawed and processed using a broth-enrichment step to enhance pneumococcal carriage recovery [ , ] . streptococcus pneumoniae was identified by colony morphology, susceptibility to optochin, and bile solubility testing. samples were inoculated onto % sheep or horse blood agar with µg of gentamicin per milliliter and incubated at °c ± °c for - hours. following subculture, each morphologically distinct pneumococcal colony was isolated and stored, with up to a maximum of isolates per plate. efforts were made to process sputum specimens within , and no more than , hours following collection. gram-stained smears were made from the most purulent portion of each induced sputum specimen. the number of epithelial cells and neutrophils per low-powered microscopic field were counted and recorded for the purpose of assessing specimen quality [ ] . microorganisms seen in the smear were described according to classic gram stain morphotypes, with the number of bacterial morphotypes seen per high-powered field recorded to assist in interpretation of culture results. the most purulent portion of each specimen was inoculated onto sheep or horse blood, chocolate, and macconkey agars, streaked out using the -quadrant streaking method, and incubated at °c for hours. cultures were examined at hours and hours, and predominant organisms were identified and quantified according to the furthest quadrant with visible colonies (first quadrant = scanty; second quadrant = +; third quadrant = +; fourth quadrant = +). background mixed oropharyngeal flora, including α-hemolytic streptococci, commensal neisseria, coagulase-negative staphylococci, yeasts (except cryptococcus), diphtheroids, and capnocytophaga were quantified as a group but not identified further. induced sputum specimens were also cultured for mycobacteria by standard liquid culture methods. 
gram stains were performed on all pleural fluid and lung aspirate specimens, and the number of leukocytes per low-powered field and bacterial morphotypes per high-powered field was recorded. each specimen was cultured by plating onto chocolate and macconkey agar and also inoculated in appropriate broth (blood culture bottles, tryptone soy broth, brain heart infusion, and brucella broth) and overnight incubation at °c- °c. all plated and broth growth was examined at hours and identified according to standard microbiological methods. pleural fluid supernatant was assayed for protein and glucose and tested using the binaxnow streptococcus pneumoniae antigen card. pleural fluid and lung aspirate specimens were cultured for the presence of mycobacteria in liquid culture. of available pleural fluid and lung aspirate isolates, ( %) were shipped to the reference laboratory for confirmation and all had their original organism identification confirmed. pneumococcal capsular serotyping was performed by the following methods: quellung reaction (zambia, south africa), pcr deduction of pneumococcal serotypes [ ] followed by quellung reaction if there were mixed or ambiguous results by pcr (thailand, mali, the gambia, bangladesh), or latex agglutination at pool level with quellung reaction for final typing and pcr confirmation of a subset of isolates as a quality control procedure (kenya). mixed or ambiguous results that could not be resolved at the study sites were serotyped by quellung reaction at a reference laboratory (national institute for communicable diseases, johannesburg, south africa or the institute of environmental science and research [esr], porirua, new zealand). serotyping for all pneumococcal isolates isolated from sterile sites as well as a sample of - pneumococcal isolates from the np swab culture was verified by quellung at the esr laboratory. 
haemophilus influenzae were identified at all sites using standard microbiological methods; serotype b was identified by slide agglutination. additional (non type b) serotyping was performed in south africa and the gambia by slide agglutination. for all other sites, serotyping beyond type b was done by pcr at the reference laboratory (canterbury health laboratories, christchurch, new zealand) [ ] . a bioassay was performed on enrollment serum samples, from all cases and controls, to detect antibiotic activity. a -mm filter paper disc was inoculated with µl of serum and placed on a mueller-hinton plate seeded with a . mcfarland suspension of a fully sensitive staphylococcus aureus strain (atcc ). any zone of inhibited bacterial growth around the disc after - hours' incubation was recorded as evidence of serum antibiotic activity. serum from all cases was assayed for c-reactive protein (crp). samples from south africa were assayed in country using crp gen immunoturbidometric assay (roche diagnostics, milan, italy). all other samples were assayed at the reference laboratory in christchurch, new zealand, using crp vario immunoturbidometric assay (roche diagnostics). a subset of control samples was assayed for crp as part of an analysis to evaluate its diagnostic utility [ ] . a complete blood count was performed on all cases. hemoglobin testing for controls was also carried out in the gambia, mali, and south africa. thalassemia testing for cases and controls was done in thailand. culture of induced sputum and gastric aspirate specimens for mycobacteria was performed using liquid media in established tuberculosis testing laboratories at all sites. antimicrobial susceptibility testing was performed on all mycobacterium tuberculosis isolates. isolates of nontuberculous mycobacteria were identified at the reference laboratory by s ribosomal rna and rpob sequencing [ ] [ ] [ ] . 
malaria testing was performed by rapid antigen test or microscopy for all cases at sites with endemic malaria (kenya, the gambia, mali, zambia) and in south africa when clinically indicated. human immunodeficiency virus (hiv) testing was done on all cases at all sites apart from bangladesh, and for controls at all africa sites with the exception of the gambia (which has an hiv infection prevalence of < %). testing was done by serum antibody assay, followed by pcr confirmatory antigen testing for cases and controls < months of age. in zambia and south africa, cd assessments were performed or collected from referral facilities for all hiv-infected cases and controls. all respiratory specimens were tested for p. jirovecii nucleic acid by fast-track pcr. additionally, induced sputum, endotracheal aspirate, pleural fluid, and lung aspirate specimens were tested for p. jirovecii by immunofluorescence (south africa) and toluidine blue staining (zambia). laboratory sops were developed in collaboration with site investigators for all core laboratory procedures. all sites underwent a period of training and pilot testing prior to study initiation. following demonstration of successful performance, a site activation letter allowed formal study enrollment to commence [ ] . the perch laboratory director, in conjunction with other team members, visited and evaluated each study laboratory prior to study piloting and provided advice on areas for additional improvement of facilities or training of staff. major equipment, including the nucleic extraction platform and pcr thermocyclers, was procured centrally and installed at each site laboratory. in addition, maintenance contracts were provided for the period of the study. training on the nucleic acid extraction system was provided by biomérieux at installation. fast-track pcr training was provided in-person by fast-track diagnostics over a period of days at each site. 
trained staff received a certificate of completion and the site laboratory was required to pass an external quality assurance assessment before beginning molecular diagnostic testing on study samples. sites were trained on induced sputum slide reading at an initial training and again at a midstudy refresher training. a subset of approximately % of slides from each site was stored following reading and later sent to the reference laboratory, where the slides were audited. sites that were unable to send slides had a random sample checked by the laboratory director during study oversight visits. key sops were reviewed with laboratory scientists from all sites at an investigator meeting prior to study initiation. training on the data capture system and freezer inventory software was provided remotely via a webinar prior to the start of the study. a working group including the laboratory director and representatives from all site laboratories was convened throughout the study to harmonize practices and troubleshoot problems at periodic investigator meetings and through regular teleconferences. a midstudy in-person refresher training on core laboratory procedures was conducted for all sites in august . the laboratory director visited each study site at least twice over the course of the study to provide on-site monitoring. laboratory quality indicators were monitored using the real-time data entry system and were used to identify areas for improvement over the duration of the study [ ]. electronic laboratory data reports were generated from the database and reviewed at regular intervals by the laboratory director. in addition, digital pcr results files were rechecked at the reference laboratory to confirm accurate interpretation of pcr quantification curves. discordant interpretations of results were discussed with the laboratories and corrected in the database.
an external quality assessment (eqa) program was set up by fast-track diagnostics to monitor performance of the fasttrack respiratory pcr and whole blood lyta pcr assays at each site. for the fast-track respiratory eqa, laboratories were supplied with a series of samples containing mixtures of plasmids at various concentrations at -to -month intervals. each laboratory was required to test the samples using its routine ftd resp- assay and standard procedures. panels for the whole blood lyta pcr eqa were dispatched from ftd at the same frequency and included blinded plasmid samples containing the lyta target in a range of concentrations. for each round of eqa, an individual performance report was provided along with details of overall performance for all sites. reports and practical feedback allowed participants to identify and resolve potential problems whilst monitoring the effectiveness of their laboratory quality assurance processes. site-to-site variation was also assessed using these reports. all sites were enrolled in an eqa program for the microbiological assessment of respiratory specimens, organized by the royal college of pathologists of australasia quality assurance programme. eqa panels were dispatched on a quarterly basis and consisted of simulated clinical specimens for the isolation of pathogens, bacterial identification and antimicrobial susceptibility testing. most module shipments consisted of specimens containing either a pure culture or a mixture simulating a clinical specimen with normal body flora. following each dispatch, results were reviewed by the laboratory director and discussed with the site laboratories. we faced challenges in applying such a high level of standardization to a complex study across diverse research sites. sites varied considerably in their prior level of experience with the perch laboratory methods and therefore required varying levels of assistance and oversight. 
assuring that each laboratory had standard equipment in place and was comprehensively trained meant that the initiation of study enrollment and full specimen testing was delayed by weeks to months in some instances. maintaining a high level of involvement and in-person oversight required regular travel by the laboratory director in addition to frequent communication by phone and email. additionally, local approvals to ship specimens and isolates for confirmatory testing resulted in long delays in the reference laboratory receiving samples from some sites. despite these challenges, we were able to achieve the highest methodological standards across a variety of circumstances, and demonstrated the ability to set up very complex molecular diagnostics in challenging environments. achieving standards was a very positive motivator among laboratory staff, especially in laboratories that had not used international standards before. the value of regular feedback to the staff was evident in our study and we observed laboratory capacity and technical skills improve rapidly over a short period of time, with some of the laboratories without prior similar experience becoming the highest performers. applying a high level of standardization required considerable effort in the study planning stage and throughout the enrollment and testing period, but in the end this effort was outweighed by our confidence that results were accurate and comparable across sites. perch was one of the largest pneumonia etiology studies ever undertaken, with a complex testing algorithm applied to > individuals distributed over enrollment centers in different countries. considerable efforts were made to perform as much of the laboratory testing at the study sites as possible, and to ensure cross-site standardization of testing methods. 
as well as providing confidence in the perch analyses, our experiences provide evidence that multisite studies involving extensive laboratory assessments and including complex molecular diagnostics can be undertaken at research sites in a variety of settings and circumstances, including those with limited prior experience.
supplementary materials are available at clinical infectious diseases online. consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.
author contributions. a. j. d. and d. r. m. drafted the manuscript. d. r. m. supervised all laboratory activities for the study. all authors were involved in the development of standard operating procedures, study conduct, data collection, and/or data management. all authors reviewed and approved the manuscript. a. j. d. and d. r. m. had final responsibility for the decision to submit for publication.
pneumonia etiology research for child health. introduction
standardization of clinical assessment and sample collection across all perch study sites
bhat n; pneumonia methods working group; perch core team. laboratory methods for determining pneumonia etiology in children
use and evaluation of molecular diagnostics for pneumonia etiology studies
disk diffusion bioassays for the detection of antibiotic activity in body fluids: applications for the pneumonia etiology research for child health project
evaluation of fast-track diagnostics and taqman array card real-time pcr assays for the detection of respiratory pathogens
bioethical considerations in developing a biorepository for the pneumonia etiology research for child health project
data management and data quality in perch, a large international case-control study of severe childhood pneumonia
aetiology of childhood pneumonia in a well vaccinated south african birth cohort: a nested case-control study of the drakenstein child health study
evaluation and improvement of real-time pcr assays targeting lyta, ply, and psaa genes for detection of pneumococcal dna
pneumonia etiology research for child health (perch) study group. pertussis-associated pneumonia in infants and children from low-and middle-income countries participating in the perch study
detection of haemophilus influenzae type b by real-time pcr
direct detection and differentiation of legionella spp. and legionella pneumophila in clinical specimens by dual-color real-time pcr and melting curve analysis
development of quantitative gene-specific real-time rt-pcr assays for the detection of measles virus in clinical specimens
performance standards for antimicrobial susceptibility testing; twenty-first informational supplement
revisiting pneumococcal carriage by use of broth enrichment and pcr techniques for enhanced detection of carriage and serotypes
evaluation of a medium (stgg) for transport and optimal recovery of streptococcus pneumoniae from nasopharyngeal secretions collected during field studies
microscopic analysis and quality assessment of induced sputum from children with pneumonia in the perch study
molecular detection methods and serotyping performed directly on clinical samples improve diagnostic sensitivity and reveal increased incidence of invasive disease by streptococcus pneumoniae in italian children
real-time pcr for determining capsular serotypes of haemophilus influenzae
association of c-reactive protein with bacterial and respiratory syncytial virus-associated pneumonia among children aged < years in the perch study
detection and identification of mycobacteria by amplification of rrna
genotypic identification of mycobacteria by nucleic acid sequence determination: report of a -year experience in a clinical laboratory
rpob-based identification of nonpigmented and late-pigmenting rapidly growing mycobacteria
acknowledgments. we offer sincere thanks to the patients and families who participated in this study. we acknowledge the significant contributions of the perch study group and all perch investigators. we offer our gratitude to the members of the pneumonia methods working group and perch expert group for their time and for lending expertise to assist the perch study group. see supplementary materials for a full list of names.
key: cord- -xpj c vx authors: piñón, josefina d.; teng, henry; weiss, susan r.
title: further requirements for cleavage by the murine coronavirus c-like proteinase: identification of a cleavage site within orf b date: - - journal: virology doi: . /viro. . sha: doc_id: cord_uid: xpj c vx abstract the coronavirus mouse hepatitis virus strain a (mhv-a ) encodes a c-like proteinase ( clpro) that is proposed to be responsible for the majority of the processing events that take place within the replicase polyproteins pp a and pp ab. in this study we demonstrate that the q ↓s peptide bond, located between the polymerase and zn-finger regions of pp ab (the pol↓zn site), is processed by the clpro, albeit inefficiently. mutagenesis of the pol↓zn site, as well as the previously identified hd ↓ c site in the a region of pp a and pp ab, demonstrated that the amino acid residues at the p and p positions of the cleavage site, occupied by l and q, respectively, were important determinants of clpro substrate specificity. finally, a direct comparison of the clpro-mediated cleavages at the hd ↓ c and pol↓zn sites was made by determining the rate constants using synthetic peptides. the results show that while a larger polypeptide substrate carrying the hd ↓ c site was processed more efficiently than a polypeptide substrate carrying the pol↓zn site, cleavage of the synthetic peptide substrates containing these two cleavage sites occurred at similar efficiencies. this indicates that the overall conformation of a large polyprotein substrate is important in the accessibility of the cleavage site to the proteinase. the family coronaviridae is composed of a group of viruses that cause a variety of diseases in different animal hosts. the murine coronavirus, mouse hepatitis virus (mhv), causes a range of diseases in mouse, including enteritis, hepatitis, encephalitis, and a demyelinating disease (holmes and lai, ; houtman and fleming, ; lai, ) . 
coronaviruses, along with the arteriviruses, are classified under the newly established order nidovirales based on the similarities in their genome organization and replication strategy (cavanagh, ; de vries et al., ). the name nidovirales originates from the latin word nidus, meaning "nest," and refers to the ′ nested set of subgenomic mrnas that is produced during viral infection (de vries et al., ). as with all positive-strand rna viruses, entry of the viral genome into the cytoplasm is followed by the translation of the positive-strand rna genome, resulting in the expression of viral proteins. the coronavirus genome is organized into seven genes that are separated by stop codons and intergenic sequences (holmes and lai, ; lai, ). thus, translation of the viral genome results only in the expression of gene proteins. the rest of the viral genome is expressed through subgenomic mrnas that are transcribed by the viral rna-dependent rna polymerase encoded in gene . replication of the viral genome also requires the replicase proteins encoded in gene . thus, for both viral replication and subgenomic mrna transcription to take place, the expression of gene products is essential. the replicase gene (gene ) (fig. ) of coronaviruses, spanning - kb, is organized into two overlapping open reading frames, orf a and orf b (bonilla et al., ; lee et al., ). the expression of the downstream orf b is mediated by a ribosomal frameshift event that is aided by the formation of a pseudoknot structure within the overlapping region (bredenbeek et al., ; brierley et al., ). thus, two polypeptides, pp a and pp ab, are expressed from gene , with the translation of pp ab being only - % as efficient as that of pp a in in vitro studies (bredenbeek et al., ; brierley et al., ).
through a series of intricate cotranslational and posttranslational processing events, these polyproteins are converted into a functional complex that in turn is responsible for both genomic rna replication and subgenomic mrna transcription (de vries et al., ). responsible for these processing events are at least two or three viral proteinases encoded within the orf a region of gene (fig. ). two of these proteinase domains, by sequence analysis, share similarities with the cellular proteinase papain. a third proteinase, resembling the poliovirus c proteinase, has also been identified (gorbalenya et al., ; lee et al., ). the coronavirus c-like proteinase ( clpro), flanked on either side by hydrophobic, possibly membrane-spanning regions (hd and hd ), is believed to be the principal viral proteinase responsible for the processing events leading to the formation of the viral replicase complex, with as many as potential cleavage sites identified throughout pp ab (gorbalenya et al., ; lee et al., ) (see fig. ). the presence of the clpro is conserved in all coronavirus genomes studied to date (bonilla et al., ; boursnell et al., ; eleouet et al., ; lee et al., ). the clpro of mhv-a has been identified as a -kda protein (p ) both in in vitro studies and in mhv-a infected cells (piñón et al., ). (a molecular weight of kda has also been reported for the same polypeptide.) the catalytic residues of the mhv-a clpro, his and cys , have also been identified. treatment of infected cells with e- d, a known inhibitor of the clpro, results in the inhibition of viral rna replication in these cells (kim et al., ), demonstrating the importance of the action of the clpro in the events leading to viral replication. van dinten et al. ( ) demonstrated the importance of clpro cleavages using an infectious clone of the related arterivirus eav; introduction of mutations into the candidate orf b clpro cleavage sites had drastic effects on rna synthesis and virus replication.
these findings indicate that this proteinase is a good potential target for antiviral therapy. the cleavage sites of the coronavirus clpro (fig. ) conform to the consensus xq↓z (arrow indicates site of cleavage), with x being a hydrophobic residue, usually l, although the amino acids i, m, v, and f are also found in this position (de vries et al., ). at the p ′ position, z is usually a small uncharged residue such as s, a, g, or c (de vries et al., ), with s being the most common residue at this position. data recently obtained for the avian infectious bronchitis virus (ibv), the human coronavirus (hcv- e), and mhv-a demonstrated that many of these predicted cleavage sites are functional. for all three strains, many of the cleavage sites in pp a have been identified, including a noncanonical lq n site that had not previously been predicted (liu and brown, ; liu et al., ; ng and liu, ; piñón et al., ; tibbles et al., ; ziebuhr et al., ; ziebuhr and siddell, ). in addition, several of the processing sites in orf b have also been identified in both ibv and hcv- e (grötzinger et al., ; heusipp et al., a,b; liu et al., ). however, for mhv, there has been greater difficulty in demonstrating processing by the clpro at any of the predicted orf b cleavage sites. in this study we demonstrate processing by the clpro at the putative orf b cleavage site situated between the putative polymerase (pol) and zinc finger (zn) domains (the pol zn site) (fig. ). furthermore, identification of this site allowed us to compare the efficiency of processing at this orf b site with that at a previously identified site in orf a located at the junction between hydrophobic domain (hd ) and the clpro (the hd c site).

the following domains are shown: papain-like proteinase (plp- and plp- ), x domain (x), poliovirus c-like proteinase ( clpro), hydrophobic domains (hd and hd ), growth factor-like domain (gfl), rna-dependent rna polymerase (pol), zinc-finger domain (zn), and helicase (hel). the predicted clpro cleavage sites are indicated by the numbers and the cleavage site sequences from the p to the p ′ position are listed in table form (arrow indicates site of cleavage) (bonilla et al., ; bredenbeek et al., ; gorbalenya et al., ; lee et al., ).

we have previously demonstrated processing at an orf a site in mhv-a , located at the junction between hd and the clpro (hd c), by a recombinant mhv-a clpro expressed as a fusion protein with the maltose binding protein (mbp) (piñón et al., ). the plasmid pet -nx. c c a, encoding the carboxy-terminal amino acids of hd and the inactivated clpro, was transcribed and translated in vitro to yield a -kda substrate, nx. c (fig. a, lane ). as previously demonstrated (piñón et al., ), upon addition of the recombinant mbp- clpro, this substrate is efficiently processed into the -kda proteinase ( c) and the -kda hd -derived product (nx) (lane ). other studies using similar methods have led to the identification of orf a cleavage sites downstream of the clpro, including a previously unidentified, noncanonical lq n site. however, the demonstration of processing at orf b sites has proven to be difficult. our work with papain-like proteinase (plp- ) showed that substrate length, and possibly substrate conformation, played an important role in the ability of a substrate to be cleaved efficiently by the proteinase (teng et al., ). we therefore created several substrates of various lengths, encoding different putative cleavage sites in orf b, in order to investigate processing by the recombinant mbp- clpro. of these, only the substrate expressed from pet -pol.zn, encoding mhv-a orf b amino acids l -q , proved to be useful in our investigations.
(bonilla et al., ; bredenbeek et al., ; lee et al., ). the positions of the catalytic residues, his and cys , are shown. pet -ha-hd . c encodes mhv-a amino acids from s to g directly downstream of the influenza hemagglutinin (ha) tag under the control of the t promoter. pet -nx. c encodes mhv-a amino acids from k to q . pmal- c.wt encodes mhv-a amino acid sequences from s to g fused to the mal e gene encoding the maltose-binding protein (mbp). the construct pmal- c is used for the overexpression of the clpro in escherichia coli. (c) enlarged map of pol and zn region of orf b and schematic representation of plasmids derived from this region. the position of the q s cleavage site at the junction between pol and zn is shown. pet -pol.zn encodes mhv-a orf b amino acids from l to q under the control of the t promoter.

in vitro transcription-translation of the construct pet -pol.zn resulted in the expression of a -kda full-length substrate, pol.zn (predicted molecular weight kda) (fig. b, lane ). upon incubation of this substrate with the recombinant mbp- clpro enzyme, cleavage between q and s would give rise to two products, an n-terminal product with a predicted molecular weight of kda (pol) and a c-terminal product with a predicted molecular weight of kda (zn). in fig. b, lane , the addition of the recombinant mbp- clpro to the pol.zn substrate resulted in the production of a -kda protein which presumably corresponds to the n-terminal processing product. mutagenesis of q to k (lane ) or r (lane ) abolished this processing, suggesting that the cleavage is occurring at the predicted site and that, in view of the substrate specificity demonstrated, the observed processing event is in fact due to the action of the recombinant mbp- clpro. we were unable to detect the c-terminal -kda product by sds-page analysis. one explanation could be that p cannot be resolved from p in our gel system.
(there have been previous reports of viral proteins migrating with electrophoretic mobilities different from those expected. the mhv-a clpro (p ) itself migrates faster than expected from its predicted molecular weight of kda (piñón et al., ; lu et al., ).) in addition, the predicted cleavage product p has approximately half the methionine content of p , which may contribute to the difficulty in its detection.

radiolabeled, in vitro transcribed, and translated substrate from pet -nx. c was incubated with mbp- clpro (lane ) or an equal volume of column buffer/ % glycerol (lane ) and the processed products were separated on a % sds-page gel. the arrows on the right of the panel indicate the electrophoretic migration of the -kda clpro- and the -kda hd -derived cleavage products (nx). (b) trans processing at the pol zn site. the plasmid pet -pol.zn was in vitro transcribed and translated. radiolabeled substrate was incubated either with mbp- clpro or with an equal volume of column buffer/ % glycerol (denoted by plus or minus signs above the lanes, respectively). processed products were analyzed on a % sds gel. the electrophoretic migration of p is indicated by an arrow on the right of the panel. the molecular weight in kilodaltons of prestained protein markers is indicated on the left of each panel.

the identification of these two cleavage sites, one in orf a (hd c) and one in orf b (pol zn), allowed us to further define the amino acids required for efficient processing by the clpro. for these experiments, we chose to use the pet -ha.hd . c construct, rather than other plasmids encoding clpro, because the ha.hd . c polypeptide can be efficiently cleaved both in cis and in trans and does not require membranes for its cleavage (piñón et al., ). several sets of mutations, from the p to the p ′ position, were introduced into the construct pet -ha.hd . c by pcr mutagenesis using the mutagenesis primers outlined in table . the effect of these cleavage site mutations on the autocatalytic cis release of the -kda clpro was assayed by the expression of the mutated substrates using in vitro transcription-translation, followed by sds-page analyses of the protein products (fig. ). we observed that the l and q residues, at the p and p positions, respectively, were most sensitive to mutations. any mutation at either one of these positions inhibited the autocatalytic cis processing by the clpro (fig. , lanes - ). in contrast, mutations at any of the other positions studied were tolerated and the expression of precursor proteins harboring mutations at these sites still resulted in the autocatalytic release of the -kda clpro. exceptions are the s c substitution at position p ′ (lane ) and the g p substitution at position p ′ (lane ). these mutations also abolish the cis processing by the clpro. the effect of the g p mutation, however, is expected since the introduction of a p at this site could result in a drastic change in the conformation at the cleavage site. to investigate whether the cleavage sequence requirements for trans cleavage at the hd c site by the recombinant mbp- clpro parallel those observed for cis cleavage, the same set of cleavage site mutations was introduced into the construct pet -ha.hd . c c a, which also carries an inactivating mutation in the catalytic cysteine residue of the proteinase. the release of the -kda clpro from this precursor can only be accomplished by incubation with the recombinant mbp- clpro. fig. shows that the effect of these mutations on trans cleavage paralleled the effects on cis cleavage. those mutations centering around the p and p positions of the cleavage site (l and q , respectively) affected trans processing the most. for the pol zn cleavage site in orf b, similar mutagenesis studies were conducted in order to determine the cleavage specificity requirements at this site (fig. ). mutations from the p to the p ′ position were introduced into the plasmid pet -pol.zn.
the mutant substrates were expressed using an in vitro transcription-translation system and then incubated with the recombinant mbp- clpro. the effect of the mutations on the production of p was assayed by sds-page and compared to the processing of the wildtype pol.zn substrate (fig. , lanes and ). the results were similar to those observed with the orf a hd c site in that the l and q residues, at the p and p positions respectively, were most sensitive to mutations. the l i mutation (fig. , lanes and ) resulted in the inhibition of p processing, indicating the sensitivity of this site to a conservative change. however, the l m substitution (fig. , lanes and ) did not abolish processing of p . mutation of q to either k (lanes and ) or r (lanes and ) abolished processing at this site. substitution of s with an a (lanes and ) was tolerated, whereas mutating s to the bulkier n (lanes and ) resulted in the inhibition of p processing. the results of mutagenesis studies on both the hd c and pol zn cleavage sites demonstrate that the p and p positions at the cleavage site are the primary determinants of cleavage specificity by the clpro in both cis and trans processing. we investigated the efficiency of processing at the orf a site compared to the orf b site. the substrates, nx. c and pol.zn, were incubated with a fixed amount of recombinant proteinase for increasing periods of time. we observed that the orf a substrate, nx. c, is readily processed within h, with the substrate completely converted into product by h (fig. a). processing of the orf b substrate, however, is not observed until after h of incubation with the recombinant proteinase. furthermore, cleavage of pol.zn occurs at a much lower level and does not reach completion even after h (fig. b). this inefficiency of processing of the pol zn site compared to the hd c site may be explained in two ways.
first, although the primary sequences of the two sites reveal no obvious reason why one is processed more efficiently than the other, it is possible that the subtle differences in the sequences of the two sites are enough to make the hd c site a more efficient substrate than the pol zn site. alternatively, the difference may not be inherent to the primary sequence of the cleavage sites, but rather to the conformation of the substrate as a whole. it is possible that the substrate conformation of nx. c allows the recombinant proteinase better access to the cleavage site. in the pol.zn substrate, the cleavage site may be more obscured, and in this manner the virus may regulate both when and how much of its encoded proteins are produced.

fig. . hd c cleavage site mutagenesis: trans processing at the hd c site. the same cleavage site mutants used in the study of cis processing were introduced into pet -ha.hd . c c a, which also harbors a mutation at the catalytic cysteine residue of the proteinase. equivalent counts per minute of radiolabeled substrates expressed from these mutant plasmids by tnt were incubated with mbp- clpro or an equivalent volume of column buffer/ % glycerol (+/− mbp- clpro). cleavage products were then separated on a % sds-gel. the electrophoretic migration of p is indicated by an arrow on the right. molecular weight markers are indicated on the left.

to determine whether the difference in processing efficiencies at the hd c and pol zn sites can be attributed to the primary sequences of the cleavage sites, we synthesized a -mer peptide, representing the p -p ′ residues of the hd c cleavage site, and a -mer peptide, representing the p -p ′ amino acids of the pol zn cleavage site. each peptide was then incubated with the recombinant mbp- clpro and the resulting cleavage products were separated from each other and from the substrate by reverse-phase chromatography. for the orf a peptide, reverse-phase chromatography of the reaction at zero time resulted in a single peak, representing the peptide substrate, in the elution profile (fig. a). incubation with mbp- clpro followed by chromatography resulted in two additional peaks, representing the cleavage products, and a reduction in the substrate peak. microsequencing of the cleavage products confirmed that cleavage had occurred between q and s . for the orf b peptide similar results were observed in that the peptide substrate eluted as a single peak at zero time (fig. b). incubation with mbp- clpro prior to separation resulted in two new peaks, representing the two cleavage products. a reduction in the substrate peak was also observed. again, the authenticity of cleavage of the synthetic peptide was confirmed by microsequencing of the cleavage products, which showed that cleavage had occurred between q and s . the level of cleavage of the orf a peptide was not any different from, and in some cases was less than, that of the orf b peptide. we observed that under identical reaction conditions no more than % of the orf a peptide was cleaved by the mbp- clpro, whereas with the orf b peptide the level of cleavage was between and %. in order to allow direct comparison between the cleavage efficiencies of the nx c and pol zn sites, we determined the k cat and k m values for the reactions with the synthetic peptides described above. results show that cleavage of the orf a peptide by the recombinant mbp- clpro yielded a k cat of . s⁻¹ and a k m of . ± . mm. with the orf b peptide the k cat was slightly lower, with a value of . s⁻¹, and the k m was calculated at . ± . mm. our results therefore show that cleavage (k cat ) of the orf a peptide occurred at only a slightly higher rate ( . -fold) than that of the orf b peptide.
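for reference, the two quantities compared in this section can be related through the standard michaelis-menten rate law. the block below only restates textbook definitions and assumes no values from this study; the symbol [E]_0 (total enzyme concentration) is introduced here for illustration.

```latex
v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]},
\qquad
v \approx \frac{k_{cat}}{K_m}\,[E]_0\,[S] \quad \text{when } [S] \ll K_m
```

because k cat /k m combines turnover with apparent affinity, a peptide with a somewhat lower k cat can still be the more efficient substrate at low concentration if its k m is proportionally lower.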
the lack of substantial difference between the k cat values is not surprising given that the sequences of the peptides are very similar. interestingly, when the catalytic efficiency (k cat /k m ) was taken into consideration, the orf b peptide (k cat /k m = . × − m⁻¹ s⁻¹) was a slightly better substrate than the orf a peptide (k cat /k m = . × − m⁻¹ s⁻¹). taken together, the peptide cleavage results presented here suggest that the cleavage efficiencies of the two peptides are similar.

fig. . pol zn cleavage site mutagenesis. recombinant mbp- clpro was used in posttranslational trans cleavage assays with radiolabeled substrates generated from pet -pol.zn that encoded a wildtype cleavage sequence or harbored mutations around the pol zn site. substrate volumes containing equivalent counts per minute were incubated with mbp- clpro (denoted by plus sign above the lanes) or an equivalent volume of column buffer/ % glycerol (denoted by a minus sign above the lanes). the electrophoretic migration of p is indicated by the arrow to the right of the panel. molecular weight markers are indicated on the left.

the action of viral-encoded proteinases is essential to viral replication (dougherty and semler, ). this makes viral-encoded proteinases potentially good targets for antiviral drugs. in the murine coronavirus, two such proteinases are under continued investigation in order to better understand the manner in which these proteinases function. papain-like proteinase (plp- ) has been linked to the processing of several nonstructural proteins encoded in the ′ end of the viral genome (baker et al., ; bonilla et al., ; denison et al., ; hughes et al., ). these cleavage products, p and p , are not yet linked with any known viral function.
however, those viral proteins with presumed functions in viral replication and viral rna transcription, such as the rna-dependent rna polymerase (pol) and the zinc-finger/helicase (zn-hel) proteins, are believed to be processed by the c-like proteinase of the virus. the clpro is predicted to cleave at a minimum of sites in pp ab. many of the cleavage sites located in the a region of pp ab have been shown to be functional cleavage sites and processing by the clpro at these sites has been demonstrated in the coronaviruses ibv (liu et al., ; liu and brown, ; ng and liu, ; tibbles et al., ), hcv- e (ziebuhr et al., ; ziebuhr and siddell, ), and mhv-a (piñón et al., ). some of the mature viral products resulting from these cleavages have also been identified in infected cells (liu et al., ; lu et al., ; ng and liu, ; piñón et al., ; ziebuhr and siddell, ; denison et al., ). according to computer predictions, further processing at the q s , q c , q s , and q a sites located in the b region of mhv-a pp ab would result in mature viral products of , , , , and kda, respectively, corresponding to pol, zn-hel, and the three c-terminal-most cleavage products. in hcv- e, viral products of , , and kda, corresponding to the pol, zn-hel, and the second c-terminal-most proteins, have been identified in infected cells and the role of the clpro in the processing of these products has been authenticated in vitro (grötzinger et al., ; heusipp et al., a,b). similarly, in the case of ibv, viral proteins of , , and kda have been identified in infected cells (corresponding to pol and the two c-terminal-most proteins) and cotransfection experiments have implicated the clpro in the processing of these viral products (liu et al., ). for mhv-a , however, demonstration of processing at any mhv cleavage site in the b region of pp ab has lagged behind that of hcv and ibv. here we report the first demonstration of processing at the site between pol and zn in pp ab by the mhv-a clpro.
the cleavage at the pol zn site is highly inefficient compared to the processing observed at the hd c site, as evidenced by the time course assays illustrated in fig. . phosphorimager analyses indicate that the pol.zn substrate is cleaved fivefold less efficiently than the nx. c substrate (data not shown). the differences in these efficiencies, however, could not be explained by the subtle differences in the primary sequences of the cleavage sites alone. in fact, when the two sites were presented to the enzyme as - or -mer peptide substrates, there was no substantial difference in cleavage efficiency between them, an observation that would seem to contradict the results obtained with the larger polyprotein substrates. recently, ziebuhr and siddell ( ) investigated the efficiency of processing at several hcv- e clpro sites located at the c-terminus of pp a or the central region of pp ab. they observed that several viral products were produced less efficiently than others, with reduced cleavage activity at two orf a sites, between v-q s and l-q n , compared with cleavage at the sites flanking the clpro domain. the corresponding lq n site in mhv-a has also been shown to be less efficiently cleaved than the lq s sites flanking the clpro domain. in the case of l-q n in hcv- e, additional peptide cleavage data demonstrated that the properties of the cleavage sequence itself, rather than the overall conformation of the polypeptide and the accessibility of the cleavage site, contribute to the observed inefficiency of processing at this site (ziebuhr and siddell, ). while our results do not contradict theirs, they do demonstrate that, in vivo, the conformation of a larger polypeptide substrate is likely to be as important a determinant of cleavage as is the primary structure and sequence of the cleavage site.
our results clearly show that, at least with the case of the pol zn site of mhv-a , the observed inefficiency of cleavage was likely due to the overall conformation of the polypeptide, which may directly translate into the accessibility of the cleavage site, rather than the primary sequence of the site. taken together, both sets of results highlight important regulatory mechanisms employed by the virus to coordinate the temporal production and the accumulation of the various replicase proteins. thus, the very slow in vitro processing at the orf b cleavage site, compared with that at the orf a site, generally correlates with the levels of orf a and orf b polypeptides found in infected cells. however, there are inherent differences between in vitro cleavage reactions with either recombinant proteins or peptide substrates and in vivo processing; these include the lengths of the substrates, the concentration of the enzyme and substrates, as well as the subcellular localization of replication complexes in vivo (denison et al., ; ziebuhr and siddell, ) . these differences may all contribute to the extended lengths of time necessary for in vitro cleavages. the substrate specificity of the coronavirus clpro has been determined mainly through the identification of functional cleavage sites and a visual inspection of these cleavage site sequences. mutagenesis has been done primarily to verify the authenticity of the cleavage site and most mutagenesis studies have not extended beyond the q residue that is absolutely conserved at the p position in all clpro cleavage sites identified to date. we have extended our mutagenesis studies to cover the p to p Ј positions of the cleavage site sequence. our results demonstrated that the substrate sequence specificity of the clpro is primarily influenced by the amino acid residues present at the p and p positions of the cleavage site. 
substitution of the q residue at the p position with any other amino acid resulted in substrates that could not be cleaved by the clpro, demonstrating the importance of this residue in the substrate. we could not detect any processing in substrates containing mutations at this position (figs. - ). the p position of the cleavage site is most often occupied by an l; however, in some cleavage sites identified in hcv- e, this position is occupied by a v or an i (grötzinger et al., ; ziebuhr and siddell, ), suggesting that this position is not as strictly conserved as the p position and may thus be able to tolerate some mutations. however, our results demonstrated otherwise, in that even a conservative change to i reduced cleavage to below %, as measured by phosphorimager analysis (data not shown). this suggests that perhaps an i at the p position of the cleavage site can be functional only when compensated for elsewhere in the substrate sequence. the only change tolerated at this position was the substitution of m at the pol zn site (fig. ). interestingly, the jhm strain of mhv encodes an m instead of an l at this position (lee et al., ). thus such a mutation resulted in a wildtype jhm pol zn site and a cleavage efficiency equivalent to wildtype levels. this provides indirect evidence that the pol zn site in jhm is a functional clpro site.

fig. . chromatograms of cleavage products of the synthetic peptides corresponding to orf a amino acids and orf b amino acids. a and b show chromatograms of orf a peptide ( mm) and orf b peptide ( mm), respectively, at h and after h of incubation with mbp- clpro ( . m enzyme) at °c. in both panels the chromatograms for t = h are offset to allow comparison between the two time points. the asterisks (*) indicate the cleavage products that were used in peptide sequencing in order to confirm the sites of cleavage.
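the sequence consensus described for clpro sites (a hydrophobic residue, then q, then a small uncharged residue) can be expressed as a simple scan. the sketch below is illustrative only: the function name and the two residue sets are assumptions based on the consensus quoted in the introduction, and the test sequences are one-letter transcriptions of the two synthetic peptides given in the methods. a match marks a candidate site only; as the mutagenesis results show, conformity to the motif does not guarantee efficient cleavage.

```python
# Residue sets assumed from the consensus quoted in the text (illustrative only).
P2_RESIDUES = set("LIMVF")        # hydrophobic residue preceding Q
P1_PRIME_RESIDUES = set("SAGC")   # small uncharged residue following Q


def find_candidate_sites(seq):
    """Return 1-based positions of Q residues that fit the X-Q|Z consensus."""
    seq = seq.upper()
    hits = []
    for i in range(1, len(seq) - 1):
        if (seq[i] == "Q"
                and seq[i - 1] in P2_RESIDUES
                and seq[i + 1] in P1_PRIME_RESIDUES):
            hits.append(i + 1)  # cleavage would fall directly after this residue
    return hits


# One-letter forms of the two synthetic peptides listed in the methods.
orf_a_peptide = "TTSFLQSGIVKMVS"
orf_b_peptide = "RSAVLQSVGACVVCS"

print(find_candidate_sites(orf_a_peptide))
print(find_candidate_sites(orf_b_peptide))
```

a scan of this kind would flag both l-to-i and l-to-m variants as candidates, so it cannot by itself reproduce the sensitivity to conservative changes reported here.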
although the s found at the p ′ position of both the hd c and pol zn sites is not as sensitive to mutations, we observed that some mutations, such as s c (in the hd c site) (fig. ) and s n (in the pol zn site) (fig. ), are not tolerated. additionally, both s a and s a substitutions, though tolerated, resulted in a reduction, but not complete inhibition, of cleavage, suggesting that the residue at the p ′ position also plays a role in substrate recognition, although to a lesser extent than those at the p and p positions. aside from the amino acid residues at the p , p , and p ′ positions, the amino acid sequences surrounding the scissile q s(a,g) peptide bonds that are recognized by the clpro do not share any other significant primary structure similarity. furthermore, the presence of an lqa tripeptide not cleaved by the proteinase would again suggest the existence of a common conformational determinant shared by all clpro substrates that is necessary for clpro-mediated processing. the pol zn site is the first functional cleavage site identified in the orf b region of the mhv-a pp ab. further work remains to be done in order to identify other functional cleavage sites in orf b. additionally, the mature viral products resulting from these processing events remain to be identified in infected cells. a direct comparison of the cleavage efficiencies of each site will help to elucidate the complex posttranslational processing pattern of the mhv-a gene polyprotein, as well as provide insight into the regulatory mechanisms employed by the virus to keep the production of its proteins under control.

the parental plasmids used in this study are illustrated in fig. . the plasmids pet -nx. c, pet -ha.hd . c, and pmal- c.wt have all been described elsewhere (piñón et al., ). pet -nx. c encompasses mhv-a nucleotides - , encoding the last amino acids of hd and the entire -amino-acid region encompassing the clpro (from k to q ). pet -nx. c c a is the same as pet -nx.
c but carries the inactivating c a mutation in the catalytic cysteine residue of the proteinase. the plasmid pet -ha.hd . c contains mhv-a nucleotides - , encoding hd and the clpro from s to q , directly behind the influenza hemagglutinin (ha) epitope under the control of the t promoter in pet a. the plasmid pet -ha.hd . c c a contains the clpro inactivating mutation c a in the background of the parental pet -ha.hd . c plasmid. the plasmid pmal- c.wt contains mhv-a nucleotides - , corresponding to the clpro region from s to q , behind the mal e gene in the pmalc vector (new england biolabs). this plasmid encodes the clpro domain fused to the coding sequence of the maltose binding protein and is used for overexpression of the recombinant mbp- clpro enzyme. a region of mhv-a gene , from nucleotides to , corresponding to orf b amino acids l to q , was pcr amplified from a plasmid encoding the entire orf b sequence using the primers f bp - ( ′-ttcgaattccccgggggatcccttatggcatgcaatggacac- ′) and r bp - ( ′-cgaattcctctagaaagcttgctgaaacgtctcaggcacact- ′). the resulting pcr fragment was digested with bamhi and hindiii (denoted by the underlined sequences in the primers) and cloned into the corresponding sites of pet a, resulting in the plasmid pet -pol.zn. the pet -ha.hd . c cleavage site mutants l i, l s, q k, q r, and s a were created by two rounds of pcr as described previously (hughes et al., ) using the fmp and rmp primers listed in table and the primers fij ( ′-tggcttgtcatgtatggtgc- ′) and rsp - ( ′-aacatatcctacagaacc- ′). the resulting mutant fragments were digested with kpni and bamhi and cloned into the same sites in pet -ha.hd . c. all other cleavage site mutants in pet -ha.hd . c were created using the quikchange mutagenesis kit (stratagene) following the manufacturer's protocols. the mutagenic primers used are listed in table .
following pcr amplification, the amplified plasmids were digested with the restriction enzyme dpni, which digests methylated and hemimethylated dna, thus destroying the parental plasmid and any hybrids containing one parental strand and one mutated strand. escherichia coli xl -blue supercompetent cells (stratagene) were then transformed with the mutated plasmids. the presence of the desired mutation was verified by sequencing. the fragments containing the hd . c cleavage site mutations were also subcloned into pet -ha.hd . c c a using the ndei and bamhi sites in order to create plasmids carrying both the inactivating c a mutation and mutations at the cleavage site. these plasmids were used to express substrates used in trans cleavage assays. all pet -pol.zn cleavage site mutants v s, l i, l m, q k, q r, s a, and s n were created using the quikchange mutagenesis kit as described above. the primers used in creating these mutants are also listed in table . cell-free expression of plasmid dnas was carried out using the tnt rabbit reticulocyte lysate-coupled transcription-translation system (promega) at °c for h, as previously described (piñó n et al., ) . the incorporation of [ s]methionine into acid precipitable counts was used as an indicator of protein synthesis. equivalent amounts of acid precipitable counts were directly analyzed by sds-polyacrylamide gel electrophoresis (sds-page) or used in posttranslational proteolytic assays as indicated. radioimmunoprecipitations were carried out as described previously denison et al., ) . expression of the recombinant clpro from pmal- c.wt (neb), which expresses clpro as an mbp- cl fusion protein, and the purification of the fusion protein was carried out according to the manufacturer's protocol and as described by herold et al. ( ) . briefly, e. coli tb cells transformed with pmal- c.wt were grown at °c in the presence of ampicillin ( g/ml) until the a nm reached . 
, at which point the cells were induced with isopropylthio-β-d-galactoside at a final concentration of . mm for h at °c. cells were harvested and then resuspended in ml of column buffer [ mm tris-cl (ph . ), mm nacl, mm edta, mm dtt] per gram of cells. the cell suspension was then lysed by sonication. cell debris was pelleted by centrifugation at g for min. the crude lysates were diluted : in column buffer and then loaded onto an amylose column (bed volume ml), preequilibrated with column buffer, at a flow rate of ml/min. the column was then washed with column volumes of column buffer and the mbp- clpro fusion protein was eluted with column buffer containing mm maltose. fractions of ml were collected and those containing the -kda recombinant mbp- clpro were identified by analyzing -l aliquots by sds-page. fractions containing the recombinant proteinase were pooled and the concentration of the fusion proteinase was determined using the bradford assay against known concentrations of bovine serum albumin. the recombinant proteinase was stored at − °c in column buffer supplemented with % glycerol. radiolabeled substrates containing the cleavage sequences of hd . c or pol.zn (fig. ) were generated using the tnt rabbit reticulocyte lysate system. lysate volumes containing equivalent counts per minute were incubated with approximately - g of recombinant proteinase or an equivalent volume of column buffer/ % glycerol at °c for - h or, where applicable, the specified lengths of time. the processed products were analyzed by sds-page followed by autoradiography. phosphorimager analysis was carried out as previously described (teng et al., ). recombinant mbp- clpro enzyme ( . mg/ml in elution buffer supplemented with % glycerol and mm dtt, . m enzyme) was incubated with synthetic peptide orf a ( . to .
mm in % dmso), with the sequence h n-thr-thr-ser-phe-leu-gln↓ser-gly-ile-val-lys-met-val-ser-cooh, corresponding to orf a amino acids to (arrow indicates cleavage site) or orf b peptide ( . to . mm in % dmso), with the sequence h n-arg-ser-ala-val-leu-gln↓ser-val-gly-ala-cys-val-val-cys-ser-cooh, corresponding to orf b amino acids to (arrow indicates cleavage site) in a final reaction volume of l. the reactions were allowed to proceed at °c for - min, at which time they were quenched by addition of trichloroacetic acid (tca) to % final concentration. the samples were then chilled on ice, and the denatured protein was precipitated by centrifugation. for zero time point samples, the enzyme was mixed with tca prior to the addition of peptide substrates and the reactions were carried out as described above. separation of cleavage product from substrate was carried out with the äktapurifier system (amersham pharmacia biotech) equipped with a sephasil peptide c -m st . / reverse-phase column (amersham pharmacia biotech). elution was performed with a linear gradient of % eluent a ( . % aqueous tfa)/ % eluent b ( . % tfa in % acetonitrile/ % water) up to % eluent b (seybert et al., ) over min (flow rate . ml/min, detection at nm). with the orf a peptide the level of cleavage was no more than %, whereas with the orf b peptide the level of cleavage was between and %. the data obtained were fitted to the michaelis-menten equation and the kcat and km values were obtained using kaleidagraph . (synergy software). 
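as an illustration of the fitting step, the following python sketch estimates vmax and km from initial rates measured at several substrate concentrations. it is not the kaleidagraph procedure used here: it substitutes a hanes-woolf linearization ([s]/v regressed on [s]) for a nonlinear fit, and the function names are hypothetical.

```python
def fit_michaelis_menten(substrate, rates):
    """Estimate (vmax, km) by Hanes-Woolf linear regression.

    Rearranging v = vmax * s / (km + s) gives s / v = s / vmax + km / vmax,
    so regressing s / v on s yields slope = 1 / vmax and
    intercept = km / vmax. (Assumption: this linearization stands in
    for the nonlinear fit performed by the plotting software.)
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / vmax
    intercept = mean_y - slope * mean_x  # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def kcat_from_vmax(vmax, enzyme_conc):
    """kcat = vmax / [E], with both in matching concentration units."""
    return vmax / enzyme_conc
```

a linearized fit weights the data points differently from a direct nonlinear fit, so the two approaches can give somewhat different km estimates on noisy data.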
identification of a domain required for autoproteolytic cleavage of murine coronavirus gene a polyprotein mouse hepatitis virus strain a rna polymerase gene orf a: heterogeneity among mhv strains characterization of the leader papain-like proteinase of mhv-a : identification of a new in vitro cleavage site completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus the primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus mhv-a ; a highly conserved polymerase is expressed by an efficient ribosomal frameshifting mechanism an efficient ribosomal frame-shifting signal in the polymerase encoding region of the coronavirus ibv nidovirales: a new order comprising coronaviridae and arteriviridae identification and characterization of a -kda protein processed from the gene polyprotein of the murine coronavirus mhv-a the putative helicase of the coronavirus mouse hepatitis virus is processed from the replicase gene polyprotein and localizes in complexes that are active in viral rna synthesis intracellular processing of the n-terminal orf a proteins of the coronavirus mhv-a requires multiple proteolytic events identification of polypeptides encoded in open reading frame b of the putative polymerase gene of the murine coronavirus mouse hepatitis virus a the genome organization of the nidovirales: similarities and differences between arteri-, toro-, and coronaviruses expression of virus-encoded proteinases: functional and structural similarities with cellular enzymes complete sequence ( kb) of the polyprotein-encoding gene of transmissible gastroenteritis virus coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis characterization of a -kda polypeptide encoded in gene of the human coronavirus hcv e nucleotide sequence of the human coronavirus e rna polymerase locus an "elaborated" pseudoknot is required for 
high frequency frameshifting during translation of hcv e polymerase mrna characterization of coronavirus rna polymerase gene products identification and subcellular localization of a kda polyprotein ab processing product in human coronavirus e-infected cells identification of an atpase activity associated with a -kilodalton polypeptide encoded in gene of the human coronavirus e coronaviridae: the viruses and their replication pathogenesis of mouse hepatitis virus-induced demyelination identification of the murine coronavirus p cleavage site coronavirus protein processing and rna synthesis is inhibited by the cysteine proteinase inhibitor e d coronavirus: organization, replication and expression of genome the complete sequence ( kilobases) of murine coronavirus gene encoding the putative proteases and rna polymerase a -kilodalton polypeptide encoded by open reading frame (orf) b of coronavirus infectious bronchitis virus is processed by orf a products characterization and mutational analysis of an orf a-encoding proteinase domain responsible for proteolytic processing of the infectious bronchitis virus a/ b polyprotein proteolytic mapping of the coronavirus infectious bronchitis virus b polyprotein: evidence for the presence of four cleavage sites of the c-like proteinase and identification of two novel cleavage products proteolytic processing of the coronavirus infectious bronchitis virus a polyprotein: identification of a -kilodalton polypeptide and determination of its cleavage sites identification and characterization of a serine-like proteinase of the murine coronavirus mhv-a mouse hepatitis virus c-like protease cleaves a -kilodalton protein from the open reading frame a polyprotein in virus-infected cells and in vitro identification of a -kda polypeptide processed from the coronavirus infectious bronchitis virus a polyprotein by the c-like proteinase and determination of its cleavage sites efficient autoproteolytic processing of the mhv-a c-like proteinase 
from the flanking hydrophobic domains requires membranes expression and characterization of a recombinant murine coronavirus c-like proteinase expression, purification, and activity of recombinant mhv-a clpro expression of murine coronavirus recombinant papain-like proteinase: efficient cleavage is dependent on the lengths of both the substrate and the proteinase polypeptides characterization in vitro of an autocatalytic processing activity associated with the predicted c-like proteinase domain of the coronavirus avian infectious bronchitis virus proteolytic processing of the open reading frame b-encoded part of the arterivirus replicase is mediated by nsp serine protease and is essential for virus replication characterization of a human coronavirus (strain e) c-like proteinase assay processing of the human coronavirus e replicase polyproteins by the virus-encoded c-like proteinase: identification of proteolytic products and cleavage sites common to pp a and pp ab the authors thank ravi mayreddy for the construction of the orf bencoding plasmids. peptide sequencing was provided by the protein chemistry laboratory of the school of medicine (university of pennsylvania), supported by core grants of the diabetes and cancer centers (dk- and ca- ). this work was supported by nih grant ai- . key: cord- -d htyfcl authors: gaglia, marta maria; rycroft, chris h.; glaunsinger, britt a. title: transcriptome-wide cleavage site mapping on cellular mrnas reveals features underlying sequence-specific cleavage by the viral ribonuclease sox date: - - journal: plos pathog doi: . /journal.ppat. sha: doc_id: cord_uid: d htyfcl many viruses express factors that reduce host gene expression through widespread degradation of cellular mrna. an example of this class of proteins is the mrna-targeting endoribonuclease sox from the gamma-herpesvirus kaposi’s sarcoma-associated herpesvirus (kshv). 
previous studies indicated that cleavage of messenger rnas (mrna) by sox occurs at specific locations defined by the sequence of the target rna, which is at odds with the down-regulation of a large portion of cellular transcripts. in this study, we address this paradox by using high-throughput sequencing of cleavage intermediates combined with a custom bioinformatics-based analysis pipeline to identify sox cleavage sites across the mrna transcriptome. these data, coupled with targeted mutagenesis, reveal that while cleavage sites are specific and reproducible, they are defined by a degenerate sequence motif containing a small number of conserved residues rather than a strong consensus sequence. this degenerate element is well represented in both human and kshv mrna, and its presence correlates with rna destabilization by sox. this represents a new endonuclease targeting strategy, in which use of a degenerate targeting element enables rna cleavage at specific locations without restricting the range of targets. furthermore, it shows that strong target selectivity can be achieved without a high degree of sequence specificity. triggering wide-spread rna degradation is a common strategy that viruses use to decrease host gene expression, also known as host shutoff [ , ] . viral factors from many different families including herpesviruses, coronaviruses and orthomyxoviruses either directly cut rnas or indirectly stimulate rna cleavages in an endonucleolytic fashion [ , ] . cellular rna exonucleases are then recruited to degrade the fragments, resulting in a reduction in rna and consequently protein levels [ ] . despite the fact that the proposed role of most of these host shutoff ribonucleases (rnases) is to modulate immune responses, they are generally thought to have little or no specificity and to affect host messenger rnas (mrnas) indiscriminately. 
however, increasing evidence suggests that this view may be overly simplistic, and that some of the rnases display selectivity for or against specific targets [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . this type of specificity may provide an additional level of regulation in viral control of the host transcriptome. how this selectivity is achieved and how it is balanced with the widespread shutoff phenotype remain open questions. the sox family of proteins from gamma-herpesviruses is an example of a viral rnase that displays both broad targeting of rnas and a poorly understood level of selectivity. gammaherpesviruses include the human pathogens kaposi's sarcoma-associated herpesvirus (kshv), which causes kaposi's sarcoma as well as lymphomas in immunocompromised individuals and remains a leading cause of cancer-linked death in sub-saharan africa. the sox (orf ) protein is expressed early during the lytic cycle of kshv infection and its expression triggers rna degradation, which is recapitulated by expression of the protein alone [ ] . homologs of sox in the other human gamma-herpesvirus, epstein barr virus (ebv bglf ), and in the model murine pathogen murine herpesvirus (mhv musox) also degrade rna in cells [ , ] . studies in mhv suggest that host shutoff by the sox family of proteins is crucial for viral replication in specific cell types and for systemic spread of the virus and establishment of a latent infection [ ] . transcriptomic studies of mrna levels during kshv or mhv infection and in cells overexpressing sox demonstrate that this family of proteins triggers the degradation of a majority of both host and viral transcripts [ , , ] . however, in-depth mechanistic studies of sox reveal a more complex picture. sox targets mrnas, as opposed to non-coding rna species, a specificity that is related to the association of sox with polyribosomes [ ] . 
moreover, selected transcripts, like the cytokine interleukin (il- ) [ ] and apoptosis enhancing nuclease (aen) [ ] , are spared from sox-mediated decay. in the case of il- , protection is conferred by the presence of a protective sequence in the ' untranslated region (utr) [ ] , but aen appears to be intrinsically resistant to sox mediated degradation [ ] , without a clear protective element in its sequence. the most unexpected observation, however, is that kshv sox and ebv bglf cut rnas at specific locations that appear to be determined by an unknown targeting element [ , ] . these specific cleavages become apparent upon knockdown of the human rnase xrn , the major '- '-directed rnase in eukaryotic cells, which is responsible for clearing the ' rna fragments generated by these rnases [ ] . this ability of sox to cut at specific locations within mrna yet target the majority of transcripts argues for a degenerate targeting motif. in general, the principles guiding the positioning of rna cleavages by cellular and viral mrna endonucleases are not well understood. in the case of endonucleases, the term "sequence specificity" is sometimes used to refer to preferential cutting at specific dimers, often inferred from in vitro studies (for example in datta et al. [ ] ). however, this specificity cannot explain cutting of rnas at single locations in mrnas. additional specificity can be conferred by localization of the target mrna to a specific site in the cells [ ] or proximity to a "landmark" feature on the rna, such as the ' of the transcript [ , ] or the stop codon location [ ] . the sox targeting system is unprecedented because the sequence of the targeting element alone appears to direct cleavages in mrna, and because the targeting element is longer than nt [ ] . to address how sox specificity is mediated, we applied a degradome sequencing technique called parallel analysis of rna ends (pare) [ ] to map cleavage sites of the sox protein across the human transcriptome. 
development of a stringent python-based analysis algorithm, which we term pydegradome, allowed identification of sox-dependent cuts at specific locations across the mrna transcriptome. the sequences surrounding these sites contained no strong consensus sequence, but rather a degenerate sequence pattern that nonetheless conferred specificity when analyzed experimentally in endogenous mrna targets. the presence of a more complex targeting motif explains how sox achieves cleavage specificity without sacrificing target breadth, and offers a framework for understanding how additional viral and cellular endonucleases may operate.

development of a novel bioinformatics pipeline to detect high-confidence sox cleavage sites across the transcriptome following pare

prior analyses of individual mrnas indicated that the kshv rnase sox cuts at specific locations within the rna, in a manner dependent on the sequence surrounding the cleavage site [ ]. by performing ' rapid amplification of cdna ends ( ' race) on the gfp reporter mrna, we found that the gfp mrna was cleaved in the same location regardless of whether sox was transiently expressed in t cells or expressed from the kshv genome in lytically reactivated islk. cells (s fig). this is in agreement with the fact that sox activity does not rely on any additional viral proteins, and that its cleavage activity can be studied in transfected cells [ , , ]. to dissect the specificity of sox cleavage across the mrna transcriptome, we designed an approach to map sox cleavage sites in endogenous mrnas using pare [ , ]. pare is an rna-seq based methodology that allows mapping of the ' ends of uncapped, phosphorylated rna species, such as mrna degradation fragments (fig a). as we previously found that the sox degradation intermediates are cleared by the host '- ' exonuclease xrn [ ], we stabilized degradation intermediates in t cells expressing a gfp-sox fusion [ ] by knocking down xrn (s a fig). 
cells expressing gfp alone were used as controls to filter out rna fragments generated by cellular rnases or other processing events. this was important because multiple studies have shown that pare and similar techniques detect a large number of rna fragments in human cells, many of which are of unknown origin [ ] [ ] [ ].

figure: pare experimental procedure and peak finding analysis pipeline. a) diagram of the pare procedure. b) schematic of the pydegradome analysis approach, which uses read counts in a control sample to generate a table of thresholds against which test sample counts are compared. the table lists thresholds for a particular user-defined confidence level and for a range of ratios between control and test samples. the applicable ratio for each position is computed by multiplying a user-defined multiplicative factor by the ratio of total read counts for the exon in test vs. control samples, thus accounting for variation in rna levels and total mapped reads. read counts in the test sample at each position must exceed the threshold to be identified as part of a peak. c) example plot of read counts ( ' end only) for nt surrounding a sox cut site identified by pydegradome within the '-most exon of the limd nm_ transcript in the four samples. note that the y-axis has a logarithmic scale. this example shows the expected distribution for a cut site followed by exonucleolytic degradation.

we prepared and sequenced pare libraries from two replicates of sox-expressing or gfp control cells and extracted the ' end of each mapped read, which represents the cleavage site (s table). conventional analysis of pare and similar degradome datasets relies on detecting cut sites in each condition and then comparing conditions to each other a posteriori, but initial attempts at detecting and validating cut sites indicated that this approach did not provide sufficient discriminatory power to identify sox-specific cleavage sites. 
in previous studies using pare or similar approaches [ , ] , additional information about the pathways, such as the mirna sequences for mirna cut site studies or the proximity of the site to a stop codon for studies of smg and nonsense mediated decay, was used to further select "true" cut sites. however, such contextual information does not exist for sox cleavage specificity. therefore, we devised a python-based analysis approach that would directly use our control dataset as a baseline, which we termed pydegradome ( fig b) . the analysis uses a bayesian probability framework to determine whether the read counts at a given location differ significantly between control and test samples, taking into account random variations in the number of reads. using bayes' theorem, we determine for each location whether the underlying rate of fragment production in the test sample is a multiplicative factor larger than the control rate at a user-defined confidence level. the user also chooses the multiplicative factor. for a given control read count, we thus compute a threshold that the read counts in the test samples have to exceed to be considered part of a "peak" (fig b) . to improve efficiency when testing thousands of locations, the software builds a reference table of the thresholds for the entire dataset. this approach allowed the identification of locations within the transcriptome where the read counts were statistically higher in the samples from sox-expressing cells than in control samples (peaks). in order to correct for up-or down-regulation of the rnas and for the total number of reads obtained for each sample, the ratio used to determine thresholds was computed from the user-defined multiplicative factor and the ratio of the total number of reads mapping to each exon in test vs. control samples (fig b) . to prevent isolated high read counts from skewing our analysis, we integrated read counts within small windows ( nt) rather than single nucleotides. 
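the threshold computation described above can be sketched in python as follows. this is a minimal illustration and not the pydegradome source: it assumes poisson-distributed read counts with flat priors on the underlying rates (the text does not specify the likelihood or prior), and all function names are hypothetical. under these assumptions the posterior probability that the test rate exceeds the control rate by the chosen factor reduces to a binomial tail sum.

```python
from math import comb


def effective_factor(user_factor, exon_reads_test, exon_reads_control):
    """Scale the user-defined factor by the exon-level test/control read ratio."""
    if exon_reads_control <= 0:
        raise ValueError("control exon must have at least one read")
    return user_factor * exon_reads_test / exon_reads_control


def prob_rate_exceeds(control, test, factor):
    """Posterior P(rate_test > factor * rate_control) given two read counts.

    Assumption: Poisson counts and flat priors, so the rates have
    Gamma(count + 1, 1) posteriors and the probability equals
    P(Binomial(control + test + 1, 1 / (1 + factor)) >= control + 1).
    """
    n = control + test + 1
    p = 1.0 / (1.0 + factor)
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(control + 1, n + 1))


def min_test_count(control, factor, confidence, max_count=100000):
    """Smallest test count at which the posterior probability reaches `confidence`."""
    for test in range(max_count + 1):
        if prob_rate_exceeds(control, test, factor) >= confidence:
            return test
    raise ValueError("no threshold found below max_count")


def threshold_table(max_control, factor, confidence):
    """Precompute thresholds for control counts 0..max_control (lookup table)."""
    return [min_test_count(c, factor, confidence) for c in range(max_control + 1)]
```

precomputing the table once per factor and confidence level avoids repeating the tail sum at each of the thousands of positions tested, which is the efficiency argument made in the text.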
adjacent windows that passed the cutoff were then combined into a single peak. we optimized the user-defined confidence level and ratio by determining how many peaks were detected when comparing each sox replicate sample to its gfp control to detect sox-specific peaks, or performing the opposite comparison to detect gfp-specific peaks. we also ran the program to detect peaks specific to only one biological repeat, by comparing the two sox or gfp replicates to each other, as these peaks may represent experimental noise. although we consistently detected more sox-specific peaks than gfp-specific peaks, varying the parameters improved discrimination of sox-specific peaks (s b fig) and reduced detection of "noise" peaks specific to one repeat. based on this optimization, we empirically set the final iteration of the program to detect nt windows with read counts in the test samples that are fourfold higher than read counts in the control samples at a confidence level of . %. because these parameters are conservative, we expect that the sox cut sites we detected do not represent a comprehensive list of all sox cut sites, but rather only the highest confidence sites. within each peak, we also determined the position where the read count was highest, and we considered this position the location of the putative cut site (with the cleavage occurring ' of this position) (fig b and c). with similar optimization, this program could be used to identify the ends of degradation fragments in other degradome datasets that contain matching test and control samples.

sox cut sites are abundant and not positioned relative to landmark features of mrna

using the approach detailed above, we detected a higher number of peaks specific to sox-containing samples relative to control samples, consistent with broad mrna targeting by sox (fig a; s table). 
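the peak-calling steps of the approach (integrating counts within windows, merging adjacent passing windows, and taking the highest position as the putative cut site) can be sketched as below. this is an illustrative reimplementation under stated assumptions, not the pydegradome code: windows are taken as consecutive and non-overlapping, and `min_test_for` is a hypothetical callable that returns the smallest passing test sum for a given control sum.

```python
def call_peaks(test, control, window, min_test_for):
    """Return a list of (start, end, cut_site) tuples; `end` is exclusive.

    test, control: per-position 5' end read counts along one exon.
    window:        window size in nucleotides.
    min_test_for:  hypothetical function mapping a control window sum to
                   the smallest test window sum that passes the cutoff.
    """
    peaks = []
    for start in range(0, len(test), window):
        end = min(start + window, len(test))
        test_sum = sum(test[start:end])
        control_sum = sum(control[start:end])
        if test_sum < min_test_for(control_sum):
            continue  # this window does not pass the cutoff
        if peaks and peaks[-1][1] == start:
            peaks[-1][1] = end          # adjacent window: extend the peak
        else:
            peaks.append([start, end])  # otherwise open a new peak
    # within each peak, the position with the highest read count is
    # taken as the putative cut site
    return [(s, e, max(range(s, e), key=test.__getitem__)) for s, e in peaks]
```

summing over windows before thresholding is what keeps a single isolated high count from being called as a peak on its own.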
even when varying the allowed distance between peaks from - nt in sample replicates, the sox samples contained~ - times the number of reproducible ("shared") sox-specific peaks (s c fig). up to % of the sox-specific peaks but fewer than % of the control gfp-specific peaks were shared between the replicates, indicating that many of the sox cleavages reproducibly occur at a given site (fig b) . the read counts at the putative cut site in the two sox replicates were highly correlated ( fig c, spearman's ρ = . , p value < . ), further demonstrating that these peaks correspond to specific sox-mediated cleavages. for downstream analyses, we focused on cut sites that were detected in both the replicates using the . % confidence level and were - nt apart (s table) . example plots of the read counts around identified cut sites are shown in figs c and s a-s f. several virally encoded host shutoff factors that trigger rna degradation, including herpes simplex virus vhs and sars coronavirus nsp , are thought to induce sequence-independent cuts near the ' end of the message [ , ] . to examine whether sox cleavage sites in endogenous mrnas are position-specific, we compared the location of the sox-specific cut sites to those found only in control samples using the human transcript annotation from ensembl grch . in both sox and gfp control samples, more cut sites occur towards the ends of the transcripts, most frequently corresponding to the ' and ' untranslated regions (utrs) of the mrna (fig d and e ). it remains unclear whether this non-specific end bias is due to a general preference for cleavage in non-translated regions or a consequence of the pare approach. we also computed the position of the cuts relative to landmarks such as the transcript start site, start codon, stop codon or annotated ' end. only a fraction of the peaks was located within nt of any of these landmarks in either case (fig f) . 
although a greater percentage of the sox cut sites occurred within nt of start codons, this still only accounted for % of the cut sites. furthermore, the - % of both sox and gfp peaks near annotated transcription start sites may represent the beginning of full-length decapped mrnas rather than endonuclease cleavage fragments. collectively, these analyses indicate that sox cut sites are not restricted to a particular region of the mrna, although cleavage sites in both sox and control gfp samples may be enriched in areas of the transcripts that are not covered by ribosomes. these findings are consistent with our previous reporter mrna data [ ] . we next selected seven sox cut sites for independent experimental validation ( fig a) . the selection was based on three criteria: ) position more than nt from the annotated ' end of the transcript in order to eliminate potential transcription start sites, ) high number of mapped reads, and ) clear sox-specific peaks in a visual inspections of the read plots (figs c and s a-s f). we used two approaches to validate the cut sites, targeted ' race and insertion in reporter constructs, and found that all sites validated in at least one of the two assays. we detected a ' race fragment that appeared specifically in sox-expressing cells (s g fig) and whose size corresponded to the predicted sox cleavage location for four of the transcripts. (we were unable to detect the rnas for bloc s and srsf using control primers). our second validation approach was designed to test the hypothesis that specific rna sequences or structures flanking endogenous cut sites direct cleavage by sox even when removed from their native context, as we had seen for reporter mrnas [ ] . we inserted nt pare analysis identifies sox-specific cut sites in endogenous rnas. a) number of peaks/cut sites identified specifically in sox or gfp samples. 
b) fraction of the peaks identified in sox or gfp samples that were detected in both biological replicates ("shared peaks"), relative to the maximum distance allowed between the peaks. c) correlation plot of the heights of peaks found in both sox+ samples. peak height is defined as the highest read count within the peak window, at the position defined as the putative cut site. d) position of the cut sites found in both replicates ("shared cut sites") within the mrnas relative to the length of the transcript. e) position of the shared cut sites relative to the coding region of the mrna (in all samples > % of surrounding the cleavage sites from the mrna targets identified by pydegradome into a gfp reporter ( fig b) . we then co-expressed these constructs with sox in xrn -depleted cells and tested whether the inserted sequence caused a sox-specific cut in the mrna. the gfp reporter we used is normally cut by sox at~nt of the coding region [ ] , generating a degradation intermediate that is~ nt shorter than the full length mrna. we found that the insertion of the sequences from six out of seven of the candidate sox cleavage sites resulted in the appearance of a second rna fragment in sox-expressing cells (fig c and d ). interestingly, the intensity of the additional cleavage products varied between the insertions, suggesting some sequences are better sox targets than others. in particular, in the gfp reporters with insertions from the limd mrna (fig d) , we found that the original cleavage site in the gfp coding region was almost completely abolished in favor of the new cleavage site, as evidenced by the disappearance of the longer degradation intermediate. moreover, insertion of the limd nt sequence in a different position in the gfp mrna also elicited sox cleavage ( fig e) , further demonstrating functionality of the targeting sequence regardless of its broader context. 
taken together, these data indicate that we have identified bona fide sox cleavage sites in endogenous mrnas, and that these sites contain specific elements that lead to sox targeting. the sox cleavage site likely occurs in an unstructured region of the mrna and is characterized by an a-rich sequence just upstream of the cleavage to identify features that define a sox cleavage site, we searched the sequences surrounding the sox cut sites detected in both biological repeats for structural or sequence similarities, using the cut sites shared by the two gfp samples as a comparison set. first, localfold [ ] was used to determine the likelihood that the nucleotides around the cut are located in unpaired regions (i.e. accessibility). this program is a variation on the vienna algorithm rnafold and is based on the assumption that potential structures are formed locally, which is consistent with the success of our insertion experiments. we found that the nucleotides ' of the sox cut were more accessible (thus presumably unstructured) than surrounding sequences ( fig a) . this pattern was different from the sequences surrounding the gfp sites, suggesting that it is feature specific to sox cleavage sequences. we also computed the log likelihood for different nucleotides at positions around the sox cleavage ( fig b and c ). although no strong consensus sequence emerged, two features stood out from this analysis. first, the position right after the cut site (position ) was preferentially c or t. when we computed the frequency of different nucleotides at the cut site for both soxspecific and gfp-specific cleavages, we found that the pyrimidines c or t were found at % of the sox cut sites, whereas c or a were the most frequent bases at position in cuts specific to gfp control samples ( fig d) . 
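the per-position log likelihood calculation mentioned above can be illustrated with the following python sketch. it is not the authors' code: the base-2 logarithm, the pseudocount, and the function names are assumptions made for the example, and the input is a list of equal-length sequences aligned on the cut site.

```python
from math import log2

BASES = "ACGT"


def log_likelihood_matrix(sequences, background, pseudocount=1.0):
    """Per-position log2(observed frequency / background frequency).

    sequences:   equal-length uppercase strings aligned on the cut site.
    background:  dict mapping each base to its expected frequency.
    pseudocount: added to every count to avoid log(0) (an assumption).
    """
    length = len(sequences[0])
    matrix = []
    for i in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in sequences:
            counts[seq[i]] += 1
        total = sum(counts.values())
        matrix.append({base: log2((counts[base] / total) / background[base])
                       for base in BASES})
    return matrix


def score(matrix, sequence):
    """Sum of per-position log likelihoods for a candidate sequence."""
    return sum(column[base] for column, base in zip(matrix, sequence))
```

positive entries mark bases that are over-represented relative to background at that position, and negative entries mark under-represented bases.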
this distribution is not due to a bias in library preparation, as a was the most frequent base at the beginning of both aligned (s a we also found that there were more as and fewer cs in the nt ' to the cleavage site. we computed the fraction of putative sox cut sites that had a dimers or trimers in the nt preceding the cut and found that a stretches were found before~ % of our mapped cut sites (fig the peaks fall within coding transcripts). nd = not determined, because multiple transcript isoforms are present in the annotation and the cut site position would differ between isoforms. na = no coding sequence annotated. f) position of shared cut sites relative to annotated landmarks on transcripts. for all panels in this figure, a scanning window of nt, a multiplicative factor of , a confidence level of . %, were used to predict cut sites. all cuts sites nt apart in the two replicates were used for the analyses in panel c-f (sox-specific peaks: n = , gfp-specific peaks: n = ). doi: . /journal.ppat. .g . the gfp-based reporters were then co-expressed with sox ("+" or "+sox") or an empty vector control ("-") in shxrn -treated e). this was not a general feature of the sequences that produce rna fragments, as only five of the cut sites found solely in the control samples were preceded by an a dimer and none by longer a stretches. these analyses suggest that although sox cut sites are defined by a degenerate sequence pattern, this sequence is enriched for pyrimidines at a cut site adjacent to an unstructured stretch of a residues. experimental analysis of an endogenous sox cleavage sequence confirms role of oligo a sequence and potential structural element to probe the sox targeting element further we examined more in detail the sox targeting element in the validated endogenous mrnas (fig a) . 
the structure prediction program rnafold [ ] predicted that the - nt surrounding the cut sites in six out of seven of the rnas fold in hairpin structures with oligo-a stretches and the cleavage sites in unpaired loops (figs a and s a). the only exception was mapk ip , which also lacked the oligo-a sequence. these structure predictions mirror the accessibility results from localfold analysis (fig a) , which indicated that the positions from - to relative to sox cut sites are more likely to be unpaired. similarly, we predicted the structures of all nt sequences surrounding the soxspecific cut sites in our dataset and determined how many of the sequences presented either the cut site, an a dimer or both the cut site and an a dimer in an unstructured region. for % of the sox cut sites, at least one of these two features was predicted to be in an unpaired region, with over % of the sequences predicted to have both ( fig b) . we reasoned that if these structural features are important for cleavage, the efficiency of the cleavage could vary depending on the length of the inserted endogenous sequence. fragments of different sizes may not be able to fold into the native structure equally well and the stability of the resulting structures may vary. consistent with this idea, we found that changing the length of the inserted fragments for two different rnas (limd and srsf ) changed the efficiency of sox cleavage, measured by the intensity of the degradation intermediate (figs c, d, s b and s c), although sequences of nt were sufficient to elicit sox cleavage. in particular, when we shortened the limd inserted sequence from nt to nt, nt and nt, the efficiency of cleavage progressively diminished (fig c and d ), consistent with limd sequences adopting a stem-loop structure that becomes destabilized upon sequential deletions of the putative stem region. 
surprisingly, shortening the inserted sequence for srsf had the opposite effect and increased the efficiency of the cleavage (s b and s c fig). because the same sequences are present in the nt and the nt srsf insertion, this observation cannot be explained by the presence of a targeting sequence alone. instead, we hypothesize that the shorter insertion folds more stably into an autonomous element that is required for targeting by sox. we previously found that mutating a tgaagt sequence nt before the gfp cut site to tgagtg could abolish the cleavage site in gfp (s a fig). the limd site is also preceded by a similar tgaaag sequence predicted to be in an unpaired loop. to test directly whether the a trimer in the limd sequence was required for the positioning of the cleavage, we mutated the tgaaag sequence in our insertion reporter to tgcccg, tggggg or tgtttg. as predicted by our data analysis, we found that mutation of the a trimer prevented the limd sequence from eliciting sox cleavage, indicating that the aaa is an integral part of the sox recognition site (fig e).
cells and the gfp mrna from these cells was detected using northern blotting. the arrowheads point to the additional cleavage fragments resulting from the insertions. images are representative of results from at least three experimental replicates. [ ] representation of the frequency of each base in the nt surrounding the cut sites found in both sox samples (n = ). the position of the cut site is
moreover, deletion of one of the three as in the nt insertion reduced the efficiency of cleavage dramatically (fig c and d), while rnafold structure prediction suggested that this deletion is unlikely to substantially alter the structure of the rna (s b fig). similarly, mutating an a dimer just upstream of the cut site in a srsf insertion construct reduced sox-mediated cleavage (fig g).
while the upstream a dimer likely contributes to sox targeting, we note that it is not always required, as a similar mutation in a pgam insertion construct did not abolish cleavage (s c fig). overall, these data are consistent with the idea that the a-stretch is an important feature of the sox cleavage specificity. lastly, our analysis indicated that the nucleotide g is under-represented at the position immediately following the sox cut site (position , fig b-d). we found that mutating the nucleotide at position from an a to a g prevented sox-mediated cleavage in two out of three of the rnas we tested (limd and srsf , but not pgam ) (figs f, g and s d). these data suggest that sox activity is inhibited by the presence of a g nucleotide as the residue ' of the cleavage. collectively, these data experimentally validate our computational finding and strongly suggest that sox cut sites are defined by a combination of sequence and structural features.
a conserved sequence pattern is specific to sox-dependent cut sites
although sequences flanking the sox cleavage sites lacked a strong consensus motif, our analysis showed that the frequency of the bases around the cut site diverged from the expected distribution for human rna sequences (fig b and c). this suggested to us that there is a conserved sequence pattern among sox target sequences. in order to be able to search rna sequences for this variable motif, we derived a position weight matrix (pwm) for the positions - to + from the sox cleavage site ( being the nucleotide ' of the cut site, fig c). the pwm is a matrix that lists the probabilities (transformed into log likelihood to correct for the background frequencies of the nucleotides) for each of the four bases at each of the positions of the putative motif. the log likelihoods for our pwm were derived from very high confidence sox cut sites identified in both of our experimental repeats using a very stringent confidence level of .
% and located at least nt away from a transcriptional start site. the logo in s a fig is a pictorial representation of the pwm. we then used the pwm to confirm that sox cut sites specifically matched the motif by scoring several subsets of potential sox target or control sequences using the pwm. the presence of preferred nucleotides in positions - to + from the cut site results in higher (positive) log likelihood scores, whereas a poorer match to the motif produces a lower (negative) log likelihood score. indeed, the sequences flanking the set of reproducible sox cut sites (identified with a confidence level of . %) were a closer match to the motif compared to those surrounding gfp-specific fragment ends, as shown by the distribution of the log likelihood scores (fig a) . a similar difference was seen when analyzing the sequences surrounding all potential sox and control (gfp) cut sites from each of the two biological repeats. sequences around sox cut sites were a good match (positive log likelihood score) to the motif more frequently than control sequences (fig b) . in both analyses (fig a and b) , we removed the sequences of the high confidence cut sites used to derive the pwm from the analyzed sets. the results of these analyses indicate that although the indicated. d) percentage of sequences with each of the four nucleotides at the cut site (position ) among the gfp or sox specific peaks. e) percentage of sequences surrounding putative cut sites that contain at least one oligo-a stretch within the nt upstream of the cut (for d and e, sox: n = , gfp: n = ). for all panels in this figure, the shared cut sites were determined based on a scanning window of nt, a multiplicative factor of and a confidence level of . %. 
all cut sites with the same exact position in both sox or both gfp samples that were > nt away from an annotated transcription start site and had sufficient surrounding sequences within the same annotated exon ( nt in a, nt in b and nt in c and d) were included in these analyses. the varying number of sequences used for the analyses in the different panels is a result of the requirement for sufficient flanking sequences in the same annotated exon, but as many sequences as possible were analyzed in each case. doi: . /journal.ppat. .g fig . the sequence features at the sox cut site in limd and srsf , as well as a structural element around the site, are required for sox cleavage. a) predicted structure of the nt surrounding the sox cut site in limd , highlighting the a trimer (asterisk) and the cut site (arrow), as well as the ends of the nt, and nt insertions used in d. b) rnafold was used to predict the structure of all -nt sequences surrounding sox-specific and gfpspecific cut sites (based on a scanning window of nt, a multiplicative factor of and a confidence level of . %). the proportion of cut site where the location of the cut was predicted to be unpaired, that had an unpaired a dimer within nt of the cut or both is plotted. c-g) gfp reporters were co-expressed with sox ("+") or an empty vector control ("-") in shxrn -treated cells. the gfp mrna was detected using northern blotting. the empty arrowheads point to the additional cleavage fragment resulting from insertions, whereas the filled arrowheads point to the normal gfp cleavage fragment. c-d) - nt surrounding the sox cleavage site in the limd mrna were inserted into the gfp reporter at nt . in the nt Δa construct, one of the three as found at positions - to - from the cut site was deleted. a representative blot is shown (c), as well as the quantified intensity of the signal from the different rna species (d), plotted relative to the intensity of the bands from the nt insertion construct. 
error bars = standard deviation, ** p < . , *p < . (student's t-test). e) the a trimer preceding the sox cut site in limd was mutated to a c, g or t trimer in the limd nt insertion construct. images are representative of results from at least three experimental replicates. f) the a immediately ' of the sox cut site in limd was mutated to a g in the nt insertion construct. g) the a immediately ' of the sox cut site in srsf was mutated to a g (a ! g) and the a dimer preceding the sox cut site was mutated to a c dimer (aa ! cc) in the nt insertion construct. fig . a degenerate motif defines sox cut sites. a position weight matrix (pwm) for nucleotide likelihood in the nt surrounding the sox cut sites was generated from the sequences that contained sox-specific sites with a confidence level of . %. sequences were scored using this matrix, after removing the "parent" sequences where applicable. a) frequency distribution histogram of log likelihood scores for the nt surrounding the gfp-or sox-specific cut sites. gfp: n = ; sox: n = . b) frequency distribution histogram of log likelihood scores for the nt surrounding all cut sites found in the two gfp and sox samples. gfp rep : n = ; gfp rep : n = ; sox rep : n = ; sox rep : n = . c) all human and kshv annotated precise sequence composition may vary, there is a specific element marking sox cut sites that is not observed in control samples. furthermore, our stringent analysis parameters have likely resulted in an underestimation of the number of true sox targeting sites, because even sites detected in only one of our two replicates were generally a good match to the sox targeting motif. we next used the pwm to test whether the prevalence of the targeting element differed between the human and viral mrna transcriptomes. 
we analyzed annotated transcripts with a nt scanning window, computed a log likelihood score for every possible nt sequence, determined the highest possible motif score for each transcript and plotted the distribution of the scores. in agreement with the widespread mrna cleavage by sox, most of the annotated human transcripts have at least one sequence that is a good match to the motif (log likelihood score > , fig c). the prevalence of high scoring sequences may explain how sox is able to degrade most transcripts. moreover, we found that kshv transcripts also contained sequences that matched the motif (fig c), which suggests that sequence specificity is not used by sox to discriminate between host and viral mrnas. this result is consistent with findings from the related gamma-herpesvirus mhv [ ] that show degradation of viral transcripts by proteins of the sox family. we have listed examples of human and kshv rnas with high log likelihood scores in s table. interestingly, within human transcripts, coding rnas tended to contain sequences that were slightly better matches to the motif than non-coding rnas (ncrnas, fig d), as did spliced rnas in comparison to non-spliced rnas (s b fig). the meaning of these small but statistically significant differences is unclear, particularly since the log likelihood scores for all groups were generally high, indicating the presence of good sequence matches to the motif. we also wanted to test whether the sequences around the experimentally determined sox cleavage sites were more likely to match the motif than other locations on the same transcripts. we analyzed the transcripts containing sox cut sites identified in the pare data using a -nt scanning window as described above. we then ranked the log likelihood scores for all possible -mers in the transcript, and asked how highly ranked the score for the actual cut site was.
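The ranking step described above can be illustrated with a short sketch. It is not the authors' script (those are provided in the s files); the function name `cut_site_percentile` and the convention of counting ties as "at or below" are assumptions made only for illustration. The input is the list of motif scores for every window of one transcript and the index of the window centered on the observed cut site.

```python
def cut_site_percentile(window_scores, cut_index):
    """Percentile rank of the cut-site window among all windows of a transcript.

    window_scores: motif scores for every scanning window (hypothetical input).
    cut_index: index of the window that corresponds to the observed cut site.
    Returns the percentage of windows scoring at or below the cut-site window,
    so 100.0 means the cut site sits at the best-scoring position.
    """
    if not window_scores:
        raise ValueError("window_scores must not be empty")
    cut_score = window_scores[cut_index]
    at_or_below = sum(1 for score in window_scores if score <= cut_score)
    return 100.0 * at_or_below / len(window_scores)


if __name__ == "__main__":
    scores = [-3.2, 0.5, 4.1, 1.7, -0.8]   # made-up scores for five windows
    print(cut_site_percentile(scores, 2))  # window with the highest score
    print(cut_site_percentile(scores, 3))  # second-best window
```

A cut site in the top few percent of scores for its transcript corresponds to a value close to 100 under this convention.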
% of the experimentally observed cut sites ranked within the top % of scores for their rna, indicating that the surrounding sequences were a very good match to the motif compared to other sites in the same mrna (fig e). finally, we tested whether the relative degradation of different human transcripts by sox was correlated with how well their sequences matched our degenerate motif. human transcripts were classified as down-regulated by sox or sox escapees based on the data from an rnaseq experiment comparing human mrna levels in cells overexpressing gfp-sox or gfp alone [ ]. the best motif score for each rna detected in the rnaseq experiment was then computed using a -nt scanning window as described above.
transcripts were scored using the pwm. the frequency distribution histogram for the top scores for each transcript is plotted. d) human transcripts were divided into coding and non-coding based on the annotation in ensembl. the frequency distribution histogram for the top scores for each transcript in the two sets of human rnas is shown. p value (kolmogorov-smirnov test) < . . e) all possible scores for mrnas carrying an observed sox cut site far from the transcription start site (n = , confidence level = . %) were computed and ranked in comparison to all the possible scores for the transcript containing the cut. out of ( %) of the cuts were found at the site with the best score. the cumulative frequency distribution for the percentile of the score at the cut was plotted. f) human rnas were classified into sox targets ("down-regulated rnas") or sox escapees ("escapees rna") based on the results from clyde and glaunsinger [ ]. the frequency distribution histogram for the top scores for each transcript in the two sets is shown. p value (kolmogorov-smirnov test) < . .
we found that the down-regulated rnas (fold change < . ) had better motif scores than the escapee rnas (fig f).
similarly, when we plotted the fold change for each gene against the best motif score for that gene, we found that there was a modest but significant inverse correlation between the fold change in mrna levels and the motif scores (spearman's ρ = - . . p value < . , s c fig) . these analyses suggest that the level of down-regulation of mrnas by sox is in part determined by the degree to which their sequence is a good match for the sox targeting motif. we have identified key sequence features of the targeting element that directs the rna endonuclease sox to cleave a significant fraction of the mrna transcriptome. as we had hypothesized from the analysis of individual reporter mrnas [ ] , sox cleavages in endogenous mrnas occur in a sequence-specific manner. although surprisingly large, this element is not defined by strong sequence consensus, but instead contains a small number of conserved residues. structural features may also contribute to motif-driven sox targeting. our data resolve how a sequence-specific nuclease can target such a breadth of targets. sox presents a model of rna targeting in which cleavages are at the same time sequence specific and highly promiscuous. this is achieved through the use of a degenerate sequence/structure pattern that is anchored by key residues to define specific rnase targeting locations. good matches for a loosely defined sequence pattern can be found in all viral and host rnas, enabling cleavage to be simultaneously specific and widespread. although this approach may be less efficient than the location-driven targeting of the cap-proximal region reported for other host shutoff factors that promote mrna cleavage [ , ] , such a mechanism may provide more regulatory opportunities. also, it may explain why sox has less dramatic effects on rna than other viral rnases [ ] . many cellular endonucleases have few described targets and transcriptome-wide targeting analyses of other cellular and viral rnases are limited. 
it will thus be of interest to apply high-throughput sequencing approaches to isolate degradation fragments in other systems and investigate whether any other viral or cellular rnases use principles similar to those employed by sox to achieve target specificity. notably, recent studies of the specificity of rnase l, a host rnase that is activated in response to viral infection and cleaves viral rnas and host rrnas, have also suggested that a combination of sequence and structure is important for targeting [ , ]. however, the requirements for rnase l targeting appear less stringent than those for sox, as the preferred cleavage sites occur at uu/ua dinucleotides in unpaired regions of structured rnas [ , ]. the rna motif underlying sox cleavage specificity could be used for target selectivity, enabling a subset of viral and cellular mrnas to escape cleavage. our observation that the majority of both viral and cellular transcripts contain the sox targeting motif is in agreement with the fact that in mhv , viral mrnas are broadly susceptible to degradation by the sox homolog musox [ ]. however, subsets of viral and cellular transcripts are not susceptible to host shutoff [ , , , ], and it is likely that at least some of these escape due to the absence of a robust targeting element. indeed, the correlation between the scores of matches to the motif and the level of degradation of host mrnas suggests that sequences within the motif can influence rates of degradation of different rnas. this may result in a more nuanced effect of sox-mediated degradation. we also note that the level of degradation seen by steady-state level measurements is likely influenced by additional variables that are unrelated to the efficiency of sox cleavage, including the efficiency of removal of different sequences by xrn and a reduction of transcriptional rate due to a feedback loop triggered by the rna degradation [ ].
therefore the relationship we see may be an underestimation of the contribution of targeting preference. an additional level of regulation for transcript selectivity is also provided by the presence of dominant protective elements like the sox-resistance element we have identified in the ' utr of the il- gene [ , ] , which prevents cleavage of the gfp rna despite the presence of a strong targeting sequence. we expect that both dominant and passive mechanisms of escape from sox-mediated targeting ultimately shape the landscape of host gene expression during sox-mediated host shutoff. a limitation of our analyses is that we are unable to readily explore, both computationally and experimentally, the contribution of structural elements to the sox cleavage site. computational analysis of shared structures is difficult when the sequences involved are not evolutionarily related. moreover, even in well-characterized examples of the same protein binding different rnas, for example in the case of bacterial ribosomal proteins binding both ribosomal rnas and their own ' utrs, the features that are recognized are highly variable [ ] , and the remainder of the structure may serve as a scaffold. nonetheless, our identification of endogenous cut sites makes experimental analysis of a putative sox target structure possible in the future. a major outstanding question is how sox recognizes the targeting motif. in vitro studies indicate that the binding affinity of sox for rna is much lower than its affinity for dna [ ] , which is also processed by sox during viral genome replication in the nucleus. this suggests that sox may not recognize rna targets directly but may instead be recruited by a protein partner. this model is supported by the observation that point mutations that abolish sox host shutoff activity in cells do not affect its rnase activity in vitro [ , ] , pointing at a likely protein-protein interaction. 
a sox partner protein would have to be a fundamental factor in rna metabolism and/or a rna binding protein with promiscuous specificity, as it must bind a large portion of the cellular rnas. another possibility is that sox directly recognizes its target sequence, and that the apparent low affinity for rna in vitro is due to the fact that a noncognate sequence was used for the binding assay. however, the fact that sox cleaves the gfp rna sequence when gfp is expressed from an rna polymerase ii promoter, but not when it is expressed from an rna polymerase i or iii promoter [ ] argues against this scenario. how this motif potentiates sox targeting, as well as whether it is used as a protein-binding scaffold for other purposes in the cell related to mrna fate remain important questions for the future. cells and xrn knockdown hek t, hek t shxrn and kshv-infected islk- [ ] and islk- shxrn cells were maintained in high-glucose dmem (gibco) supplemented with % fetal bovine serum (hyclone). shxrn cells were generated using ptripz-shxrn (thermoscientific, clone v ths_ /rhs - , targeting sequence: tatggtgagatatactatg). to induce expression of the shxrn , cells were treated with μg/ml doxycycline (fisher) for - days prior to harvesting. lytic induction of kshv was induced in islk- cells [ ] by treating cells with μg/ml doxycycline and mm sodium butyrate for - days prior to harvesting. the same induction also led to anti-xrn shrna expression in the islk- shxrn cells. for the second biological repeat of the pare experiment, cells were transfected twice with the sirna against xrn as previously described [ ] . plasmids pd egfp-n was purchased from clontech. pcdna . -gfp-sox was previously described [ ] . pcdef -sox was previously described [ ] . pd egfp-Δtgaag was previously described [ ] . to test sox-mediated cleavage of endogenous mrna sequences, - nt surrounding putative cleavage sites in the human mrnas to be tested were cloned from hek t cdna using vent polymerase (neb). 
they were then inserted into the bsrgi site at nt of the gfp coding region in pd egfp-n , as previously done to test gfp sequences [ ], either by restriction enzyme digest or through a modified version of quikchange mutagenesis [ ]. an ecorv site was also generated at nt of the gfp coding region using quikchange mutagenesis (agilent) and the nt surrounding the cleavage site in limd were inserted at this location. quikchange mutagenesis (agilent) was used to carry out the following mutations: 1) mutate the aaa sequence preceding the limd cleavage site to ccc, ggg and ttt in the pd egfp construct containing the nt limd fragment; 2) mutate the aa sequence preceding the cleavage site to cc in the pd egfp constructs containing either the nt pgam or the nt srsf fragments; 3) mutate the cleavage site from a to g in the pd egfp constructs containing either the nt limd fragment, the nt pgam fragment or the nt srsf fragment; 4) mutate the gfp tgaagt to tgagtg. all primers used for cloning are listed in s table.
pare library preparation and sequencing
hek t cells treated with sirnas against xrn (repeat ) or expressing shrnas against xrn (repeat ) were transfected either with pcdna . -gfp-sox or pd egfp-n . in both cases > % transfection efficiency was observed. one day after transfection total rna was harvested and purified using rnabee (teltest). rna was then treated as described in zhai et al. [ ] to generate pare libraries. briefly, poly(a)+ rna was purified, and rna adapters were ligated to free ' phosphate-bearing rna ends. a second poly(a) purification was used to remove unligated adapter. cdna was synthesized using oligodt directed primers, and the cdna was then amplified times. as the adapter includes an mmei restriction endonuclease site, mmei was used to cut the double stranded amplicons bp downstream of the adapters. ' dsdna adapters were then ligated to the ' end of the amplicons.
this created libraries of bp tags corresponding to the ' end of rna fragments flanked by adapters, similar to small rna libraries. libraries were checked on an agilent bioanalyzer and sequenced at the vincent j. coates genomics sequencing laboratory at uc berkeley using a hiseq illumina sequencer. raw data are available on the ncbi gene expression omnibus database as study gse . reads flagged by the casava . program were eliminated and cutadapt [ ] was used to trim away the adapter sequence at the read ' end (sequence: tggaattctcgggtgccaaggaactccagt). because the pare protocol should produce - nt sequence tags from the ' ends of phosphorylated rna fragments, trimmed reads that were longer than nt or shorter than nt were discarded. the resulting sequences were aligned with tophat . . [ ] using bowtie as recommended for short sequences. no mismatches were allowed (-n option), and only alignments that uniquely mapped to the annotated portion of the genome (-t -x options) were retained, to simplify downstream analysis. for the alignment and subsequent analyses, grch and the ensembl annotation for this genome build were used. these and other analyses were carried out on an imac computer (mid model, . ghz intel core i , gb ram). a bayesian probability framework that takes into account random variations in the observed number of reads was used to find peaks that were specific to test samples compared to control samples. at a given location in a given experiment, we assume that there is an underlying rate at which reads are produced, and the observed count follows a poisson distribution with mean equal to this rate. in both the control and test data sets, we find that the frequency of reads per location follows a power-law distribution, as is typical for gene expression and deep sequencing data [ , ], and we therefore assume that the prior distributions for the underlying rates follow this power-law distribution, where the power is fitted from the data.
at a given location, we then use bayes' theorem to construct posterior distributions for the rates, given the observations of the read counts. we then deem that there is a significant difference between the control and test at that location if the posterior probability that the test rate exceeds the control rate by the chosen multiplicative factor is greater than some confidence level. the multiplicative factor (ratio) and confidence level are chosen by the user. the observed counts vary over a large range, from single digits up to values in the millions, and a key feature of the method is that it can effectively deal with these variations within a unified theoretical framework. in practice, for a given control read count, we can compute a threshold for the test read count, beyond which the difference in underlying rates is significant. the software builds a table of the thresholds using bicubic splines so that many locations can be tested efficiently. the peak finding python scripts are attached as s files. parameters were empirically optimized for the analysis so that a scanning window of nt, a multiplicative factor between test/control read counts of and a confidence level of . % or . % were used to output specific peaks. parameters used in the different analyses are specified in the figure legends. after identifying peaks in single test/control comparisons, the peaks found in the biological repeats were compared. for subsequent bioinformatics analysis of sequences, only peaks that were found in both biological repeats were used (figs d, b, s c). sequences surrounding cleavage sites as defined by chromosomal positions were extracted, using the human genome assembly grch build as a reference. as many sequences as possible were used for each analysis. however, because the short reads do not provide information about mrna isoform and splicing, for all sequence analyses only cut sites that had sufficient flanking sequences within the same annotated exon were used.
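The significance test described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pydegradome implementation: it estimates the posterior probability by Monte Carlo sampling rather than by the spline-interpolated threshold table, it ignores any normalization between libraries, and the default `power` value is a hypothetical placeholder (the text states that the power is fitted from the data). With a Poisson likelihood and a prior proportional to rate^(-power), the posterior for each rate is a gamma distribution with shape `count + 1 - power` and unit scale, which is proper only when that shape is positive.

```python
import random


def prob_test_exceeds(test_count, control_count, factor, power=1.5,
                      n_samples=20000, seed=0):
    """Estimate P(rate_test > factor * rate_control | observed counts).

    Assumes Poisson counts and a power-law prior rate**(-power), so each
    posterior is Gamma(count + 1 - power, scale=1). The default power is a
    placeholder, not a value taken from the paper.
    """
    shape_test = test_count + 1 - power
    shape_ctrl = control_count + 1 - power
    if shape_test <= 0 or shape_ctrl <= 0:
        raise ValueError("counts too small for a proper posterior with this power")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        rate_test = rng.gammavariate(shape_test, 1.0)
        rate_ctrl = rng.gammavariate(shape_ctrl, 1.0)
        if rate_test > factor * rate_ctrl:
            hits += 1
    return hits / n_samples


def is_specific_peak(test_count, control_count, factor, confidence, **kwargs):
    """Call a location test-specific if the posterior probability reaches the confidence level."""
    return prob_test_exceeds(test_count, control_count, factor, **kwargs) >= confidence


if __name__ == "__main__":
    # Made-up counts: a strong test-specific signal and an equal-count location.
    print(is_specific_peak(200, 10, factor=4, confidence=0.99))
    print(is_specific_peak(10, 10, factor=4, confidence=0.99))
```

Sampling is slow compared with a precomputed threshold table, which is presumably why the published software tabulates thresholds; the sketch is meant only to make the decision rule concrete.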
for motif analysis (figs b, c and s a) and rnafold structure prediction (fig b) nt or nt on either side of the cut site were used. for the accessibility computation (fig a), localfold.pl [ ], a modification of the rnaplfold algorithm within the vienna rna package (v . . ) [ ], was run with default settings (window = nt and maxspan = nt) and sequences of nt on each side of the cut site were analyzed. the log likelihood of each base at each position was calculated using background frequencies of nucleotides derived from the human cdna list from the ensembl grch build. weblogo [ ] was used to generate a graphical representation of the sequence motif from aligned sequences. to score matches to the motif, a position weight matrix was generated using log likelihoods for positions - to + relative to the cut site, using cut sites that were deemed high confidence in our analysis (i.e. identified using a . % confidence level and positioned at least nt away from an annotated transcription start site). the log likelihood score was then calculated for all sequences surrounding cut sites that were identified in different subsets of the data. the cut sites used to generate the matrix were always eliminated from the sets that were analyzed. to score matches in human and kshv mrnas, the log likelihood score for each nt sequence was calculated in all sequences longer than nt listed in the human cdna fasta repository associated with the ensembl grch build or in a kshv mrna fasta-formatted list (compiled using data from arias et al. [ ]). the highest score was recorded for each mrna. to separate the human rnas into coding and non-coding, their ensembl annotation was used.
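The motif-scoring procedure described above (building a log-likelihood position weight matrix from aligned cut sites, scoring a window, and recording the best window per mrna) can be sketched as follows. This is a minimal illustration, not the authors' script from the s files. The function names, the use of log base 2, the pseudocount, and the uniform background in the example are all assumptions; the text specifies background frequencies derived from human cdna sequences and does not state a log base or pseudocount.

```python
import math

BASES = "ACGT"


def build_pwm(aligned_sites, background, pseudocount=1.0):
    """Build a log-likelihood PWM from equal-length sequences aligned on the cut site.

    Each entry is log2(observed frequency / background frequency).
    The pseudocount avoids log(0) for bases never seen at a position.
    """
    length = len(aligned_sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for seq in aligned_sites:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in BASES})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log likelihoods for one window (A/C/G/T only)."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_window(pwm, transcript):
    """Scan every window of PWM width and return (best score, start index)."""
    width = len(pwm)
    if len(transcript) < width:
        raise ValueError("transcript shorter than the motif")
    scored = [(score_window(pwm, transcript[i:i + width]), i)
              for i in range(len(transcript) - width + 1)]
    return max(scored)


if __name__ == "__main__":
    # Toy data: three made-up aligned sites and a uniform background.
    sites = ["AAAT", "AAAT", "AAAC"]
    uniform = {base: 0.25 for base in BASES}
    pwm = build_pwm(sites, uniform)
    score, start = best_window(pwm, "GGGAAATGG")
    print(round(score, 2), start)
```

A positive score means the window matches the motif better than expected from the background composition, and a negative score means a poorer match, in line with the interpretation of scores given in the results.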
rnas annotated as "protein coding", "nonsense mediated decay" and "non stop decay" were considered coding, whereas rnas annotated as "antisense", "lincrna", "mirna", "snorna", "processed transcript", "unprocessed pseudogene", "pseudogene", "transcribed unprocessed pseudogene", "transcribed processed pseudogene", "processed pseudogene", and "unitary_pseudogene" were considered non-coding. the ensembl annotation was also used to determine whether the transcripts were spliced. the motif scores for human mrnas detected in clyde and glaunsinger [ ] were also compared to the level of degradation, that is, the fold change in steady-state mrna levels between gfp-expressing and gfp-sox-expressing samples in the cited study. for the analysis in fig f, the transcripts were categorized into "down-regulated" (fold change in sox vs. gfp < . ) and "escapees" (fold change in sox vs. gfp > . ). the structure of the sequences surrounding the validated cut sites was predicted using the rnafold webserver (vienna rna package [ ]). rnafold v. . . [ ] was used to predict structures around all candidate cut sites, and the results were analyzed to determine whether either of the nucleotides at position - and was predicted to be unpaired. they were also analyzed to determine whether they had an a dimer within nucleotides ' of the cut site that was also predicted to be unpaired. custom python . scripts (s files) were used unless otherwise noted. where noted in the figure legends, the kolmogorov-smirnov test was used to determine whether the distributions of scores were significantly different. total cellular rna was isolated for northern blotting using trizol (life technologies). rna was separated on formaldehyde gels ( x mops buffer, . % agarose, . m formaldehyde) in mops buffer ( mm mops (sigma), mm sodium acetate, mm edta, ph . ) and transferred by capillary blotting onto nitrocellulose membrane (bio-rad) using x ssc buffer ( . m nacl, . m sodium citrate, ph . ).
northern blots were probed with p-labeled dna probes made using decaprime ii (ambion), against the ' utr of the gfp reporters. blots were imaged using a fujifilm scanner fla- . quantification of the blots was carried out using imagej [ ] . ' rapid amplification of cdna ends (race) was carried out on μg of total rna using the first choice rlm-race kit following manufacturer's protocol (life technologies). ' race primers are listed in s table. protein harvesting and western blotting protein was isolated for western blots in protein lysis buffer ( mm tris ph . , mm nacl, % triton x- ) containing complete edta-free protease inhibitors (roche), separated on sds-page gels run in tris-glycine buffer and transferred onto pvdf membranes (emd millipore). western blots were performed with mouse anti-xrn antibodies (bethyl laboratories or santa cruz biotechnology, : ) or mouse anti-tubulin antibodies ( : , sigma aldrich). secondary antibodies were used at : dilution and purchased from southern biotech. a) the tgaagt sequence at positions - to - relative to the cut site in the gfp rna (see s b fig) was either partially deleted (Δtgaag) or mutated to tgagtg. the wild-type gfp and the two mutated reporters were co-expressed with sox ("+") or an empty vector control ("-") and the gfp mrna was detected using northern blotting. the arrowhead points at the position of the normal gfp cleavage fragment. b) predicted structures (with rnafold) of the nt surrounding the sox cut site in limd , showing the wild-type sequence on the left (" nt") and the mutated sequence lacking one of the as on the right (" nt Δa"). c-d) gfp reporters were co-expressed with sox ("+") or an empty vector control ("-") in shxrn -treated cells. the gfp mrna was detected using northern blotting. the empty arrowheads point to the additional cleavage fragment resulting from insertions, whereas the filled arrowhead in e points to the normal gfp cleavage fragment. 
the a dimer preceding the sox cut site in pgam was mutated to a c dimer (c) or the a at position was mutated to g (d) in the pgam nt insertion construct. (eps) s fig. additional analyses of prevalence of the degenerate sox targeting motif. a) weblogo [ ] representation of the frequency of each base in the nt surrounding the cut sites found in both sox samples using a confidence level setting of . % and excluding sites near an annotated transcription start site (n = ). b) human transcripts were divided into "spliced" and "not spliced" based on the annotation in ensembl. the frequency distribution histogram for the top scores for each transcript in the two sets of human rnas is shown. p value (kolmogorov-smirnov test) < . . c) the fold change in mrna levels in sox-expressing vs. control cells from clyde and glaunsinger [ ] is plotted against the best motif score for that gene. spearman's ρ = - . , p < . . (eps) s table. number of reads obtained from pare. % mapping (no restrictions) indicates the percentage of reads that map to the human genome if the requirement for unique mapping to a previously annotated region of the genome is removed. (docx) s table. number of peaks detected. the number of peaks detected by using each of the samples as test or control in the pydegradome program (and plotted in fig a) is listed. parameters used for this analysis were a scanning window of nt, a multiplicative factor of , and a confidence level of . %. (docx) s table. sox cut sites identified by our analysis. this table lists sox cut sites identified in both replicates with a confidence level of . % or in one with confidence level . % and in the second with confidence level . %, and that were nt apart in the two replicates. the table includes the chromosomal position of the cut site, the read count at the cut site in each replicate, the gene name and the confidence level setting used for the identification. 
it also indicates whether the cut site could be a transcriptional start site (tss) and whether the cut site was used for the analyses in figs and or to generate the pwm for analyses in fig . only cut sites identified in both replicates with confidence level . % were used for the analyses shown in figs and . (xlsx) s table. list of human and kshv transcripts with highest log-likelihood scores of a match to sox targeting motif. (docx) s table. primers used for cloning and ' race analysis. (docx) s files. compressed archive of scripts required for pydegradome analysis. "readme.txt" file with instructions on how to run the analysis, as well as the two scripts required for the analysis are included in the archive. (zip) s files. compressed archive of scripts used for analyses in the paper. "readme.txt" file with instructions on how to run the analyses, and several scripts used to analyze the data. (zip) emerging roles for rna degradation in viral replication and antiviral defense kshv and the pathogenesis of kaposi sarcoma: listening to human biology and medicine a common strategy for host rna degradation by divergent viruses an overlapping protein-coding region in influenza a virus segment modulates the host response coordinated destruction of cellular messages in translation complexes by the gammaherpesvirus host shutoff factor and the mammalian exonuclease xrn . 
renne r, editor highly selective escape from kshv-mediated host mrna shutoff and its implications for viral pathogenesis deep sequencing reveals direct targets of gammaherpesvirus-induced mrna decay and suggests that multiple mechanisms govern cellular transcript escape an rna element in human interleukin confers escape from degradation by the gammaherpesvirus sox a ribonucleoprotein complex protects the interleukin- mrna from degradation by distinct herpesviral endonucleases sars coronavirus nsp protein induces template-dependent endonucleolytic cleavage of mrnas: viral mrnas are resistant to nsp -induced rna cleavage the ul protein of herpes simplex virus mediates selective stabilization or degradation of cellular mrnas the herpes simplex virus ul gene-dependent destabilization of cellular rnas is selective and may be sequence-specific lytic kshv infection inhibits host gene expression by accelerating global mrna turnover host shutoff is a conserved phenotype of gammaherpesvirus infection and is orchestrated exclusively from the cytoplasm host shutoff during productive epstein-barr virus infection is mediated by bglf and may contribute to immune evasion gammaherpesviral gene expression and virion composition are broadly controlled by accelerated mrna degradation host transcript accumulation during lytic kshv infection reveals several classes of host responses characterization of pa-n terminal domain of influenza a polymerase reveals sequence specific rna cleavage decay of endoplasmic reticulum-localized mrnas during the unfolded protein response the herpes simplex virus vhs protein induces endoribonucleolytic cleavage of target rnas in cell extracts a two-pronged strategy to suppress host protein synthesis by sars coronavirus nsp protein smg promotes endonucleolytic cleavage of nonsense mrna in human cells construction of parallel analysis of rna ends (pare) libraries for the study of cleaved mirna targets and the rna degradome rapid construction of parallel 
analysis of rna end (pare) libraries for illumina sequencing diverse endonucleolytic cleavage sites in the mammalian transcriptome depend upon micrornas, drosha, and additional nucleases human nonsense-mediated rna decay initiates widely by endonucleolysis and targets snorna host genes identification of smg cleavage sites and a preferred rna cleavage motif by global analysis of endogenous nmd targets in human cells global or local? predicting secondary structure and accessibility in mrnas viennarna package . ribonuclease l and metal-ion-independent endoribonuclease cleavage sites in host and viral rnas rnase l targets distinct sites in influenza a virus rnas viral nucleases induce an mrna degradation-transcription feedback loop in mammalian cells bacterial rna motif in the ' utr of rpsf interacts with an s :s complex crystal structure of a kshv-sox-dna complex: insights into the molecular mechanisms underlying dnase activity and host shutoff the exonuclease and host shutoff functions of the sox protein of kaposi's sarcoma-associated herpesvirus are genetically separable generation of a doxycycline-inducible kshv producer cell line of endothelial origin: maintenance of tight latency with efficient reactivation upon induction integration of pcr fragments at any specific site within cloning vectors without the use of restriction enzymes and dna ligase cutadapt removes adapter sequences from high-throughput sequencing reads tophat : accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions zipf's law in gene expression methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepcage data weblogo: a sequence logo generator kshv . 
: a comprehensive annotation of the kaposi's sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features nih image to imagej: years of image analysis we thank albert tai at the tufts genomics core facility for discussion on the analysis, rachel brem and members of the brem lab (buck institute of aging research) and glaunsinger lab for helpful discussions. we thank alicia bicknell and members of the glaunsinger lab and gaglia lab for critical reading of the manuscript. conceived and designed the experiments: mmg. performed the experiments: mmg. analyzed the data: mmg chr bag. contributed reagents/materials/analysis tools: chr. wrote the paper: mmg bag. key: cord- -vf qyvft authors: seitz, christian; casalino, lorenzo; konecny, robert; huber, gary; amaro, rommie e.; mccammon, j. andrew title: multiscale simulations examining glycan shield effects on drug binding to influenza neuraminidase date: - - journal: biorxiv doi: . / . . . sha: doc_id: cord_uid: vf qyvft influenza neuraminidase is an important drug target. glycans are present on neuraminidase, and are generally considered to inhibit antibody binding via their glycan shield. in this work we studied the effect of glycans on the binding kinetics of antiviral drugs to the influenza neuraminidase. we created all-atom in silico systems of influenza neuraminidase with experimentally-derived glycoprofiles consisting of four systems with different glycan conformations and one system without glycans. using brownian dynamics simulations, we observe a two- to eight-fold decrease in the rate of ligand binding to the primary binding site of neuraminidase due to the presence of glycans. these glycans are capable of covering much of the surface area of neuraminidase, and the ligand binding inhibition is derived from glycans sterically occluding the primary binding site on a neighboring monomer. 
our work also indicates that drugs preferentially bind to the primary binding site (i.e. the active site) over the secondary binding site, and we propose a binding mechanism illustrating this. these results help illuminate the complex interplay between glycans and ligand binding on the influenza membrane protein neuraminidase. statement of significance the influenza glycoprotein neuraminidase is the target for three fda-approved influenza drugs in the us. however, drug resistance and low drug effectiveness merit further drug development towards neuraminidase, which is hindered by our limited understanding of glycan effects on ligand binding. generally, drug developers do not include glycans in their development pipelines. here, we show that glycans can reduce drug binding towards neuraminidase; even so, we recommend that future drug development work focus on strong binders with a long lifetime. furthermore, we examine the binding competition between the primary and secondary binding sites on neuraminidase, leading us to propose what is, to the best of our knowledge, a new multivalent binding mechanism. it has been long appreciated that glycans on influenza membrane proteins help shield the virus from the host immune system's antibodies ( ) ( ) ( ) ( ) ( ) ( ) ( ) . unrecognized glycosylation differences can also attenuate influenza vaccines ( ) . in one study, glycans were shown to reduce epitope accessibility and drug binding to receptor proteins ( ) . glycans can clearly influence antibody binding due to their presence in the antibody binding site. however, it remains to be seen whether this glycan shielding and glycoprofile variability is also a concern for influenza drugs, recognizing that these drugs are smaller than human antibodies, and that glycans present themselves near, but not directly inside, the catalytic sites. 
currently there are three fda-approved influenza neuraminidase (na) antivirals in the us: tamiflu (oseltamivir), relenza (zanamivir) and rapivab (peramivir), all of which have lingering questions over their efficacy, side effects, and drug resistance ( , ) . this necessitates further drug development against influenza ( ) . drug developers have many hurdles to clear when designing a new influenza drug: classical admet characteristics, clinical trials and governmental regulations, among others. what is not often considered is the viral glycosylation state. the glycosylation state is the assemblage of glycans, linkages of sugars found on the surface of about half of all proteins ( ) . influenza contains n-linked glycosylation sites, defined by the asn-x-ser/thr sequon, where x can be any amino acid except proline ( ) . this leads to the so-called glycan shield, where glycans on the protein surface are capable of accessing much of the protein's surface area, and potentially shielding it from outside interactions ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) . among their many biological functions, glycans play a crucial but complex role in viral infection ( ) . one salient example of glycan function in influenza is how they help the virus evade the immune system ( ) ( ) ( ) ( ) ( ) ( ) ( ) . furthermore, glycans are capable of affecting receptor binding in influenza ( , , ( ) ( ) ( ) ( ) . traditionally, glycans have been difficult to study due to their flexibility and heterogeneity. most of the glycan characterization studies are done through mass spectrometry, which can yield highly variable glycoprofile data, such as differences in the degree of post-translational modifications, sequon occupancy, and type of glycan, for different strains of influenza ( ) ( ) ( ) ( ) ( ) ( ) . 
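as a minimal illustration of the asn-x-ser/thr sequon rule described above, the following python sketch scans a protein sequence for candidate n-linked glycosylation sites. the toy sequence and the function name are hypothetical and are not taken from this study; a matched sequon marks only a potential site, since occupancy varies between experiments.

```python
import re

# N-linked sequon: Asn-X-Ser/Thr, where X is any residue except proline.
# The zero-width lookahead lets overlapping sequons (e.g. "NNTS") all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 1-based positions of the Asn in every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence (not a real neuraminidase sequence).
toy_sequence = "MNPTNGSAANNTSK"
print(find_sequons(toy_sequence))
```

the "npt" stretch in the toy sequence is skipped because proline occupies the x position.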
similarly, glycan occupancy levels are not consistent across studies, even when using the same cell line and strain of influenza ( , ) . these discrepancies may arise from differences in system setup, sample preparation, cell culturing and/or analysis method, which increases the difficulty in determining the transferability of experimental glycan results. though not well understood, the number and position of the glycosylation sites on influenza can change over time as a result of antigenic drift ( ) ( ) ( ) ( ) . this increases the glycoprofile variability, producing irregular but significant changes in the glycan shield over time. considering the variability and immune evasion function of the glycan shield, it remains to be seen what effect this shield has on small-molecule antiviral drug binding to viral surface proteins. previous work has shown that, depending on the viral strain and receptor mimetic used, removing viral glycans can improve binding to cell receptor mimetics ( ) ( ) ( ) . other studies have shown that these viral glycans decrease binding of other cell receptor mimetics ( , ( ) ( ) ( ) ( ) . regardless, antiviral drugs will be much smaller than a receptor mimetic, and it is not clear whether this size difference means antiviral drugs will still be affected by the viral glycans. an earlier study by kasson and pande, using ns molecular dynamics (md) simulations, showed reduced binding of α - -sialyllactose trisaccharides to hemagglutinin due to glycans ( ) . a recent review concluded that the viral glycosylation state should be considered when designing small molecule antivirals ( ) . focusing on how small molecule antivirals are affected by the glycan shield, we combine results from distinct bd and md simulations into an integrated multiscale simulation study. we have utilized bd to estimate the rates of binding of small molecules to the primary (i.e. active/catalytic) and secondary (i.e. 
hemadsorption) binding sites of influenza neuraminidase in glycosylated and unglycosylated states. we see that the glycan shield is capable of moderately inhibiting drug association to the primary binding site of na on the order of two to eight times. small molecule association is faster to the primary binding site than the secondary binding site. ligand binding is independent between the primary and secondary sites -the presence of one site does not influence binding at the other site. overall, this work provides insights into the impact of glycans on small-molecule binding to na. in this study, we use brownian dynamics (bd), which has been previously used to simulate protein-small molecule association ( ) ( ) ( ) ( ) . specifically, it has also been used to simulate the association of small molecules to influenza neuraminidase ( ) ( ) ( ) . bd makes the implicit assumption that long-range electrostatics and stochastic collisions with solvent molecules are the driving forces behind protein-ligand binding ( ) . therefore, it is an efficient method to simplify binding to describe electrostatically-influenced diffusion. using bd allows for a reduction in system complexity and a focus on specific modulations of ligand association. to assess whether glycans affect small molecule binding to na, we created an in silico na model using the strain of influenza a virus, a/viet nam/ / (h n ) and tetrameric pdb hty, with uniprot id q dpl ( ) . building on this structure, we generated five na constructs: (i) one unglycosylated model; (ii) one glycosylated model with web server-derived glycan conformations; (iii) three glycosylated models, each with unique, biologically-relevant glycan conformations derived from all-atom md simulations that were based on (ii) as the starting structure. finally, we ran bd simulations using these models to examine binding characteristics of oseltamivir, zanamivir and sialic acid. 
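to make the bd assumption above concrete (a deterministic drift from the systematic force plus random kicks from the solvent), here is a minimal sketch of a single brownian dynamics step in the ermak-mccammon form without hydrodynamic interactions. it is a generic textbook update, not browndye's implementation; the reduced units, the constant force, and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def bd_step(position, force, diffusion, kT, dt, rng=random):
    """Advance a point particle by one Brownian dynamics step.

    new_x = x + (D / kT) * F * dt + sqrt(2 * D * dt) * xi,  xi ~ N(0, 1)
    The first term is the drift from the force (e.g. electrostatics); the
    second is the stochastic displacement from solvent collisions.
    """
    sigma = math.sqrt(2.0 * diffusion * dt)
    return tuple(
        x + (diffusion / kT) * f * dt + sigma * rng.gauss(0.0, 1.0)
        for x, f in zip(position, force)
    )


# Hypothetical reduced units: D = 1, kT = 1, dt = 0.01, constant force along x.
rng = random.Random(0)
pos = (0.0, 0.0, 0.0)
for _ in range(1000):
    pos = bd_step(pos, force=(1.0, 0.0, 0.0), diffusion=1.0, kT=1.0, dt=0.01, rng=rng)
print(pos)
```

in a real association simulation the force would be re-evaluated from the electrostatic grids at every step rather than held constant.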
to note, the bd input and results files are provided on github (https://github.com/cgseitz). the unglycosylated model was built using an avian h n strain and was used as a basis for the other models. we picked this strain of influenza because it contains a glycosylation site at n , a member of the -loop that hangs over the primary binding site, as shown in figure . this close proximity provides a good test of whether glycans were capable of interfering with ligand association to influenza neuraminidase. as the bd simulations used here keep bonds rigid, it was necessary to select ligand conformations that represented a bound state and protein conformations that represented an open state, to properly approximate the initial binding contact. thus, we selected a crystallized apo head region of the strain mentioned above (pdb hty) ( ) . the stalk region has not been crystallized for any influenza na and is unlikely to influence ligand association due to its large distance from the distal binding sites, so it was not modelled. the crystallized calcium ions were retained throughout, while the crystallized glycan fragments were removed ( ) . hty was crystallized with a y h mutation (pdb numbering), which was reversed for this project through pymol ( ) . the histidine rotamer was chosen to be the one with the highest occurrence in proteins, according to pymol. the crystal structure contained a broken backbone between p and n which was fixed through schrodinger maestro; subsequently residues - (on each side of the fixed bond) were minimized through maestro ( ) . the same procedure was done for the broken backbone between v and q : the bond was created and residues - were minimized. this refitting was done for each monomer in the tetramer. the ph was set to ph . , as this was done in the reference k on experiments ( , ) . using this ph, protonation states on the neuraminidase were assigned using propka ( ) . 
the protonation assignments were done through the pdb pqr server ( ) . partial charges on the protein were assigned according to the amber force field ( ) . parameterizing the glycans needed special treatment as there are not glycan parameters in the amber force field. we used the glycam_ h- parameters as these would be consistent with the amber force field ( ) . to build the first glycosylated construct (with web server-derived glycan conformations), the unglycosylated na structure was uploaded to the glyprot server, and three representative glycans were added to each na monomer, for a total of glycans on the na homotetramer ( ) . though there is experimental variability in glycosylation site occupancy, we decided to place a glycan in each glycosylation site to see the maximum potential effect the glycoprofile can have on ligand association. considering most of the human h n transmission came directly from avian sources, the glycans used to model this structure came from an avian (hen egg) source for growing these glycans ( ) . additionally, this dataset is the only one containing structures experimentally found on influenza na ( ) . we chose representative glycans from this dataset, however we note that both larger and smaller glycans will exist in nature; these size differences may slightly affect the results presented here. the exact glycans were selected as shown in table . table . glycan structures from the glyprot web server. the "glycan structure" entries came from experimental results ( ) . these structures consist of n-acetylglucosamine (glcnac), mannose (man), n-acetylhexosamine (hexnac) and hexose (hex). hexnac and hex were interpreted according to their corresponding "glyprot identifier" and the structures shown in to better diversify our system, three glycosylation sites (termed as site # , site # and site # ) present on each monomer were linked to three different glycan types. 
importantly, the four monomers (termed as monomer a, monomer b, monomer c, and monomer d) of our homotetrameric na model were symmetrically glycosylated, meaning that sites # , # and # were populated with the same glycan across monomers. starting with the structure containing web server-derived glycan conformations, md was then used to generate representative glycan conformations, with the assumption that md would provide realistic conformations of glycans within a microsecond's worth of sampling ( , ) . the first step was porting the structure with web server-derived glycans into charmm-gui to prepare the structure for md ( ) ( ) ( ) ( ) ( ) ( ) . the disulfide bonds were taken from uniprot id q dpl . the system was embedded into a box of explicit water molecules described with the tip p model ( ). an ion model was used as described previously ( ). the full system had a size of , atoms. an ionic solution of . m nacl was used, and the charmm all-atom additive force fields were used for the protein and the glycans ( ). molecular dynamics simulations were run using gpu-accelerated amber with an npt ensemble ( , ). the system was initially minimized for a total of cycles using a combination of steepest descent and conjugate gradient methods ( , ). equilibration in an npt ensemble was performed for ps, using a timestep of fs and the shake algorithm to constrain all bonds involving hydrogen ( ) . the equilibration temperature was set at k and regulated through a langevin dynamics thermostat ( , ) . the pressure was fixed at bar through a monte carlo barostat ( ) . these simulations were run using the extreme science and engineering discovery environment (xsede), specifically the comet supercomputer housed at the san diego supercomputer center ( ) . periodic boundary conditions were used with a non-bonded short-range interaction cutoff of Å and force-based switching at Å. particle mesh ewald was used for the long-range electrostatic interactions ( ) . 
for the production runs, the temperature was set at . k ( ). after equilibration, this system was cloned into identical replicates. each one was run in parallel for ns each with a unique starting velocity, totaling µs of sampling. once the md simulations finished, the trajectory of each glycan was concatenated independently of the rest of the system. each of these individual glycan trajectories was then clustered using gromacs-based gromos clustering with an rmsd cutoff of . Å ( ). this number was chosen so the three most populated clusters would represent at least % of the total glycan conformations in each of the simulations. the central structure, defined as the structure with the smallest average rmsd from all other members of the cluster, from the top cluster of each of the glycans was then selected; the pyranose ring from the reducing end of the glycan was then aligned to the analogous pyranose ring of the corresponding glycan from the glyprot-glycosylated structure, as this should be the most stable part of the glycan ( ) . the glyprot glycans were removed and the glycans from the md simulations were attached through schrodinger maestro, to create a new na system with each glycosylation site inhabited by the central structure of the most representative conformation from the md simulations. this was then repeated for the second and third most representative glycan clusters from the md simulations. the sialic acid structures used were drawn from pdb mwe, which crystallized the boat conformation in the active site and the chair conformation in the secondary site ( ) . the chair conformation of sialic acid was crystallized with a missing carboxylate group, which was added through schrodinger maestro to model an energetically-favorable gauche conformation. zanamivir was extracted from the ckz crystal structure ( ) . oseltamivir was extracted from the cl crystal structure ( ) . 
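the clustering step described above can be sketched as follows, assuming a precomputed pairwise rmsd matrix between glycan frames. the first function is a simplified gromos-style neighbour-count clustering and the second picks the central structure as defined in the text (smallest average rmsd to the other cluster members). this is an illustrative pure-python sketch, not the gromacs implementation, and the toy matrix and cutoff are hypothetical.

```python
def gromos_clusters(rmsd, cutoff):
    """Greedy GROMOS-style clustering on a symmetric pairwise RMSD matrix.

    Repeatedly take the frame with the most neighbours within `cutoff`,
    make it and its neighbours a cluster, and remove them from the pool.
    Clusters are returned largest-neighbourhood first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        def neighbours(i):
            return [j for j in remaining if rmsd[i][j] <= cutoff]

        # sorted() makes tie-breaking deterministic (lowest index wins).
        center = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(center))
        clusters.append(members)
        remaining -= set(members)
    return clusters


def central_structure(rmsd, members):
    """Member with the smallest average RMSD to all other cluster members."""
    if len(members) == 1:
        return members[0]

    def mean_rmsd(i):
        return sum(rmsd[i][j] for j in members if j != i) / (len(members) - 1)

    return min(members, key=mean_rmsd)


# Hypothetical 4-frame RMSD matrix (in angstroms) and cutoff.
toy_rmsd = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [2.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
clusters = gromos_clusters(toy_rmsd, cutoff=1.5)
print(clusters, [central_structure(toy_rmsd, c) for c in clusters])
```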
a d comparison of these ligands can be seen in figure s , showing their structural similarities; we note that all mentions in this study of oseltamivir pertain to tamiflu's active metabolite oseltamivir carboxylate. these ligands were then uploaded to the prodrg server to add hydrogens ( ) . charges according to the amber force field were added through the pdb pqr server ( , ) . bd simulations were run using browndye ( ) . even though the md was run with the amber force field and the bd was run with the amber force field, we assume these to be sufficiently independent steps and the slight force field differences should not appreciably affect the results, especially as our k on numbers are relative, not absolute. the charges for the protein and ligands were reassigned according to the amber force field ( ) . the temperature was set to . k, which was the temperature for the referenced k on experiments ( , ) . the ions used are shown in table s . these ions were selected to mimic the ion and buffer concentration of the reference k on experiments ( , ). the experimental assay used mm cacl and . mm mes buffer ( ) . the ca2+ and cl- concentrations were simply calculated by finding their ionic strengths. mes buffer is prepared with na+; the concentrations of the buffer and na+ at ph . were calculated with the henderson-hasselbalch equation ( , ) . this resulted in an overall ionic strength of . m. the calcium, chlorine and sodium van der waals radii were taken from the literature ( , ) . the mes radius was determined by building it in schrodinger maestro and measuring it in vmd ( ) . apbs was used to create the electrostatic grids needed by browndye for these simulations ( ) . the grid spacings are listed in table s . the solvent dielectric was set to while the protein dielectric was set to . desolvation forces were turned off. the debye length, determined from the concentration and charges of the ions in the solution, was set to . Å.
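the chain of calculations described above (henderson-hasselbalch for the ionized buffer fraction, then the ionic strength, then the debye length) can be sketched with the standard textbook formulas. all numerical inputs below (concentrations, ph, pka, temperature and solvent dielectric) are hypothetical placeholders, not the values used in this study, and the function names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m


def deprotonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of the buffer in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def ionic_strength(ions):
    """I = 0.5 * sum(c_i * z_i**2) for (molar concentration, charge) pairs."""
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_angstrom(ionic_strength_molar, temperature_k, solvent_dielectric):
    """Debye length from the ionic strength (mol/L), returned in angstroms."""
    i_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE ** 2 * i_si) / (
        solvent_dielectric * EPS_0 * K_B * temperature_k
    )
    return 1e10 / math.sqrt(kappa_sq)


# Hypothetical inputs: 10 mM CaCl2, 20 mM MES (assumed pKa 6.1) at pH 6.5.
cacl2, mes_total = 0.010, 0.020
mes_anion = mes_total * deprotonated_fraction(ph=6.5, pka=6.1)
ions = [(cacl2, +2), (2 * cacl2, -1), (mes_anion, -1), (mes_anion, +1)]  # Na+ balances MES-
i_total = ionic_strength(ions)
print(i_total, debye_length_angstrom(i_total, temperature_k=298.15, solvent_dielectric=78.0))
```

only the deprotonated mes and its sodium counterion contribute here, on the assumption that the protonated zwitterion carries no net charge.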
in browndye, the b radius is defined as the starting radius for the ligand trajectories, at a distance where the force between the protein and ligand is independent of orientation. this distance is determined from the hydrodynamic center of the receptor. because of the different glycan conformations used, the b radius differed slightly between systems. if a ligand reaches what is known as the q radius, the trajectory either ends as a non-association or is restarted from the b radius according to browndye's algorithm. the q radius is defined as . times the b radius distance. the b radius ranged between Å and Å depending on the system, and the q radius ranged from Å to Å . the exact b and q radius values for each system are shown in table s . bd simulations were run on all five na models generated (i.e. unglycosylated, glycosylated with web server-derived glycans, and the three systems with md-derived glycan conformations). these simulations totaled million trajectories for each ligand/binding site pair, consistently giving reproducible rates within the small level of error reported and resulting in million trajectories total. reproducible rates will be obtained by having a binding probability of around one in a million trajectories; we found we could roughly obtain these probabilities by using million trajectories for each ligand/binding site as has been reported previously ( ) . this number of trajectories produced error values comparable to those seen in the reference experimental studies, as seen in figure s . for systems where we saw at least one binding event, the number of binding events ranged from two to (see supporting material for details). bd simulations using browndye require the creation of reaction criteria, consisting of a list of protein-ligand atom pairs and a cutoff distance. if any three of these pairs simultaneously come closer than the cutoff distance, we assume the ligand will associate. the cutoff distance was empirically determined to be . 
Å; this distance approximately yielded the experimental k on rates for both oseltamivir and zanamivir ( ) . there are no other experimental k on rates towards the primary site of h n , and no referenced rates at all for the secondary site. the referenced k on experiments were done with glycans attached to na and measured to the full tetramer; this was confirmed in personal correspondence with the corresponding author (stephen martin of the mrc national institute for medical research, correspondence on july , ). considering that the reaction criteria and reaction distance were created for oseltamivir and no significant changes were made before applying them to zanamivir, we can safely assume that they are generalizable for sialic acid, an analog of both oseltamivir and zanamivir ( figure s ). the protein-ligand atom pairs were taken from crystal structures of ligands in the primary and secondary sites of neuraminidase for each monomer, and simulations were run for the full tetramer. the primary binding site was determined according to the crystallized binding pocket for our strain of neuraminidase ( ) . this pocket is noted to have a surface area of . Å and a volume of . Å ( ) . the secondary site contacts were determined from a structure of influenza a/tern/australia/g c/ ( ) . however, all the secondary site residues are conserved between that strain and the strain used in our simulations. the combined site simulations are defined as simulations with criteria allowing for association to either the primary or secondary site; it is simply a simulation run with a concatenation of the binding criteria for these sites. in this work, we define binding site contacts to be those protein-ligand contacts seen in crystal structures. from these contacts, we created protein-ligand atom pairs in browndye to determine when a reaction has occurred in our bd trajectories. there are seven primary binding site contacts reported between oseltamivir and the cl crystal structure ( ) . 
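the reaction criterion described above (association is assumed once any three of the listed protein-ligand atom pairs are simultaneously within the cutoff distance) can be written as a short check. this is a sketch of the logic only, not browndye's input format or code; the atom labels, coordinates and cutoff below are hypothetical.

```python
import math


def reaction_occurs(protein_xyz, ligand_xyz, atom_pairs, cutoff, n_required=3):
    """True if at least `n_required` atom pairs are closer than `cutoff`.

    protein_xyz / ligand_xyz map atom labels to (x, y, z) coordinates for one
    trajectory frame; atom_pairs lists (protein_label, ligand_label) tuples.
    """
    n_close = sum(
        1
        for p_label, l_label in atom_pairs
        if math.dist(protein_xyz[p_label], ligand_xyz[l_label]) < cutoff
    )
    return n_close >= n_required


# Hypothetical frame with four candidate contacts; three are within 2 angstroms.
protein = {"a": (0.0, 0.0, 0.0), "b": (5.0, 0.0, 0.0), "c": (0.0, 5.0, 0.0), "d": (20.0, 20.0, 20.0)}
ligand = {"x": (1.0, 0.0, 0.0), "y": (5.0, 1.0, 0.0), "z": (0.0, 5.0, 1.0), "w": (0.0, 0.0, 0.0)}
pairs = [("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")]
print(reaction_occurs(protein, ligand, pairs, cutoff=2.0))
```

a combined-site criterion of the kind mentioned in the text would apply the same check to the primary-site and secondary-site pair lists and accept a trajectory if either is satisfied.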
these binding site contacts are reported in table s and figure s . there are five primary binding site contacts reported between sialic acid and the mwe crystal structure; all five of these are analogous to those seen for oseltamivir ( ) . the binding site contacts for sialic acid are registered in table s and figure s . there is one primary binding site contact reported between zanamivir and the ckz crystal structure; this one is analogous to one seen in oseltamivir ( ) . the binding site contacts from oseltamivir were transferred to zanamivir retaining the one contact seen in the ckz crystal structure and are reported in table s and figure s . using the structural similarities of sialic acid and zanamivir to oseltamivir, analogous primary binding site atom pairs were created so that each ligand had seven primary protein-ligand atom pairs. there are five secondary binding site contacts reported between sialic acid and the mwe crystal structure ( ) . these contacts are reported in table s . there are no published reports of crystal structures of oseltamivir or zanamivir in the secondary binding site, so five analogous secondary binding site protein-ligand atom pairs were created for oseltamivir (table s ) and zanamivir (table s ) to match those seen in sialic acid, so that each ligand had five secondary binding site protein-ligand atom pairs. to pare down the data from µs of cumulative md sampling and pick out biologically-relevant glycan conformations, we clustered each glycan from the md simulations. the glycan trajectories were extracted and affixed on the static na crystal structure, to reveal the conformational space they can access (figure ) . visualizing these glycan trajectories on the na structure gives a qualitative representation of how much volume and surface area the glycans are capable of accessing. 
keeping in mind the primary and secondary binding sites are located just beneath the glycans (figure ) , the size and flexibility of the glycans here shows that they have the capability to "shield" the binding sites from ligand association. the three most representative clusters for each glycan were extracted from the md simulations. the central structure from each cluster was compared with the conformation generated from glyprot. these clusters show some conformational diversity, but none show a particularly similar conformation to the glyprot structure. however, the third glycan in each monomer shows a markedly decreased conformational diversity compared to the other two monomers. the clustering results from each monomer show the same trends; the results from monomer a are shown in figure , while the results from monomer b (figure s ), monomer c (figure s ) , and monomer d (figure s ) are shown in the supporting material. the glycans bend away from the binding sites on their monomer towards the binding sites on the neighboring monomer. this is seen for each monomer. the primary binding sites are in purple and the secondary binding sites are in orange. the linkage between the glycans and the protein is in yellow. the na structure is in teal. the glyprot conformation is in gray, the first conformation from the md simulations is in orange, the second conformation is in blue, and the third conformation is in green. to be confident in our computed association rates, we first needed to benchmark our system against experimental results. we created empirically-derived system criteria for the association of oseltamivir to the primary binding site of glycosylated na, as described in the methods. after matching the experimental association rate with oseltamivir, the same parameters were applied to zanamivir. these are the only two experimental association rates for h n na. 
subsequently, we investigated the association of oseltamivir and zanamivir to the primary sites of glycosylated na, obtaining association rates of . ± . /µm·s for oseltamivir and . ± . /µm·s for zanamivir. these are in agreement with the experimentally-measured rates of . ± . /µm·s and . ± . /µm·s, respectively, as visualized in figure s ( ) . considering the experimental systems were glycosylated, we had to pick one glycan conformation to use for computing these benchmarks in our glycosylated system; for reproducibility we chose the conformation generated from the glyprot server. we note that choosing a different conformation for our computed benchmark would change the absolute association rates by a scaling factor, but the trends would remain the same. since the predicted k_on for oseltamivir and zanamivir both matched up well with the experimental rates, the system proved to be transferable to ligand analogs for the primary site. we then applied the same criteria to two different conformations of sialic acid, boat and chair, to probe whether the association rate was dependent on conformation. this was done in addition to analyzing how association rate was modulated by different functional groups, via comparisons of ligand analogs such as oseltamivir, zanamivir and sialic acid. with the binding criteria set up, we calculated the association rates of each of the ligands to the primary site ( figure a) . these results show two important findings. first, there is not a large difference in association rates between the system with glyprot glycans and the unglycosylated system. this shows that a glycan may adopt a conformation where it does not inhibit ligand binding much at all. the second finding is that the glycans from the md simulations all show a moderate level of inhibition, more than the system with glyprot glycans. this shows that biologically-relevant glycan conformations will likely exhibit a moderate level of inhibition towards ligand binding. 
combining the first and second findings discussed in this paragraph, glycans are capable of perturbing ligand binding to na. conf is the glycan structure from the most populated cluster from the md simulations. conf is from the second most populated cluster, and conf is from the third most populated cluster. the association rates using glycan structures downloaded from glyprot are shown in gray. the association rates using structures derived from the md simulations are in bright, colorful shades whereas the others are in grayscale. the association rates without using any glycans are shown in black. (a) the glycan structures from the md simulations show a moderate association rate inhibition to the primary binding site irrespective of ligand chosen. (b) little association is seen to the secondary binding site. note the different y-axis used to be able to see the small amount of binding. (c) association rates of trajectories run with either the primary site or secondary site as the trajectory end point. similar to (a), the glycan structures from the md simulations in (c) show a moderate inhibition of ligand association. the raw data for this figure are given in table s (oseltamivir), table s (zanamivir), table s (sialic acid boat conformation), and table s (sialic acid chair conformation). there are no experimental association rates for ligands to the secondary site, so criteria were chosen based on crystal structure data, as discussed in the methods. only sialic acid has been crystallized in the secondary site of avian na, so binding site criteria for the secondary site were extracted from that structure and used to create the criteria for oseltamivir and zanamivir, as discussed in the methods ( ) . a previous bd study suggested that oseltamivir can bind to the avian na secondary site ( ) . a follow-up nmr study also suggested that oseltamivir binds to the avian na secondary site ( ) . 
however, a more recent experimental study disagreed with these findings and did not see oseltamivir binding to the avian na secondary site ( ) . considering the disagreement about oseltamivir binding to the secondary site, we decided to test this, and secondary site binding for zanamivir as well. the computed association rates towards the secondary site tell a markedly different story than those to the primary site ( figure b) . none of the ligands exhibited noticeable binding towards the secondary site, with the exception of the boat conformation of sialic acid. even with this conformation, there is no consistent trend when compared to primary site binding. although the boat conformation of sialic acid displays a small amount of binding, the chair conformation does not show binding. these results show that we can differentiate between these two sialic acid conformations at the bd level of theory. finally, trajectories were run where the ligand could associate to either the primary site or the secondary site ( figure c) . intriguingly, the results are essentially a concatenation of the rates seen for the primary and secondary sites individually. considering the low level of secondary site binding, the trends here are the same as seen for the primary site. as can be seen in figure s , there is a formal charge difference between the ligands: sialic acid contains a formal charge of - while oseltamivir and zanamivir are neutral. running test bd trajectories without charge treatment (results not shown), we saw analogous results to those seen in figure . this meant that only the sterics of the systems affected binding, not electrostatics. clearly one or a few of the structural differences between the ligands play outsized roles in affecting the association rates. in this work we did not further probe which exact atoms in the ligands change the association rates. biologically, the influenza replication cycle is propagated through na recognizing and cleaving sialic acid. 
this study compares the interplay between that molecular recognition process and na's aforementioned glycan shielding capabilities. this interplay is simplified here by approximating ligand binding as a diffusion-governed association process, modulated by protein electrostatics. previous studies have shown that viral proteins can exhibit a degree of glycosylation large enough to partially protect a variety of viruses from immune system antibodies; this is termed the viral glycan shield ( , , ( ) ( ) ( ) . from static structures one can envision the shielding that glycans can provide, but a dynamic representation better depicts the steric barrier encountered by immune system antibodies and drugs ( ) . in our single na protein, we see that glycans are capable of covering most of the na surface area, as shown in figure . this is consistent with studies explaining how the influenza glycan shield can cloak the influenza virion from the immune system ( - ). the glycans can access a large volume, allowing for a considerable shielding potential. however, it is worthwhile to note that influenza glycoproteins are usually not as extensively glycosylated as on some other viral proteins, such as the hiv envelope protein or the sars-cov- s protein ( , , ( ) ( ) ( ) . the exact h n construct prepared here contains a glycosylation site at n . this is part of the loop that borders the primary binding site (figure ) . the representation in figure shows that the glycans present at site n on each monomer have the combined capability to cover both na binding sites, potentially thwarting the binding of small molecules. the results shown here display a moderate inhibitory effect due to glycans, but this effect would likely not be present in proteins whose glycans only reside far from the ligand binding sites, i.e. if the setup in figure only contained the glycans at site and site on the bottom of the na head. 
when examining the effect of glycan conformation on binding inhibition, the glyprot glycans display a fairly vertical conformation. on the other hand, the glycans from the md simulations bend backwards, away from the primary binding site on their own monomer and towards the secondary binding site of the adjacent monomer, as shown in figure . interestingly, this bend appears to be enough to inhibit primary site binding. it has been previously shown that specific chemical modifications on the glycans can significantly change their flexibility ( ) ( ) ( ) . it has also been hypothesized that glycan flexibility plays a role in protein-receptor binding equilibria ( ) . considering the scale of biological interactions that glycans participate in, it is likely that they would exploit their flexibility to facilitate these interactions. however, the glycan environment and nearby steric clashes would conceivably affect this flexibility as well, introducing competing effects. revisiting the input na structure in figure , we hypothesized that the glycan on top of each na monomer (the oligomannose type glycans linked to site # ) would achieve a higher degree of flexibility than the two on the bottom of each monomer (the complex and hybrid type glycans linked to sites # and # , respectively). our reasoning was that these two may find steric restrictions on their flexibility, and that the placement of the glycan on the na head would be more important than the type of glycan examined. our results show this is not quite the case. 
the clusters in figure , figure s , figure s , and figure s show that, similar to the complex-type glycans (a-d ), the oligomannose-type glycans (a-d ) were quite flexible even though they were situated near the hybrid-type glycans (a-d ) on the bottom of the na surface; this large degree of conformational freedom is backed up by previous work specifying that this flexibility is driven by the mannose( )-α( - )-mannose( ) and the mannose( )-α( - )-mannose( ) linkages ( ) . these are the linkages connecting the chitobiose glycan "stalk" to the two glycan "branches". finally, the hybrid-type glycans showed noticeably less conformational flexibility than either the oligomannose-type glycans or the complex-type glycans. overall, the type of glycan and its specific linkages seemed to govern its flexibility more than potential nearby steric clashes. this agrees with previous work showing that unless there is a direct steric clash, inter-residue hydrogen bonds may have a larger effect governing glycan conformations ( , ) . the results shown in figure are consistent with diffusion-controlled reactions, and show relatively high association rates. the space explored is consistent with the random walk nature of diffusion. the randomness of the ligand trajectories (from brownian motion) and the small sizes of the ligands considered here minimize the effects of the glycans on binding. the rates for each ligand are mostly of similar orders of magnitude, with or without glycosylation. however, the glycan structures from the md simulations show a moderate inhibition compared to the unglycosylated na structure and the na structure with glycan structures taken directly from the glyprot web server. the extent of this inhibition ranges from a factor of about two to eight. in general, glycans can decrease binding activity of viral proteins ( , , , ) . 
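the inhibition factor quoted above is the ratio of an unglycosylated (reference) association rate to a glycosylated one. a minimal python sketch of that arithmetic follows; the function name and the example numbers are hypothetical placeholders, not values from this study, and the uncertainty uses first-order propagation under the assumption that the two rates have independent errors.

```python
import math


def fold_inhibition(k_ref, k_glyc, sd_ref=0.0, sd_glyc=0.0):
    """Return (fold, sd_fold) for the ratio k_ref / k_glyc.

    k_ref   : association rate without glycans (any consistent unit)
    k_glyc  : association rate with glycans (same unit)
    sd_*    : standard deviations of the two rates (assumed independent)
    """
    if k_ref <= 0 or k_glyc <= 0:
        raise ValueError("rates must be positive")
    fold = k_ref / k_glyc
    # first-order error propagation for a ratio of independent quantities
    rel_sd = math.hypot(sd_ref / k_ref, sd_glyc / k_glyc)
    return fold, fold * rel_sd


# placeholder rates, chosen only to illustrate the calculation
fold, sd = fold_inhibition(k_ref=8.0, k_glyc=2.0, sd_ref=0.8, sd_glyc=0.2)
```

a fold value close to one corresponds to the glyprot-like case with little inhibition, while values between two and eight correspond to the moderate inhibition described for the md-derived conformations.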
due to their bulk and proximity to the primary ligand binding site, we hypothesized that, irrespective of conformation, the presence of glycans, particularly those near the binding sites, could substantially reduce ligand binding and removing these glycans would restore binding. what we found was a more nuanced picture. the na constructs with glycan conformations from the glyprot server showed similar binding rates to unglycosylated constructs. however, more realistic glycan conformations, extracted from the md simulations, showed a moderate but noticeable decrease in association rate, k_on, by a factor of two to eight. one may naturally question whether glycans would have the same effect on the dissociation rate, k_off. one previous study testing antibody binding to cancer cells showed that antibody binding was relatively insensitive to the presence of glycans, indicating a similar dampening of k_on and k_off due to the presence of glycans ( ) . in that study, the overall equilibrium constant K_d changed by less than a factor of two irrespective of the presence or absence of glycans ( ) . however, a different study of the influenza membrane protein hemagglutinin showed that trimming the glycans from a standard length seen in hek cells to a single monosaccharide decreases the equilibrium constant K_d by a factor of two to , depending on the receptor mimic used ( ) . this meant that the k_on and the k_off were not affected in the same way by the presence of glycans ( ) . glycans are present in the antibody binding sites of both of the studies mentioned above; this is in contrast to our system where glycans are situated near the catalytic sites, but not directly inside them. with this in mind, it seems likely that the slight slowing of binding of small ligands by the glycans would be similarly reflected in a slight slowing of release, so that the equilibrium constants for binding these molecules are relatively insensitive to the presence of glycans. 
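the kinetic argument above can be written out with the standard relations between rate constants, the dissociation constant and the binding free energy. the block below is a sketch of that textbook reasoning, not a result of this study; the slowdown factor f, the standard concentration c° and the second, k on-only case are introduced here purely for illustration.

```latex
% dissociation constant and standard binding free energy
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln\frac{K_d}{c^{\circ}}

% case 1: glycans slow association and dissociation by the same factor f > 1
k_{\mathrm{on}}' = \frac{k_{\mathrm{on}}}{f}, \quad
k_{\mathrm{off}}' = \frac{k_{\mathrm{off}}}{f}
\;\Rightarrow\; K_d' = \frac{k_{\mathrm{off}}/f}{k_{\mathrm{on}}/f} = K_d,
\qquad \Delta\Delta G = 0

% case 2 (hypothetical limit): only association is slowed
k_{\mathrm{on}}' = \frac{k_{\mathrm{on}}}{f}, \quad
k_{\mathrm{off}}' = k_{\mathrm{off}}
\;\Rightarrow\; K_d' = f\,K_d,
\qquad \Delta\Delta G = RT \ln f
```

for scale, assuming a temperature of 298.15 k (rt ≈ 0.593 kcal/mol), the second case with f between two and eight would shift the binding free energy by roughly 0.4 to 1.2 kcal/mol, whereas the first case leaves it unchanged.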
in effect, this is because the presence of glycans near the binding sites should not change the ΔG in accordance with the gibbs relationship. we hypothesize that our observed decrease in association rate is due to the glycans at glycosylation site n (site # ) as only those glycans are capable of sterically inhibiting the binding sites (figure ) , and we assume the glycans at sites n (site # ) and n (site # ) do not impair binding. taking the inhibition results discussed here together with a different binding study using larger ligands for influenza na, there appears to be a size dependence of this inhibitory potential: smaller ligands are not as affected as larger ligands ( ) . the key points here are that small molecules are not seriously impeded from binding by the glycans; future drug discovery efforts can be focused on the development of strong binders with correspondingly long lifetimes of binding. modeling studies focused on small inhibitors are likely to be helpful, even when glycans are not included. the results seen in figure highlight the importance of using biologically-relevant glycan conformations relaxed on the protein structure as opposed to simply generating a glycan conformation and attaching it to the protein. though this study did use static structures as per the bd setup, we would expect similar trends if this study were repeated using a dynamic md environment since our bd trajectories already used the most highly-accessed glycan conformations gleaned from extensive md sampling. moreover, a study using mixed bd-md simulations analyzing the association of oseltamivir and zanamivir to na actually showed a less accurate k_on rate than our coarser study using only bd ( ) . 
we can rationalize that the slower binding kinetics seen in our systems with biologically-relevant glycan conformations ( figure ) are due to the ligands having to maneuver around the glycans, even after running into them, and then continuing with the trajectory until reaching the binding site. this type of maneuverability can be seen in figure . we generated bd trajectories that could end with the ligand binding to the primary site ( figure a) , the secondary site (figure b) , or either site ( figure c ) on any monomer. using this setup, we were able to differentiate binding between the primary and secondary sites, and in fact found an additive binding mode when examining both sites concurrently. by simply adding up the association rates observed for the primary site ( figure a ) to the analogous simulation run to the secondary site (figure b) , the association rate to both sites ( figure c ) can be roughly obtained. we do not see any evidence of a further increase in association rate using both sites, showing that the presence of a proximal binding site does not influence association rate, either for the primary site or the secondary site. our primary site binding results show two conclusions supported by literature. in figure a we see that oseltamivir associates faster than zanamivir, as has been seen in experimental kinetics studies ( ) . moreover, we see faster binding of oseltamivir than sialic acid. this is qualitatively in agreement with an nmr study showing that oseltamivir outcompetes α ( , )-sialyllactose in binding to the avian na active site ( ) . it is not immediately clear which atoms on the ligands drive their binding differences. ligand binding to the secondary site has not been extensively studied, but it does not appear to have catalytic activity ( , ) . focusing on the secondary site, our results show three important findings. 
we first see that binding to the secondary site is slower than to the primary site, if binding is seen at all (figure ) . we do not see secondary site binding for oseltamivir and very little for zanamivir, though this may be as they are at the lower detection limit of our method. furthermore, we see that sialic acid binds faster to the secondary site than oseltamivir, which is in agreement with one study showing that α ( , )-sialyllactose outcompetes oseltamivir for binding to the avian na secondary site ( ) . a more recent study goes further and does not show any binding of oseltamivir to the avian na secondary site ( ) . however, we caution that a small amount of drug binding, likely only with zanamivir, may occur to the secondary site, as seen with zanamivir bound in the secondary site in the unpublished crystal structure pdb cml, and also seen in figure b . secondly, in the small amount of secondary site binding seen (figure b) , glycans are actually capable of enhancing or inhibiting binding, foreshadowing the complex role glycans play in ligand binding. finally, there appears to be a small conformational dependence on association rate, but this is only seen towards the secondary site ( figure b) . we used two different conformations of sialic acid for these binding studies. the boat conformation was crystallized in the active site and the chair conformation was crystallized in the secondary site. in our results we see the sialic acid chair conformation actually shows fractionally higher binding to the primary site than the boat conformation ( figure a) . conversely, only the boat conformation shows binding to the secondary site; the chair conformation does not register binding at all (figure b) . however, we caution that these results may be because sialic acid was crystallized in a different strain of avian na than we used in our studies. 
taken together, these results show that the exact ligand conformation upon approach to the binding site may not match the crystallized binding pose, but the results we present here do not permit us to explore this note or further explain a conformational dependence on binding. comparing the association rates in figure one may naturally query the competition in association rates between the primary and secondary sites. we see faster association to the primary site than the secondary site, which is not in agreement with two previous bd simulation studies ( , ) . however, the methodology of our study differs from these two studies, and from this we can unify the difference. those bd studies showed that ligands reach a distance of . Å away from the secondary site faster than to the primary site. we then show that ligands reach a distance of . Å away from the primary site faster than the secondary site, though we would like to note that our paper and the sung et al. paper assigned charges for the bd trajectories according to the amber force field and the amaro et al. paper assigned charges according to the charmm force field ( , ) . taken together, the secondary site appears to contain stronger long-range electrostatics to draw in ligands, but when the ligands approach the binding sites and sterics come into play, it appears to be more favorable for ligands to move closer to the primary site than the secondary site, assuming the rigid body approximations applied herein. considering the fact that the realistic substrates na encounters will exhibit multivalent binding, one previous study showed that the secondary site improved avian na enzymatic activity in removing sialic acid both from soluble macromolecular substrates and from cells ( ) . another study confirmed that the binding in the secondary site improved catalytic activity against multivalent substrates ( ) . 
other previous studies have suggested that the secondary site enhances the overall na catalytic activity by binding substrates and bringing them close to the catalytic primary site ( , ( ) ( ) ( ) ( ) . taking the studies above together with our results, we postulate that multivalent cleavage will occur in a stepwise manner (figure ) . in the first association event, the multivalent substrate, such as a sialylated cell surface receptor, will bind to the primary site, and then to the secondary site. after sialidase cleavage occurs in the primary site, the cleaved glycan branch will dissociate. then the sialylated glycan branch bound in the secondary site will be transferred to the primary site, as suggested previously ( , ( ) ( ) ( ) . after this passage, cleavage will again occur, and the full glycan will be released, finishing the enzymatic cycle. this mechanism is in disagreement with a previously proposed mechanism, which postulates that both binding sites will not be bound simultaneously ( ) . however, we feel there is a greater body of literature suggesting that binding both sites simultaneously increases catalytic activity. we note that our proposed binding mechanism may be muddied in the case of multivalent ligands with viral glycans situated near the binding sites; in this case, the glycans may sterically inhibit multivalent binding, slowing down enzymatic activity and attenuating the replication cycle. in the case of monovalent binders, such as the inhibitors oseltamivir and zanamivir, we show in figure that association will happen to the primary site faster than to the secondary site. this appears to be biologically viable considering that previous studies have shown that the secondary site activity has no effect on enzymatic activity for monovalent substrates ( , ( ) ( ) ( ) . 
as the primary site is the main site of enzymatic activity, it is reasonable to assume that ligands would preferentially bind to the primary site over the secondary site; reducing transfers of ligands between the binding sites would ostensibly increase catalytic activity and efficiency. taken together, abolishing the secondary site in avian na will not affect monovalent substrates such as influenza drugs, as these associate faster to the primary site anyway, which our results confirm. put differently, influenza drugs will preferentially block primary site binding over secondary site binding. a free monovalent binder will associate to the primary site over the secondary site (a). this monovalent binder will release from the primary site before a second monovalent binder will associate to the secondary site. glycans, with their sialic acid tips, are an example of a multivalent binder (b). similar to the monovalent binders, the first multivalent binding event will occur to the primary site. next, the second sialic acid tip binds to the secondary site. with both sites bound, the sialic acid in the primary site is cleaved and released. the sialic acid bound in the secondary site is then transferred to the primary site. finally, the second sialic acid is cleaved and released, and the enzymatic cycle is complete. in this work, we created na systems with varying glycan conformations, and also without the presence of glycans. these glycans are capable of covering much of the surface area of na. their conformational flexibility is dependent on their glycan type, not necessarily their spatial position. the glycosylated systems showed moderate inhibition of ligand association to the primary binding site. finally, we propose a new binding mechanism for multivalent binders to na, such as cell surface receptors. these results have implications for future drug development, the overall understanding of glycans, and the na enzymatic mechanism. 
much sustained effort has gone into developing na inhibitors, and will continue to do so in the future. measuring the binding of a potential drug is an important step in the drug discovery process. however, most drug discovery efforts have not taken into account viral glycans. neglecting this effect can lead to a surprising drop in drug binding ( ) . our work shows that glycans can have an inhibitory effect on influenza na primary site binding. there have already been a number of studies using multivalent binders as na antivirals ( ) ( ) ( ) ( ) ( ) ( ) . with the results shown here, we recommend future work on multivalent na drugs, to focus on developing strong binders with a long lifetime, regardless of the presence or absence of glycans. with the detection limitations of our study, we cannot conclude how glycans affect secondary site binding, although we believe binding to the secondary site will be slower than binding to the primary site ( figure ) . however, it follows from these results that glycans could evoke a secondary site binding inhibition similar to the primary site. in summary, this work examines glycan inhibition on drug binding, compares the drug binding interplay between two binding sites, and proposes a new mechanism of ligand binding to na. 
antibody determinants of influenza immunity exploitation of glycosylation in enveloped virus pathobiology effect of the addition of oligosaccharides on the biological activities and antigenicity of influenza a/h n virus hemagglutinin effects of glycosylation on the properties and functions of influenza virus hemagglutinin fitness costs limit influenza a virus hemagglutinin glycosylation as an immune evasion strategy glycosylation as a target for recognition of influenza viruses by the innate immune system playing hide and seek how glycosylation of the influenza virus hemagglutinin can modulate the immune response to infection contemporary h n influenza viruses have a glycosylation site that alters binding of antibodies elicited by egg-adapted vaccine strains cellular glycosylation affects herceptin binding and sensitivity of breast cancer cells to doxorubicin and growth factors multisystem failure: the story of anti-influenza drugs drug resistance in influenza a virus: the epidemiology and management current advances in anti-influenza therapy mammalian protein glycosylation -structure versus function structural requirements of n-glycosylation of proteins structure and immune recognition of the hiv glycan shield site-specific glycan analysis of the sars-cov- spike vulnerabilities in coronavirus glycan shields despite extensive glycosylation arenavirus glycan shield promotes neutralizing antibody evasion and protracted infection hepatitis c virus envelope glycoprotein e glycans modulate entry, cd binding, and neutralization structure of the epstein-barr virus major envelope glycoprotein glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy the hiv glycan shield as a target for broadly neutralizing antibodies bitter-sweet symphony: glycan-lectin interactions in virus biology effect of addition of new oligosaccharide chains to the globular head of influenza a/h n virus haemagglutinin on the intracellular transport and 
biological activities of the molecule antigenic structure of the haemagglutinin of human influenza a/h n virus genetic requirement for hemagglutinin glycosylation and its implications for influenza a h n virus evolution n-glycan profiles in h n avian influenza viruses from chicken eggs and human embryonic lung fibroblast cells integrated omics and computational glycobiology reveal structural basis for influenza a virus glycan microheterogeneity and host interactions targeted n-linked glycosylation analysis of h n influenza hemagglutinin by selective sample preparation and liquid chromatography/tandem mass spectrometry glycan analysis in cell culture-based influenza vaccine production: influence of host cell line and virus strain on the glycosylation pattern of viral hemagglutinin comparative glycomics analysis of influenza hemagglutinin (h n ) produced in vaccine relevant cell platforms glycosylation characterization of an influenza h n hemagglutinin series with engineered glycosylation patterns: implications for structure-function relationships characterization of site-specific glycosylation in influenza a virus hemagglutinin produced by spodoptera frugiperda insect cell line comparative characterization of the glycosylation profiles of an influenza hemagglutinin produced in plant and insect hosts changing selective pressure during antigenic changes in human influenza h human influenza a virus hemagglutinin glycan evolution follows a temporal pattern to a glycan limit glycosylation site alteration in the evolution of influenza a (h n ) viruses antigenic drift of the influenza a(h n )pdm virus neuraminidase results in reduced effectiveness of a/california/ / (h n pdm )-specific antibodies glycosylation at asn of h n haemagglutinin affects binding to glycan receptors recent avian h n viruses exhibit increased propensity for acquiring human receptor specificity glycans on influenza hemagglutinin affect receptor binding and immune response structural basis for 
key: cord- -ev acl b authors: lam, ha minh; ratmann, oliver; boni, maciej f title: improved algorithmic complexity for the seq recombination detection algorithm date: - - journal: mol biol evol doi: . /molbev/msx sha: doc_id: cord_uid: ev acl b identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. in , a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision p values for putative recombinants.
this exact computation meant that multiple-comparisons corrected p values also had high precision, which is crucial when performing millions or billions of tests in large databases. here, we introduce an improvement to the algorithmic complexity of this computation from O(mn^3) to O(mn^2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. this new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed. determining whether genomic regions are undergoing homologous recombination is important in all parts of biology and genetics. indeed, recombination has profound consequences for a population's evolutionary trajectory, and it changes our understanding of the evolutionary history of a population as described through phylogenetics (schierup and hein ). identifying recombination is especially important in large genomic analyses, as the larger the region being analyzed the higher the chance that recombination will be detected even in a small sample. over the past three decades, methods of identifying recombination from sequence data have focused on detection of clustered polymorphism, excessive homoplasy, low linkage disequilibrium, mosaicism, and incongruent phylogenies. some of these statistical signals have advantages over others in terms of false positive rate, statistical power, speed, and the size of the data set that can be analyzed. an analysis of sensitivity and specificity can be found in posada and crandall ( ) and a guide to choosing an appropriate method for a given data set can be found in martin et al. ( ). in modern sequence analysis, a major challenge in recombination detection is the size of the data sets themselves.
beyond the computational burden, critical but often underappreciated statistical issues arise through the extremely large number of compared nucleotide sequence patterns. with this many comparisons being performed, truly nonrecombinant sequences can exhibit nucleotide patterns that appear recombinant by chance. for this reason, statistical corrections for multiple comparisons are essential to guard against calling spurious recombinants. in an algorithm called seq, boni et al. ( ) presented an exact mosaicism statistic for calling recombinants. critically, the exactness of the computation (e.g., calculating p values to a precision of − or − ) allows these mosaic signals to remain statistically significant, even when billions of comparisons are being performed and adjusted for multiple comparisons. this means that the exact mosaicism statistic implemented in the seq software maintains good power properties even on large data sets when statistical correction factors for multiple comparisons are on the order of or more. recombination detection methods that detect mosaic signals always take a triplet approach or a quartet approach, positing one sequence as the candidate recombinant, two sequences as the parents, and possibly a fourth sequence as an outgroup. with the parental sequences labeled p and q and the candidate recombinant labeled c, these methods normally use "recombination informative" sites, or simply informative sites, to determine if c is a mosaic of p and q. in seq, nucleotide positions on c are labeled informative if the nucleotide in c is identical to one parental sequence but different from the other. if the sequence of m informative sites identical to p and n informative sites identical to q appears nonrandom or clustered, this is an indication that recombination may have occurred.

letter. © the author . published by oxford university press on behalf of the society for molecular biology and evolution. this is an open access article distributed under the terms of the creative commons attribution non-commercial license (http://creativecommons.org/licenses/by-nc/ . /), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. for commercial re-use, please contact journals.permissions@oup.com. open access.

when read from left to right along the sequence, the informative sites can be used to draw a random walk on a set of axes with m up-steps and n down-steps; this is called a hypergeometric random walk (hgrw). a strong descent or ascent in the middle of a hgrw indicates that one type of informative site exhibits clustering, and the properties of the random walk can be used to compute exact probabilities of this occurring. see figure for an example. in this letter, we present a new and faster method of computing these probabilities. the central feature of seq was a reduction of an O( m+n ) space-complexity problem into an O(mn^3) problem, for computing the probability x_{m,n,k} that a hgrw with m up-steps and n down-steps achieves a maximum descent of size k exactly. the descent does not need to be k consecutive down-steps. the computations were done via auxiliary variables y_{m,n,k,j}: the probability that a hgrw with m up-steps and n down-steps achieves a maximum descent of size k exactly and the minimum value achieved by the random walk is exactly j units below the origin. the y-variables can be computed recursively (boni et al. ) by building a table of size mn^3.
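The definitions above (informative sites, the hypergeometric random walk, and its maximum descent) can be made concrete with a short sketch. This is not the recursion of boni et al. nor the faster recursion introduced in this letter. It is an independent and much simpler dynamic program that counts walks directly, and it is only practical for small m and n. The function names `informative_sites`, `max_descent` and `p_value` are hypothetical and chosen for illustration; the mapping of "identical to p" to an up-step and "identical to q" to a down-step follows the figure caption.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def informative_sites(p, q, c):
    """Return the walk for candidate c: 'U' where c matches p only, 'D' where c matches q only."""
    steps = []
    for a, b, x in zip(p, q, c):
        if x == a and x != b:
            steps.append("U")
        elif x == b and x != a:
            steps.append("D")
        # sites where c matches both parents or neither are not informative
    return "".join(steps)


def max_descent(walk, down="D"):
    """Largest drop of the walk below any earlier peak; the down-steps need not be consecutive."""
    height = peak = worst = 0
    for step in walk:
        height += -1 if step == down else 1
        peak = max(peak, height)
        worst = max(worst, peak - height)
    return worst


def p_value(m, n, k):
    """Exact probability that a walk with m up-steps and n down-steps has maximum descent >= k."""
    if k <= 0:
        return Fraction(1)
    if k > n:
        return Fraction(0)

    @lru_cache(maxsize=None)
    def count(ups, downs, drop):
        # Number of ways to finish the walk so that the drop below the running peak reaches k.
        if drop >= k:
            return comb(ups + downs, downs)  # already reached: every completion counts
        if drop + downs < k:
            return 0  # too few down-steps remain
        total = count(ups, downs - 1, drop + 1)  # next step is a down-step
        if ups:
            total += count(ups - 1, downs, max(drop - 1, 0))  # next step is an up-step
        return total

    return Fraction(count(m, n, 0), comb(m + n, n))


# Toy triplet (made-up sequences): the candidate matches q only in the middle.
walk = informative_sites("AAAAAAAA", "CCCCCCCC", "AACCCCAA")
print(walk)                # UUDDDDUU
print(max_descent(walk))   # 4
print(p_value(4, 4, 4))    # 1/14
```

In the toy case all four down-steps are consecutive, so the result agrees with the closed form (m+1)/C(m+n, n) = 5/70. The state of the dynamic program is (up-steps left, down-steps left, current drop below the peak), which keeps the table small but requires a separate pass for each k; the recursions described in the text instead build tables indexed by m, n and k.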
the x-variables are then computed as follows: x_{m,n,k} = \sum_{j=0}^{k} y_{m,n,k,j}. by separating out the first and last term in the sum above, and using the y-variable recursions, a nearly direct recursion can be written for the x-variables. the p value for observing a maximum descent of size at least k is defined by p_{m,n,k} = \sum_{i=k}^{n} x_{m,n,i}, and the recursions for the p-variables reduce to a form in which the y-variables that appear have their last two indices equal. these y-variables can therefore be computed recursively by building one table of size mn^2. the p-variables can be recursively computed by building a second table of size mn^2. this means that the entire computational procedure of p values can be done with space complexity O(mn^2) instead of the original O(mn^3) presented in boni et al. ( ). all computations were verified against the original approach. this new approach allows larger probability tables to be built more quickly. using the recursions, a table of size × × was built in h and used . -gb ram ( . ghz processor; -gb ram). using the recursions above, a table of , × , × , was built in h and min in a . -gb memory footprint. two other noteworthy improvements were made to the algorithm: (1) faster breakpoint calculations by using polymorphic sites only in breakpoint searches, and (2) a repeated subsampling feature that allows for comparison of data sets of different sizes; with this feature one can randomly subsample m sequences from multiple databases or sequence collections, and repeat the process to see how often these subsets exhibit recombination. the new source code and manual can be downloaded from http://mol.ax/ seq. when a p value falls outside the bounds of the table being used, the software substitutes in hogan-siegmund approximations (hogan and siegmund ) for the queried p value.

fig. . relationship between ordering of informative sites along a genome and a hypergeometric random walk. below each set of axes, the red bars and blue bars show positions on a genome (informative sites) where a putative recombinant sequence is identical to parent p but different from parent q (blue bars), or identical to parent q but different from parent p (red bars). each blue site can be mapped to an up-step in a random walk and each red site can be mapped to a down-step in a random walk, and there is a one-to-one correspondence between the space of informative-site arrangements and the space of hypergeometric random walks. (a) a random arrangement of informative sites, which does not visually suggest that the sequence is a mosaic of putative parents p and q. the arrangement of sites maps to a random walk which stays fairly close to the horizontal axis. this walk's maximum descent is eight steps, and ~ % of hgrws with up-steps and down-steps have a maximum descent of eight steps or greater. (b) a nonrandom arrangement of informative sites that clearly suggests that the candidate sequence is a mosaic of the two parental sequences p and q. the probability of all the red sites appearing consecutively is ! × !/ ! (in general, (m+1)! n!/(m+n)! for m blue and n red sites), which is . × − . (c) an arrangement of red sites and blue sites that suggests the red sites may be clustered in the middle. when mapping the site arrangement to a hypergeometric random walk, the random walk has a maximum descent of steps. the p value for a maximum descent of steps cannot be written down in closed form but can be calculated from recursion ( ). the p value for this maximum descent and for this arrangement of informative sites is . × − .

the seq maximum descent statistic describes clustering patterns in sequences of binary outcomes, and is therefore not confined to recombination analysis. the statistic can be viewed as a generalization of the mann-whitney u statistic, in the sense that outcomes of one type (of a binary outcome variable) do not necessarily have to cluster or rank at the beginning or end of a sequence of data points.
the maximum descent of a hgrw can be used to describe the clustering of one particular binary outcome in the middle of a sequence of binary outcomes; in other words, it is a 1d nonparametric clustering statistic. in recombination analysis, this is the clustering of one kind of informative site among all the informative sites (han et al. ). to make use of this statistic easier for those working outside the field of recombination, we developed a web calculator (fig. ) that computes exact p values for clustering in a sequence of binary outcomes, available at http://mol.ax/delta. for example, the sequence "aaaaabb bbabbbabbbaaaa" can be typed in and the calculator reports that the clustering of bs in the middle of the sequence is significant at p = . . we list two practical example uses of our nonparametric clustering statistic. first, seasonality can be assessed nonparametrically. if a particular population behavior or climatic characteristic (e.g., rain or no rain) can be noted to occur or not occur every day, then an ordered sequence of the days in the year will show if the occurrence of one of the behaviors is clustered and thus if this feature was seasonal in that one year. as a second example, when a process is expected to behave at an intermediate range or when an observation is expected to be made at intermediate values only, this pattern can be tested for nonparametrically. dengue virus does not cause severe disease equally in all age groups. one's first dengue infection, occurring during childhood, is typically nonsevere; secondary infections, seen in older children and teenagers, have a higher chance of severity, whereas tertiary and subsequent infections, those that would occur in older age groups, are thought to be rare and/or subclinical (gubler ; wikramaratna et al. ).
thus, disease severity in a surveillance system should be seen in the intermediate age ranges, and this can be tested for nonparametrically by noting if each age band is overrepresented or underrepresented in the pool of patients experiencing dengue-like severe disease in a hospital. in fact, since all that is required here is a symptoms description, the identification of a vulnerable age range can be done for any set of symptoms.

fig. . screenshot of new online tool that can be used to calculate p values testing the hypothesis of whether one binary outcome clusters in the middle of a (1d) sequence of binary outcomes. one input method is simply typing two characters in a text box (above, "u" for up and "d" for down) and letting the calculator return a p value showing whether one type of character is clustered in the middle. to test whether the other type of character is clustered, the "swap" button can be used. the hypergeometric walk is shown graphically. the exact p value, computed with the methods in this article, is shown. the two hogan-siegmund approximations for this p value are also shown.

to illustrate improved runtimes and memory usage of the new seq algorithm, we searched for recombinants among large sequence data sets of dengue virus serotype , ebola virus, the coronavirus responsible for middle-east respiratory syndrome (mers) and zika virus; see table . full-length zika virus sequences were downloaded from the ncbi viral variation resource (brister et al. ) and aligned with muscle v . (edgar ). full-length sequences of ebola virus, dengue virus serotype , and the coronavirus responsible for middle-east respiratory syndrome (mers) were downloaded from ncbi and aligned with the online ncbi alignment tools. ebola virus sequences were restricted to human viruses sampled in africa after december , . dengue virus serotype was chosen to include a particularly large and polymorphic alignment. as negative controls, we considered segments pb and ns from avian influenza a virus, subtype h n , originally analyzed in boni et al. ( ); only sequences from the influenza genome sequencing project were included (ghedin et al. ) and identical sequences were removed (when identical sequences were not removed, results using the new version of seq were identical to the results in table of boni et al. ). the new version of the software, run with a p value table of size , × , × , , had faster computation times than the previous version and was able to comfortably accommodate alignments with thousands of polymorphic sites. table shows the results of all runs. note that because seq evaluates all triplets in a data set, the run time of the algorithm scales as the cube of the number of sequences and linearly with the alignment length. as informative sites can sometimes be clustered in short regions of the genome, seq will report these short segments as recombinant. for this reason, an additional column is included in table showing the number of sequences that were identified as recombinant with both inherited regions being longer than nt; if one of the recombinant regions is very short, it is difficult to confirm the recombination results with a phylogenetic analysis of the two identified parental segments. starting with the analysis on the two negative control data sets, no recombinant segments longer than nt were detected in either avian influenza alignment. both of these runs took < s. the genomic alignments of mers and zika virus contained , and , polymorphic sites, respectively, and > . % of triplets could be tested for mosaicism with exact p values. these runs took < min. as expected from a recent analysis by dudas and rambaut ( ), the mers sequence data set was highly recombinant, with out of sequences being identified as such.
for zika, out of virus sequences were identified as recombinant, consistent with earlier analyses supporting the presence of recombination in the evolutionary history of zika (faye et al. ; zhu et al. ); details of the recombinants, parents, and breakpoints are included in the supplementary material online. the ebola virus and dengue virus alignments each contained around , sequences. the ebola virus data showed no evidence of recombination. the dengue alignment was the most diverse of all the tested data sets with , polymorphic sites; . % of the triplets in this data set could be evaluated with exact p values. a total of out of , dengue sequences were identified as recombinant (see supplementary material online). several previous analyses of dengue virus have shown evidence for intraserotype recombination (worobey et al. ; uzcategui et al. ; aaskov et al. ; waman et al. , ). the results presented here, as well as those of waman et al. ( ), suggest that recombination in dengue is infrequent. in general, when recombinants are identified by a mosaicism statistic like the one used by seq, a phylogenetic analysis should be performed to ensure that the recombination signal is preserved when the entire evolutionary history of the sample is taken into account. the size of modern data sets presents two challenges here. first, as the number of available sequences increases, the choice of phylogenetic inference tools drifts to more approximate methods, as thorough explorations of tree space become computationally expensive for large numbers of sequences. this reduces our confidence in phylogenetic incongruence signals that we observe in these data. second, genome-level analyses in highly recombining organisms are likely to result in a subdivision of the genome into many nonrecombinant blocks.
inferring phylogenies for all blocks individually will be computationally expensive, as will the subsequent analysis of identifying specific phylogenetic incongruences among the trees. the next generation of recombination detection methods should focus on these computational challenges. supplementary data are available at molecular biology and evolution online. multiple recombinant dengue type viruses in an isolate from a dengue patient guidelines for identifying homologous recombination events in influenza a virus an exact nonparametric method for inferring mosaic structure in sequence triplets virus variation resource -recent updates and future directions mers-cov recombination: implications about the reservoir and potential for adaptation muscle: multiple sequence alignment with high accuracy and high throughput molecular evolution of zika virus during its emergence in the th century large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution dengue and dengue hemorrhagic fever no observed effect of homologous recombination on influenza c virus evolution large deviations for the maxima of some random fields phylogenetic evidence for recombination in dengue virus analysing recombination in nucleotide sequences evaluation of methods for detecting recombination from dna sequences: computer simulations the effect of recombination on the accuracy of phylogeny estimation recombination in evolutionary genomics consequences of recombination on traditional phylogenetic analysis molecular epidemiology of dengue type virus in venezuela: evidence for in situ virus evolution and recombination genetic diversity and evolution of dengue virus serotype : a comparative genomics study population genomics of dengue virus serotype : insights into genetic structure and evolution the effects of tertiary and quaternary infections on the epidemiology of dengue widespread intra-serotype recombination in natural populations of dengue virus comparative 
genomic analysis of pre-epidemic and epidemic zika virus strains for virological factors potentially associated with the rapidly expanding epidemic key: cord- -rowoylge authors: zhang, wenjuan; zhang, yuan; zhong, yang title: using maximum likelihood method to detect adaptive evolution of hcv envelope protein-coding genes date: journal: chin sci bull doi: . /s - - - sha: doc_id: cord_uid: rowoylge the nonsynonymous-synonymous substitution rate ratio (dn/ds) is an important measure for evaluating selective pressure based on protein-coding sequences. the maximum likelihood (ml) method with codon-substitution models is a powerful statistical tool for detecting amino acid sites under positive selection and adaptive evolution. we analyzed the hepatitis c virus (hcv) envelope protein-coding sequences from general geno/subtypes worldwide, and found amino acid sites under positive selection. since these sites are located in different immune epitopes, it is reasonable to anticipate that our study would have potential value in biomedicine. it also suggests that the ml method is an effective way to detect adaptive evolution in virus proteins with relatively high genetic diversity. the basic process of adaptive evolution by natural selection is the replacement of one allele by another with a higher fitness in a population. detecting adaptive evolution would be helpful for a better understanding of evolutionary mechanisms and the corresponding variation in structure and function [ ]. the nonsynonymous-synonymous substitution rate ratio (dn/ds) is an important indicator of selective pressure on a protein-coding gene, with dn/ds = 1 meaning neutral mutation, dn/ds < 1 purifying selection, and dn/ds > 1 diversifying positive selection (i.e. adaptive evolution).
in comparison with the large number of neutral mutations and sites under purifying selection, positive selection is rare and hard to detect because it often occurs at only a few sites or during a short period [ ]. in particular, for some protein-coding sequences with high genetic diversity that might result from relatively high mutation rates or a long evolutionary history, it is much more difficult to infer whether positive selection exists. for example, hepatitis c virus (hcv) is an rna virus with a high mutation rate, and its genotypes play a role in determining the clinical variation, the main features of chronic infection and the duration of antiviral therapy. although the rapid variation of hcv has attracted the attention of virologists and evolutionary biologists, it is still unclear how hcv evades the host immune response and what the mechanism of chronic infection is [ ]. envelope glycoproteins e and e of hcv are involved in virus attachment to the host cell as well as in virus endocytosis and fusion with the host membrane. e protein contains two highly variable regions called hypervariable regions and (hvr and hvr ) and two cd -binding sites. as one of the receptors of e protein [ ], cd is a bridge between virus and host cell. hvr is implicated in the scarb -mediated cell entry. it was reported that despite strong amino acid sequence variability related to strong pressures towards change, the physicochemical properties and conformation of hvr were highly conserved. the conservation of positively charged residues located at specific sequence positions of hvr indicates that hvr is involved in interactions with negatively charged molecules on the host cell surface. this possible interaction probably plays a role in host cell recognition and attachment [ ]. hvr and cd -binding sites may be involved in sensitivity and/or resistance to ifn-alpha therapy.
it is therefore considered that hcv evades the host immune response through mutations at some amino acid sites of the envelope proteins, which result in recognition errors when hcv contacts the host cell. these mutations become fixed under pressure from the host immune system and constitute the adaptive evolution of hcv genomes. previous studies focused on exploration of the adaptive evolution within an individual hcv subtype. for example, positively selected amino acid sites in the entire coding sequences of hcv subtype b were identified [ ]. the increasing availability of data stored in hcv databases allows us to analyze hcv genome evolution across a large scale of genetic diversity using more quantitative frameworks based on statistical inference [ , ]. technically, a number of advanced methods have been proposed to reconstruct ancestral sequences or estimate the parameters under different substitution models when calculating the nonsynonymous-synonymous rate ratio. for example, when all the sites evolve independently under one substitution model, a parsimony approach can be used to infer ancestral sequences and count substitutions of different types. however, estimation of parameters by the parsimony approach may be biased because it does not account for multiple substitutions at one site. the maximum likelihood (ml) method is another, more rigorous way to estimate model parameters. it is employed not only to select the phylogenetic tree that best fits the real data [ ], but also to detect selective pressure on sites under different codon substitution models, especially for amino acid sites undergoing positive selection [ ]. the purpose of this study is therefore to use the ml method [ ] to infer adaptive evolution and positively selected amino acid sites in the entire envelope protein coding sequences of all hcv genotypes.
experts in the fields of hcv genetic variability and the development of hcv sequence databases (such as the hepatitis virus database (japan), euhcvdb (france) [ ], and los alamos (united states) [ ]) met to re-examine the status of hcv genotype nomenclature. hcv variants can be classified into genotypes representing the genetic groups defined by phylogenetic analysis [ ]. the confirmed genotypes with complete hcv genome sequences ( subtypes) were defined according to the nomenclature stipulated at the heidelberg conference [ , ]. the proposal provides the framework by which the hcv databases store and provide access to data on hcv. considering the computational workload of adaptive evolution detection and the statistical significance of the data analysis, this study used the sequences of these complete annotated hcv genomes (table ) [ , ]. the average numbers of amino acid sites for e and e were and , respectively. amino acid sequences of e and e proteins were aligned using clustal x . [ ] [ ] [ ]. the nucleic acid sequences were aligned according to the protein alignments with the tranalign program in the embosswin software package [ ]. these nucleic acid sequences were retrieved from the genbank database (release . ).

table . the entire coding sequences of hcv used in this study a). columns: genotype; genbank accession no. a) classification of genotypes is available at http://euhcvdb.ibcp.fr/euhcvdb.

in order to infer a reliable phylogeny of hcv geno/subtypes, we used the alignment of complete hcv polyprotein coding sequences for phylogenetic tree reconstruction. the neighbor-joining method with the kimura -parameter model implemented in mega . was used for phylogenetic analysis [ ] [ ] [ ]. clade robustness was measured by the bootstrap method with replicates. the nonsynonymous-synonymous substitution rate ratio (ω = dn/ds) was calculated with site-specific codon substitution models based on the phylogenetic tree and the sequence alignment.
an ω significantly greater than 1 means that nonsynonymous mutations are fixed at a higher rate than synonymous mutations and the evolution of this site is driven by positive selection. the model favored by the likelihood ratio is considered the best model to fit the data. the likelihood-ratio test (lrt) compares twice the log-likelihood difference between two nested models (2Δl) with a χ² distribution to assess statistical significance. the degrees of freedom (df) used in the lrt are equal to the difference in the number of parameters between the two models [ ]. we used three pairs of models to form three lrts: m0 (one-ratio) and m3 (discrete), m1a (nearly neutral) and m2a (positive selection), and m7 (β) and m8 (β & ω). the simplest model, m0, assumes one ω for all sites. model m1a (nearly neutral) allows two classes of sites with 0 < ω0 < 1 and ω1 = 1 in proportions p0 of conserved sites and p1 = 1 − p0 of neutral sites, respectively. based on m1a, m2a (positive selection) adds an additional class of sites with frequency p2 = 1 − p0 − p1 and an ω2 estimated from the data. m3 (discrete) uses an unconstrained discrete distribution to model heterogeneous ω ratios among sites. m7 (β) assumes a β(p,q)-distribution for 0 ≤ ω ≤ 1. m8 (β & ω) adds to m7 an extra category, with proportion p1 of sites with its own ω, while the rest of the sites (at frequency p0 = 1 − p1) have ω from the β(p,q)-distribution between 0 and 1. this model can be compared with m7 to test for the presence of positively selected sites using a likelihood-ratio test (lrt) [ ]. in this study, site-specific models were used with codeml in the paml . b package [ ]. we tested for positive selection over sites of coding sequences by comparing twice the log-likelihood differences between m1a vs. m2a and m7 vs. m8 with a χ² distribution in the lrt. the phylogenetic tree shown in fig.
was consistent with the phylogenetic analysis based on complete genome sequences published previously [ ] and the one available at http://hcv.lanl.gov/content/hcv-db/distances/hcv_variability.html. the results of identifying positively selected amino acid sites in the coding region of e are summarized in table . the lrts of adaptive evolution showed that the model of one ω ratio for all sites (m ) was rejected when compared with model m ( δl = . , p< . , df = ). the lrt statistic for comparing m a (nearly neutral) and m a (positive selection) showed that m a did not fit the data better than m a ( δl = , df = ). indeed, models m a and m a had the same likelihood value and the parameter estimates under these models were similar. in m a, p and p could be combined into one because ω = . in this way, m a was equivalent to m a. therefore, we could not infer positive selection from this comparison. model m fitted the data significantly better than m ( δl = . , p< . , df = ). model m provided three proportions of sites, p , p and p , with ω ratios of . , . , and . respectively. this suggests that a large proportion of sites (~ %) are under strong purifying selection. another piece of evidence that the e gene is negatively selected was that ω in m was no greater than . models that allow for positively selected sites are m a and m in the three pairs of nested models. however, neither of these two models suggested the existence of sites of the e protein under positive darwinian selection. the results of identifying positively selected amino acid sites in the coding region of e are also summarized in table with ω values between and . however, unlike in the e protein, positively selected sites with ω ratios greater than were detected in m a (ω = . ) and m (ω = . ). these sites were e, n, a and t, all of which are located in hvr of the e protein.
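the likelihood-ratio comparisons reported above can be sketched in a few lines. the log-likelihood values in the example are hypothetical and are not results of this study, and the degrees of freedom (4 for a one-ratio model against a three-class discrete model, 2 for the other two pairs) are assumptions based on common codeml usage. the closed-form χ² tail used here holds only for even df, which covers these cases.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # hypothetical log-likelihoods for a null model and a nested alternative
    stat, p = lrt(lnl_null=-100.0, lnl_alt=-97.0, df=2)
    print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

a small p-value favours the richer model; when the two log-likelihoods are equal the statistic is zero and the null model is retained.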
to assess the potential impact of the adaptive mutations, sites of the e protein under positive selection were mapped onto immune epitopes of hcv based on the epitope maps from the hcv immunology database (http://hcv.lanl.gov/content/immuno/immuno-main.html) [ ] (table ). all of the amino acid sites under adaptive evolution were located in b-cell epitopes of rat. only one site ( t) was found in t-cell epitopes of human and transgenic mouse. two sites ( a and t) were also located in t-helper epitopes of human. this suggests that the humoral immune response probably plays a key role in immune clearance and exerts more selective pressure on hcv replication than the cell-mediated response. detecting adaptive evolution is a bioinformatics exploration based on knowledge of genetics and statistics. d n and d s , as well as their ratio ω, which measures the selective pressure at the amino acid level, provide powerful tools for better understanding the effect of natural selection on molecular evolution. an ω significantly greater than means that nonsynonymous mutations offer fitness advantages, and the lineage (in lineage-specific models) or the critical amino acid site in the protein (in site-specific models) is considered to be under positive selection driven by the environment. though the ω ratio is a sensitive measure of positive selection, both lineage-specific models and site-specific models may lack power in detecting positive selection if adaptive evolution occurs at a few time points and affects a few amino acids. more robust statistical tools are needed to test the hypothesized models [ , ] . the maximum likelihood method and the lrt can help to identify the codon substitution models that best fit the real data, and some models such as m a and m have been successfully used for detecting positive selection.
we took the hcv envelope glycoproteins as an example to explore adaptive evolution driven by immune pressure in coding sequences of hcv covering geno/subtypes. we found that a number of amino acid sites were under positive selection and that ml can be used to identify adaptive evolution of an rna virus across a large range of genetic diversity. brown et al. [ ] cloned hcv e e full-length nucleotide sequences generated from serum samples of chronically infected patients and identified amino acid sites undergoing patient-specific adaptive evolution. in this study, we detected amino acid sites of the e protein under positive selection. two of these sites were also identified in brown's work. note that a region including the n-terminal - amino acid sites in e is known to be the most variable and is called the hypervariable region (hvr ) [ , ] . this region is surface-exposed [ ] and has been proposed as a major target of the immune response, probably because its hypervariability is correlated with immune evasion [ ] [ ] [ ] . because all of the positively selected amino acid sites are located in hvr of the e protein and in some immune epitopes, adaptive evolution of hcv could be the consequence of environmental pressure directly driven by the host immune response. recent studies have revealed more about how hcv escapes the host immune response, but more comprehensive and careful research is needed to clarify the role of immune evasion in chronic hcv infection and to explain the mechanism of hcv evolution in terms of immunology and virology. in other words, the location of the positively selected sites indicates their immunogenicity, and they might be candidate vaccination targets against hcv. composite vaccines containing the different amino acid residues found at the positively selected sites located in immune epitopes could be effective in preventing the proliferation of escape mutants [ ] .
no amino acid sites exhibited positive selection within the e protein in this study. this is consistent with the report from brown et al. [ ] . a possible reason is that e is unlikely to be surface-exposed [ ] and is not a major target for the host antibody response. it has been reported that the e protein is a poor natural immunogen for the humoral response [ ] . in other words, the e protein is not under strong selective pressure of adaptive evolution driven by the immune response. suzuki and gojobori [ ] identified positively selected amino acid sites in the entire coding region of hcv subtype b by the parsimony method. four of these sites were located in e and three in e . this differs from the results obtained in this study. the possible reasons may be: ( ) the strategies to reconstruct ancestral sequences are different. adaptsite, the program employed in suzuki and gojobori's work, uses the maximum parsimony method to perform reconstruction while codeml uses a likelihood reconstruction. thus, the reconstructed ancestral states may be different. in general, the two implementations produce similar results on datasets with high similarity among sequences. however, ml provides a more reliable result when used to analyze a small dataset with relatively high sequence diversity [ , ] . they focused on hcv subtype b [ , ] , and deleted any sequences with gaps in a pairwise alignment with a reference sequence (hcv-js) to obtain a dataset of highly similar sequences. our study used sequences covering hcv geno/subtypes with a relatively large range of genetic diversity; thus the ml method was adopted; ( ) the methods to estimate branch length are different. adaptsite uses a neighbor-joining algorithm to estimate branch length [ ] while codeml uses the codon model m to do it; ( ) for codons that are neighbors of stop codons, adaptsite and codeml count sites differently, e.g. for tac and tat, adaptsite counts s = and n = , while codeml gives . and .
; ( ) missing data are handled differently. adaptsite requires a dataset without gaps [ ] while codeml as used in this work allows sequence data with some gaps; and ( ) mutation rates in rna viruses are several orders of magnitude higher than those in dna-based life-forms. by limiting genome size and content, an rna virus genome can avoid the accumulation of deleterious mutations. this inclines virus genomes toward concerted or parallel evolution and toward similar codon substitution patterns [ ] . therefore, it is not difficult to understand why most amino acid sites undergo purifying selection under genetic constraints even though the hcv envelope proteins have high mutation rates. our study covered a larger range of genetic diversity than previous work. it may miss particular positively selected amino acid sites of individual subtypes but yields more general sites under positive darwinian selection across all hcv genotypes. the parsimony method of suzuki and gojobori [ ] and the maximum likelihood method developed by nielsen and yang [ ] are two widely used methods for detecting natural selection in homologous protein-coding sequences. however, each has its own pros and cons. the former may fail to infer positively selected sites when the branches of the phylogenetic tree are long, because the maximum parsimony method does not account well for multiple substitutions. in contrast, multiple substitutions in the nielsen and yang method may be corrected for by assuming a codon substitution model. suzuki also attempted to employ ml to modify the previous method [ ] . our study showed an application of the ml method to detecting adaptive evolution in hcv envelope protein-coding sequences based on geno/subtypes. it provides an instance for similar analyses of highly diverse homologous genes. indeed, using the ml method to infer adaptive evolution has been an effective strategy for some emerging viruses such as sars-cov.
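the point above about codons that neighbor stop codons can be made concrete with a small counting sketch. it computes synonymous (s) and nonsynonymous (n) site counts for a single codon under two simple conventions: single-nucleotide changes to stop codons are either counted as nonsynonymous or left out of the denominator. both conventions are assumptions chosen for illustration; neither is claimed to reproduce adaptsite or codeml, whose counts also depend on model parameters.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# standard genetic code, built from the usual TCAG ordering
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def count_sites(codon: str, skip_stop_neighbours: bool):
    """Return (S, N) for one sense codon; S + N is always 3."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no site counts")
    s = 0.0
    for pos in range(3):
        syn = 0
        considered = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                if not skip_stop_neighbours:
                    considered += 1  # change to stop counted as nonsynonymous
                continue
            considered += 1
            syn += CODE[mutant] == aa
        if considered:
            s += syn / considered    # synonymous fraction at this position
    return s, 3.0 - s


if __name__ == "__main__":
    for skip in (False, True):
        print("TAC", "skip stops" if skip else "count stops", count_sites("TAC", skip))
```

for tac, two of the three third-position changes lead to stop codons, so the two conventions give visibly different site counts for the same codon.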
in this way we had successfully explored the adaptive evolution of sars-cov spike protein [ ] .

genetic diversity and evolution of hepatitis c virus-- years on
functional hepatitis c virus envelope glycoproteins
conservation of the conformation and positive charges of hepatitis c virus e envelope glycoprotein hypervariable region points to a role in cell attachment
positively selected amino acid sites in the entire coding region of hepatitis c virus subtype b
hcvdb: hepatitis c virus sequences database
the los alamos hcv sequence database
likelihood models for detecting positively selected amino acid sites and applications to the hiv- envelope gene
paml: a program package for phylogenetic analysis by maximum likelihood
consensus proposals for a unified system of nomenclature of hepatitis c virus genotypes
chronic hepatitis c virus infection: genotyping and its clinical role
a package for performing multiple sequence alignment on a microcomputer
the clustal-x windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools
multiple sequence alignment with clustal x
emboss: the european molecular biology open software suite
the neighbor-joining method: a new method for reconstructing phylogenetic trees
mega : integrated software for molecular evolutionary genetics analysis and sequence alignment
a simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences
codon-substitution models for heterogeneous selection pressure at amino acid sites
the los alamos hepatitis c immunology database
statistical methods for detecting molecular adaptation
evolutionary dynamics of hepatitis c virus envelope genes during chronic infection
hypervariable regions in the putative glycoprotein of hepatitis c virus
variable and hypervariable domains are found in the regions of hcv corresponding to the flavivirus envelope and ns proteins and the pestivirus envelope glycoproteins
a model for the hepatitis c virus envelope glycoprotein e
evidence for immune selection of hepatitis c virus (hcv) putative envelope glycoprotein variants: potential role in chronic hcv infections
prevention of hepatitis c virus infection in chimpanzees by hyperimmune serum against the hypervariable region of the envelope protein
epitope mapping of antibodies directed against hypervariable region in acute self-limiting and chronic infections due to hepatitis c virus
mutational evidence for an internal fusion peptide in flavivirus envelope protein e
induction of hepatitis c virus e envelope protein-specific immune response can be enhanced by mutation of n-glycosylation sites
a new method of inference of ancestral nucleotide and amino acid sequences
accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods
a method for detecting positive selection at single amino acid sites
adaptsite: detecting natural selection at single amino acid sites
error thresholds and the constraints to rna virus evolution
new methods for detecting positive selection at single amino acid sites
reconstruction of the most recent common ancestor sequences of sars-cov s gene and detection of adaptive evolution in the spike protein

key: cord- -dy dtnk authors: wang, chen; konecki, daniel m.; marciano, david c.; govindarajan, harikumar; williams, amanda m.; wastuwidyaningtyas, brigitta; bourquard, thomas; katsonis, panagiotis; lichtarge, olivier title: identification of evolutionarily stable sites across the sars-cov- proteome date: - - journal: res sq doi: . /rs. .rs- /v sha: doc_id: cord_uid: dy dtnk

since the first recognized case of covid- , more than million people have been infected worldwide. despite global efforts in drug and vaccine development to fight the disease, there is currently no vaccine or drug cure for covid- , though some drugs reduce severity and hasten recovery.
here we interrogate the evolutionary history of the entire sars-cov- proteome to identify functional sites that can inform the search for treatments. combining this information with the mutations observed in the current covid- outbreak, we systematically and comprehensively define evolutionarily stable sites that are useful drug targets. several experimentally-validated effective drugs interact with these proposed target sites. in addition, the same evolutionary information can prioritize cross-reactive antigens that are useful in directing multi-epitope vaccine strategies to elicit broadly neutralizing immune responses to the betacoronavirus family. although the results are focused on sars-cov- , these approaches are based upon evolutionary principles and are agnostic to organism or infective agent. covid- is a worldwide affliction. since first being reported in december in wuhan, hubei province, china, the world health organization (who) has tallied more than , covid- related deaths and over million infections worldwide (as of september th , ) ( ) . although timely public health interventions can successfully curtail incidence, the threat of subsequent waves of infections remains widespread ( ) ( ) ( ) . the novel betacoronavirus (sars-cov- ) that is causing the pandemic is closely related to other known human coronavirus pathogens sars-cov, mers-cov ( , ) , hcov oc , hku and is more distantly related to the human infectious alphacoronaviruses hcov e and hcov nl ( ) . finding ways to control and prevent further infection is a top priority, which includes the targeted discovery of drugs that impair viral mechanisms ( - ) and of antigenic epitopes through which vaccines raise immunity ( ) ( ) ( ) . this study addresses both by utilizing evolutionary information from sars-cov- sequence and structural data to search for actionable functional sites for each protein in the sars-cov- genome.
in a first application, we note that the approval of new drugs under normal circumstances often takes more than years ( , ) . in order to hasten the response, many current clinical trials for covid- enlist antiviral agents that have targeted zika, sars-cov, ebola, and mers-cov in the past ( , ) . in order to test more varieties of potential drugs, some studies screened thousands of clinical-stage or fda-approved small molecules for antiviral activity, hoping to repurpose some of the top hits for covid- treatment ( ) . however, the antiviral activity in these large-scale screens may, in part, be cell-line specific ( ) , and therefore of unclear clinical relevance. another approach to screen potential drugs for repurposing is to perform docking ( ) of clinical-stage or fda-approved drugs to the sars-cov- proteome ( , ) . however, selection of the correct binding sites on the target proteins is crucial and difficult as protein surface cavities far exceed actual ligand binding sites that modulate function ( ) . here we systematically suggest potential drug target sites for most sars-cov- proteins based on evolutionary information. as these sites are chosen for their conserved functional roles, broad pan-coronavirus/betacoronavirus relevance, and minimal variability across all known current sars-cov- variants, they should be prioritized in docking studies for drug repurposing. in a second application, we note that understanding the immune response to sars-cov- infection is critical for vaccine development ( ) . most early sars-cov- immune epitope discovery studies rely heavily on bioinformatic prediction tools as well as sequence and epitope work already done in sars-cov and mers-cov. b-cell linear and discontinuous epitope prediction tools have been used by researchers to identify possible sars-cov- epitopes ( ) ( ) ( ) . several more recent studies experimentally determined sars-cov- immune epitopes ( , , ) .
interestingly, several groups have reported significant t-cell reactivity against sars-cov- epitopes in individuals without virus exposure ( , , , ) . mateus et al. suggested that this could be due to cross reactivity between sars-cov- and other common human coronaviruses, such as oc , hku , nl and e ( ) . here we report an evolutionary metric, which can accurately separate cross-reactive epitopes from those that are not, and use this metric to suggest potential cross-reactive epitopes in sars-cov- . prioritizing these cross-reactive epitopes in vaccine development can potentially lead to broadly neutralizing immunity across the betacoronavirus family. here, we use the evolutionary trace (et) method, which predicts the importance of protein sequence positions, from most important ( . ) to least important ( . ). this relative ranking reflects the variation entropy of each sequence position within and across the branches of an associated phylogenetic tree, revealing evolutionary pressure points that correspond to functional and structural determinants, and the protein sites at which they often cluster ( ) . past studies have shown that this method can predict binding and catalytic functional sites ( , ) , guide protein engineering ( , ) and predict function ( ) . et rankings of residue importance can also be combined with amino acid substitution log odds to estimate the likely impact, or evolutionary action (ea), of coding variations on protein function ( ) ( ) ( ) . here, this first et and ea analysis of a full viral proteome identifies evolutionarily important residues and functional sites in the sars-cov- proteome. evolutionary trace of sars-cov- . in order to map functional determinants in sars-cov- proteins we applied the et approach.
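the idea that a position's rank reflects its variation entropy can be illustrated with a deliberately simplified sketch. it scores each alignment column by its shannon entropy over the whole alignment only; the et method described above also evaluates entropy within and across the branches of the phylogenetic tree, which this example does not attempt. the function names and the toy alignment are assumptions made for the illustration.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return 0.0 - sum((c / n) * math.log(c / n) for c in counts.values())


def rank_positions(alignment):
    """Return (entropies, order); order lists columns from least to most variable."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = [column_entropy("".join(seq[i] for seq in alignment))
                 for i in range(length)]
    order = sorted(range(length), key=lambda i: entropies[i])
    return entropies, order


if __name__ == "__main__":
    toy = ["ACD", "ACE", "AGF", "AGG"]  # hypothetical four-sequence alignment
    entropies, order = rank_positions(toy)
    print(entropies, order)
```

invariant columns get zero entropy and rank first, which corresponds to the most important end of the ranking in this simplified picture.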
with the multiple sequence alignments ( figure s a , dataset s ) and the corresponding phylogenetic trees ( figure s -s ) in hand for of the sars-cov- proteins (see si methods and materials), our protocol calculated the et ranking of importance for . % of sars-cov- amino acid residue positions (dataset s ) generated from each of three protein databases (uniref , uniref , ncbi nr) and combined into a single average. to independently assess the quality of these ranks, rather than rely on the variety and breadth of sequences in the alignments as indicative of information content, we used a statistical measure that quantifies the distribution of et rankings in the d structure. residues with smaller et rankings tend to cluster together in active sites, protein-protein interaction sites or other functional sites ( , , ( ) ( ) ( ) . such a clustering of top-ranked residues was particularly prominent in several sars-cov- proteins and complexes including the nsp main protease, the nsp /nsp /nsp rna-dependent rna polymerase complex and the nsp /nsp rna cap methyltransferase complex and can be visualized as groups of warm colored residues in the protein structure ( figure ). we evaluated the quality of et rankings using the selection cluster weighting (scw) z-score, which measures how well highly ranked residues cluster relative to a randomized distribution of scores on the structure (see si materials and methods). for almost all proteins the scw z-score is standard deviations above the randomized background, suggesting that the alignments are informative and that the resulting et rankings are meaningful ( figure s , dataset s ). for the proteins that do not reach significant z-scores there is a clear correlation to a lack of sequences in the alignments (e.g. nsp , e, orf , and orf a), or the structure belongs to a small domain within a larger protein (e.g. the macrodomain within nsp and the hr domain within the s protein).
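a clustering z-score of the kind described above can be sketched with a permutation test. the statistic used here (contacts between selected residues, each weighted by sequence separation) and the random-selection null are simplifying assumptions in the spirit of selection cluster weighting, not the exact scw formula from the si; the cutoff, coordinates and function names are hypothetical.

```python
import math
import random
from itertools import combinations


def clustering_stat(coords, selected, cutoff):
    """Sum of sequence separations over selected residue pairs in contact."""
    total = 0.0
    for i, j in combinations(sorted(selected), 2):
        if math.dist(coords[i], coords[j]) < cutoff:
            total += j - i
    return total


def clustering_z(coords, selected, cutoff, n_perm=1000, seed=0):
    """z-score of the observed statistic against random selections of equal size."""
    rng = random.Random(seed)
    observed = clustering_stat(coords, selected, cutoff)
    k = len(selected)
    indices = range(len(coords))
    null = [clustering_stat(coords, rng.sample(indices, k), cutoff)
            for _ in range(n_perm)]
    mean = sum(null) / n_perm
    var = sum((x - mean) ** 2 for x in null) / n_perm
    if var == 0.0:
        # degenerate null: every random selection gave the same value
        return math.inf if observed > mean else 0.0
    return (observed - mean) / math.sqrt(var)


if __name__ == "__main__":
    # hypothetical chain: residues spread far apart except 0, 10 and 19
    coords = [(10.0 * i, 0.0, 0.0) for i in range(20)]
    coords[10] = (1.0, 0.0, 0.0)
    coords[19] = (0.0, 1.0, 0.0)
    print(clustering_z(coords, {0, 10, 19}, cutoff=4.0))
```

weighting by sequence separation rewards residues that are far apart in sequence but close in space, so a run of adjacent residues does not dominate the score.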
to probe these smaller domains within large proteins we further investigated the adp-ribose-phosphatase (adprp) subdomain and macro and papain-like protease (pl pro ) domains of nsp . nsp was an intriguing case because top-ranked et residues cluster well in its pl pro domain but not in its macrodomain or in the adprp subdomain (dataset s ). in order to better resolve et rankings for nsp , we generated new alignments, phylogenetic trees, and et residue rankings for the subsequences specific to each nsp domain structure (see si materials and methods). in this focused analysis, the pl pro domain now yielded ~ % more sequences, leading to a corresponding increase in the clustering of top-ranked residues ( figure s ). for the macrodomain and adprp subdomain, thousands of additional sequences spanning the three domains of life and distantly related viruses were included in the new data set, which resulted in et rankings that rivaled the significance of clustering in the pl pro domain. the stark differences we find in the phylogenetic trees of specific nsp domains confirm previous observations of alternate domain configurations in different coronavirus genera and even within clades of betacoronavirus ( ) . the improvement in scw z-score corresponds to a cluster of highly ranked et residues within the ligand binding site of the macro domain and adprp subdomain ( figure s d and e) which was missing in the analysis of the full nsp reference sequence. having better resolved et rankings in the nsp domains, we returned to the main data set to see how well et rankings captured functional sites in other proteins. phylogenetically conserved ligand binding sites. a catalog of sars-cov- ligand binding sites could serve as a timely resource for prioritizing therapeutic targets. previous studies have shown that evolutionary sequence information correlates well enough with enzyme active sites to serve as d templates for functional signatures ( ) and identify allosteric sites ( , ) .
here we used nsp , nsp and nsp as examples to show how the evolutionary sequence information captured by et can successfully predict ligand binding sites for virus proteins. nsp is an rna dependent polymerase, nsp mediates the cleavage of both single- and double-stranded rna at uridine sites ( ) and nsp is a m gpppa-specific, s-adenosylmethionine (sam)-dependent, '-o-mtase ( ) . as shown in figure a -c, top ranked et residues cluster around the native ligands of nsp (rna) ( ), nsp (gpu) ( ) and nsp (m gpppa and sam) ( ) , indicating an accurate prediction of ligand binding sites for these proteins. several new functional sites are also predicted by et ( figure d and e). on the spike protein (s), one such et cluster partially overlaps the s ' protease cleavage site that is critical for membrane fusion and infectivity of the sars virus ( ) . on the nucleoprotein (n), a cluster of highly ranked et residues lies adjacent to the putative rna binding site ( ) and may contribute to formation of n protein-rna helical filaments that are essential to packaging the rna genome. these results indicate et can provide alternative drug target sites with no currently available ligand-bound structures. in addition to being important to protein function, ideal drug target sites should also be rarely mutated in the current outbreak so as to avoid the potential emergence of drug resistance. thus, we focused on positions that do not have any mutations observed in the , high quality, full length sars-cov- sequences that were available as of september th, . as more genomes and mutations within them are sequenced it may be necessary to lower the variant count stringency. in order to translate proteome-wide et ranks and mutational profiles into potential drug target sites, we focused on clusters of mutation-free, surface-exposed residues that are highly ranked by et and fall within Å of each other ( figure , dataset s ).
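the selection of candidate drug target sites described above (mutation-free, surface-exposed, highly ranked residues that lie close together) can be sketched as a filter followed by single-linkage clustering. the record fields, both cutoffs and the choice of single linkage are assumptions made for the illustration; the text does not specify how the clusters were computed, so this is one plausible reading rather than the authors' procedure.

```python
import math


def cluster_sites(residues, et_cutoff, distance_cutoff):
    """Group qualifying residues whose coordinates chain within distance_cutoff.

    residues: list of dicts with keys "id", "xyz", "et", "exposed", "mutated".
    Low ET rank is treated as important, matching the convention in the text.
    """
    keep = [r for r in residues
            if r["et"] <= et_cutoff and r["exposed"] and not r["mutated"]]
    parent = list(range(len(keep)))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(keep)):
        for b in range(a + 1, len(keep)):
            if math.dist(keep[a]["xyz"], keep[b]["xyz"]) <= distance_cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for idx, r in enumerate(keep):
        clusters.setdefault(find(idx), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # hypothetical residues; cutoffs are illustrative, not values from the study
    toy = [
        {"id": 1, "xyz": (0.0, 0.0, 0.0), "et": 5, "exposed": True, "mutated": False},
        {"id": 2, "xyz": (3.0, 0.0, 0.0), "et": 10, "exposed": True, "mutated": False},
        {"id": 3, "xyz": (3.5, 0.0, 0.0), "et": 8, "exposed": True, "mutated": True},
        {"id": 4, "xyz": (20.0, 0.0, 0.0), "et": 2, "exposed": True, "mutated": False},
    ]
    print(cluster_sites(toy, et_cutoff=20, distance_cutoff=4.0))
```

applying the mutation filter before clustering means a single observed variant can split or shrink a site, which is the behaviour discussed for the nsp protease site later in the text.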
the resulting catalog of putative drug targets includes sites at ~ sites per structure with the largest structure (full-length model of spike, vsb_ _ _ ) having the highest number of sites. for nsp , nsp and nsp , the predicted drug targets overlap the known ligand binding sites. in order to evaluate whether these et drug sites may correspond to druggable target sites, we examined their overlap with sites observed in five sars-cov- protein-drug complex crystal structures. it is important to note that all drugs showed an inhibitory effect in either cellular or biochemical assays. remdesivir has been shown to speed up the recovery of covid- patients in clinical trials ( ) , while the α-ketoamide inhibitor b can suppress sars-cov- replication in cell lines ( ) . vir and tipiracil were also shown to effectively inhibit the enzymatic activities of their targets ( , ) . the remaining drug, sinefungin, is a pan-mtnase (nsp ) inhibitor that inhibits the growth of yeast cells ectopically expressing nsp from sars-cov ( ) . the et drug sites were mapped onto the five sars-cov- protein-drug complexes ( , , ( ) ( ) ( ) and, as shown in figure , all five drugs reside in protein surface pockets that are within or very close to our predicted et drug sites. the et drug site for nsp is the least well recovered due to a single sars-cov- sequencing entry (strain mt ) wherein several residues in the protease active site are mutated (g s, s e and c i), including the catalytic cysteine residue. s e and c i are both caused by two nucleotide substitutions in the codon, and are only observed in this strain (sampled on / / ). it is unclear whether this is a sequencing artifact or represents a genuine active site plasticity that compromises nsp 's active site as a stable drug target. it does however illustrate the importance of accurately detecting emerging sequence variations when choosing drug targets.
overall, these results show that predicted et drug sites can recover experimentally tested drug binding pockets and suggest new sites that can be targeted in computational docking approaches. in addition, because these sites are conserved across multiple coronavirus genera, these predicted et drug sites are anticipated to be relevant for identifying inhibitors of sars-cov- as well as more distantly related coronaviruses. conserved linear epitopes. et drug sites may prove valuable in guiding drug design, but these approaches are dependent upon having high resolution crystal structures and some structures are either not yet available (e.g. nsp , nsp , m, and several accessory proteins), do not cover a majority of the protein (nsp and nsp ) or are too low in resolution for accurate docking studies (nsp , nsp , ectodomain of s, n, orf a and orf a). however, et operates over linear protein sequences and thereby can identify phylogenetically important sequence fragments even in the absence of a d structure ( ) . as in our approach to discover et drug sites, we combined et residue ranking information with sequencing data from sars-cov- isolates to arrive at linear peptides along the proteome that are evolutionarily important and also show little variation in the current outbreak ( figure s , dataset s ). in order to assess the value of these epitopes, we asked whether they could recapitulate et-derived drug sites. et-defined linear peptides for nsp were mapped onto an available nsp structure and, as illustrated in figure a , the majority of the structural and linear peptides overlap with each other. linear et peptides and et drug sites overlap well for other sars-cov- proteins, which was quantified by jaccard similarity and fisher's exact test (dataset s ). these data suggest that linear et peptides contain functionally relevant information since they recapitulate et drug sites for proteins or domains without requiring d structural data.
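the overlap measures named above can be sketched for two sets of residue positions. the jaccard index is the shared fraction of the union, and the one-sided fisher's exact test is computed here as a hypergeometric tail over all residues of the protein. treating residue positions as the unit of comparison and using a one-sided test are assumptions made for the example; the text does not state these details.

```python
from math import comb


def jaccard(a, b) -> float:
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fisher_enrichment(a, b, n_total: int) -> float:
    """One-sided p-value: probability of an overlap at least as large by chance.

    a, b: sets of residue positions drawn from a protein of n_total residues.
    """
    a, b = set(a), set(b)
    k = len(a & b)
    na, nb = len(a), len(b)
    if na > n_total or nb > n_total:
        raise ValueError("sets cannot be larger than the protein")
    denom = comb(n_total, nb)
    upper = min(na, nb)
    tail = sum(comb(na, x) * comb(n_total - na, nb - x)
               for x in range(k, upper + 1))
    return tail / denom


if __name__ == "__main__":
    # hypothetical positions for a linear-peptide set and a structural site
    peptide_positions = {1, 2, 3}
    site_positions = {2, 3, 4}
    print(jaccard(peptide_positions, site_positions))
    print(fisher_enrichment(peptide_positions, site_positions, n_total=10))
```

reporting both numbers is useful because the jaccard index describes how much two sets coincide, while the exact test says whether that overlap is more than expected for sets of those sizes.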
in the absence of a protein structure, these et peptides could be useful in designing inhibitory peptides ( , ) . these peptides are also connected to a second main approach towards resolving the pandemic, by way of vaccine development. although vaccines for covid- may become available soon, ideally, effective protection against future outbreaks from related coronaviruses would require a broadly neutralizing effect wherein the immune system recognizes epitopes shared among coronavirus species. the prospect of raising a broadly neutralizing response is bolstered by a recent study wherein naïve patients, never exposed to sars-cov- , were found to possess a subset of t-cells that can cross-react to homologous epitopes shared by common cold coronaviruses and sars-cov- ( ) . in this context, we note that et rankings reflect the degree of homology over the phylogenetic tree, so we reasoned that summing et scores over the length of an identified t-cell epitope may be able to estimate its potential for cross-reactivity. as a first step, we summed the et ranks for each of the sars-cov- epitopes that had been shown to react with patient-derived t-cells so that they could be ranked by predicted cross-reactivity to common cold coronavirus epitopes assayed by mateus et al. although summing et ranks could identify sars-cov- epitopes that are more likely to be cross-reactive ( figure s ), it did not account for the specific amino acid differences in the potentially cross-reactive homolog. in other words, et ranks can predict whether or not a sars-cov- epitope will be cross-reactive in general, but they do not specify which epitope homologs will cross react. in order to improve resolution of our predictions to specific epitope homologs, we next combined ea, a predictor of mutational impact, with the summed et rankings. ea calculates the predicted impact of amino acid variations on protein function, aiding in the interpretation of coding variants ( ) ( ) ( ) .
summing the predicted impact of amino acid changes between a sars-cov- epitope and a homologous epitope in another virus (sumea) while adjusting for the sars-cov- epitope's overall evolutionary importance (sum( -et ranking)) produced a metric that was able to separate cross-reactive epitopes from those that did not cross react ( figure b and s , dataset s ). this metric, sumea/sum( -et ranking), was then applied to untested sars-cov- t-cell epitopes and their common cold homologs ( ) . from a total of homologs we identified with potential to cross react to one of five sars-cov- epitopes ( figure c , dataset s ). these sars-cov- epitopes along with the others experimentally shown to possess cross-reactivity could be used in a multi-epitope vaccination strategy that provides a broad neutralizing response to currently circulating coronaviruses, sars-cov- and, possibly, future outbreaks. moreover, the approach is not specifically linked to any specific virus, so it could be replicated in other families of pathogens. dissemination. to disseminate these results, a public website (http://cov.lichtargelab.org) makes these data and analyses fully accessible. the data include, for example, multiple sequence alignments, precalculated et ranks, and predicted epitopes (both linear and structural) for all sars-cov- proteins. in addition, an interactive structure viewer enables users to explore any one of the et-colored structures ( figure ) and predicted et drug sites associated with those structures (dataset s - ). the website will be updated as new sars-cov- isolates and protein structures become available. rapid progress has been made in response to the acute sars-cov- threat; from sequencing, to structural determination, and to drug and vaccine development ( , ( ) ( ) ( ) ( ) . here, by combining information from evolutionary history and the current outbreak of sars-cov- we systematically mapped potential therapeutic sites on all sars-cov- proteins.
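the ratio described above can be written out as a short function. everything here beyond the ratio itself is an assumption: the constant subtracted from each et ranking is not legible in the text, so it is exposed as a required argument named `importance_cap`, and the per-substitution ea values come from a caller-supplied function `ea_score`, a hypothetical stand-in for the real ea predictor. the numbers in the example are invented.

```python
def cross_reactivity_score(sars_epitope, homolog, et_ranks, ea_score, importance_cap):
    """Return sumEA / sum(importance_cap - ET ranking) for an aligned epitope pair.

    sars_epitope, homolog: equal-length aligned peptide strings.
    et_ranks: ET ranking of each position of the sars epitope.
    ea_score: callable (position, ref_residue, alt_residue) -> predicted impact.
    importance_cap: stand-in for the constant in the text, chosen by the caller.
    """
    if not (len(sars_epitope) == len(homolog) == len(et_ranks)):
        raise ValueError("epitope, homolog and ET ranks must have equal length")
    # sum predicted impact only over positions where the homolog differs
    sum_ea = sum(ea_score(pos, ref, alt)
                 for pos, (ref, alt) in enumerate(zip(sars_epitope, homolog))
                 if ref != alt)
    sum_importance = sum(importance_cap - rank for rank in et_ranks)
    if sum_importance <= 0:
        raise ValueError("importance_cap must exceed the ET rankings on average")
    return sum_ea / sum_importance


if __name__ == "__main__":
    # hypothetical peptides, ranks and a constant EA value for every substitution
    score = cross_reactivity_score(
        "ACDE", "ACDF", [10, 20, 30, 40],
        ea_score=lambda pos, ref, alt: 50.0,
        importance_cap=100,
    )
    print(score)
```

dividing by the summed importance normalizes the predicted impact of the differences by how conserved the epitope is overall, which is the adjustment the text describes.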
we make use of phylogenetics, sequence information and structure information to provide a functional map of sars-cov- proteins. the sites we determined are not only stable across coronavirus families but are also stable to mutations in the current pandemic, which makes them ideal targets for pan coronavirus/betacoronavirus therapeutics. in so doing, we pinpoint functionally and structurally important sites in the sars-cov- proteome that reduce the search space for drug and vaccine development. in addition to focusing therapeutic studies, the data presented here will be important in identifying the mechanism of action for successful therapies, not only in the context of the current outbreak but across future coronavirus outbreaks. our findings are available on the accompanying website, where results will be updated as more sars-cov- isolates are sequenced and structures are completed. this should not only expand coverage of the sars-cov- proteome and refine predicted therapeutic sites, but also provide a resource to monitor for variants that may significantly impact the virulence of sars-cov- . there are limitations to this study. the quality of our results depends on the number and range of homologous sequences available. although most of the non-structural proteins yield et rankings that are likely informative (clustering z-score >= or > unique sequences between - % identity), nsp and the accessory proteins do not reach significant z-scores or have many diverse sequences in their final alignments. the inability to recover more sequence information could be due to a higher evolutionary rate in these proteins that limits our ability to recognize distantly related homologs with very little sequence identity. more likely, these peripheral genes have been more recently recruited through the frequent recombination events that occur in the coronavirus family ( ) .
such recruitment has occurred at the domain level in the nsp protein with its variable number of domains ( to ), some of which are unique to the betacoronavirus clade b containing sars-cov- and - . therefore, it is unsurprising that the initial sequences returned and corresponding et rankings for full-length nsp are heavily influenced by the less divergent pl pro domain that is present across coronavirus clades and families. domain-specific analysis of nsp greatly improved the number of sequences returned, the phylogenetic coverage, and the resolution of et results. this suggests that future work should include domain-specific analyses for multidomain proteins. such domain-specific analyses are likely to provide et rankings that identify important functional sites for individual domains, while full-length analysis can provide insight into how particular domains became recruited for specific branches of the phylogenetic tree. several other groups have focused on experimentally screening clinical-stage or fda-approved small molecules with the hope of identifying and repurposing drugs for sars-cov- treatment. tens to hundreds of drug candidates are identified by these high-throughput assays. however, drug efficacy of top hits might be cell line specific ( ) and the mechanisms of drug action may be unclear or may act through modulation of the host cell rather than targeting the virus itself. in silico docking studies ( , ) take a more targeted approach towards specific sars-cov- sites that may complement the results of experimental screens. knowledge of the ligand binding site improves the chance of identifying drugs that inhibit protein function, and although the pace of structural characterization of sars-cov- proteins is unprecedented, the structural information available is far from comprehensive. using the structures which have been solved, we identified clusters of surface residues that have low et rankings and a lack of mutations in the current outbreak as potential drug target sites.
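the site nomination just described (low et ranking, no observed variants, surface exposure, spatial clustering) can be sketched as follows. this is only an illustrative outline and not the authors' pipeline: the `Residue` fields, the `et_cutoff` and `max_dist` values, and the single-linkage grouping rule are all assumptions introduced here, and et rankings are assumed to be fractions where lower means more important.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Residue:
    position: int      # residue number in the protein
    et_rank: float     # assumed fraction in [0, 1]; lower = more important
    n_variants: int    # variants observed at this position in outbreak isolates
    exposed: bool      # surface accessible in the solved structure
    xyz: tuple         # representative 3D coordinate, in angstroms


def candidate_residues(residues, et_cutoff=0.3):
    """Keep exposed residues with a low ET rank and no observed variants.

    et_cutoff is a hypothetical threshold, not a value from the paper.
    """
    return [r for r in residues
            if r.et_rank <= et_cutoff and r.n_variants == 0 and r.exposed]


def cluster_by_proximity(residues, max_dist=8.0):
    """Single-linkage grouping: a residue joins (and merges) every cluster
    that already holds a member within max_dist angstroms of it."""
    clusters = []
    for r in residues:
        near, far = [], []
        for c in clusters:
            close = any(dist(r.xyz, m.xyz) <= max_dist for m in c)
            (near if close else far).append(c)
        merged = [r] + [m for c in near for m in c]
        clusters = far + [merged]
    return clusters
```

applying `cluster_by_proximity(candidate_residues(residues))` returns lists of residues; each list with more than a few members would be one candidate site under these assumptions.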
many of these et drug sites correspond to ligand-bound active sites but others map to evolutionarily important sites that have yet to be fully characterized. et operates over the phylogenetic history of linear sequence space and can anticipate functional sites that may or may not be characterized in the future. these putative et drug targets can guide docking studies to additional sites not immediately apparent from currently available structural information. sites highlighted by et are evolutionarily conserved in the phylogenetic tree used in et calculation and this information can set expectations for how broadly a drug may inhibit different viral species. for instance, remdesivir targets the active site of rna-dependent rna polymerase (nsp ) in sars-cov- as well as homologs in sars, mers and the distantly related ebola rna virus ( , ) . the nsp active site has a very strong et signal that is derived from one of the deepest phylogenetic trees in our analysis, and a drug targeting it would thereby be expected to inhibit a wide swath of coronaviruses and related rna viruses. in contrast, the adp ribose phosphatase sub-domain of nsp has a phylogenetic tree that includes relatively few coronavirus sequences among a multitude of sequences that span three domains of life. drugs targeting this domain may inhibit coronavirus infectivity but could also have side effects if they inhibit host adp ribose phosphatases. however, adp ribose phosphatase inhibitors have been developed for cancer treatment and a wealth of information and expertise is available for this group of drugs ( ) . as with the application of any new drug, particular care should be taken to ensure unwanted side effects do not overshadow any benefits as a viral inhibitor. the linear epitopes we defined here may also provide valuable information in drug development both for proteins with structure, and for those without, as amino acids connected linearly are guaranteed to be connected structurally.
for protein regions that are flexible or undergo large conformational changes during activation, structural proximity defined in one conformation may not hold in other conformations. for example, the spike protein undergoes a large conformational change when mediating host-virus membrane fusion ( ) . a structural epitope that is determined in the closed state might not be appropriate for the opened state. thus, linearly connected regions may identify cryptic binding sites that are revealed upon conformational change of the protein. linear epitopes are also a predominant mode of recognition of the adaptive immune system. studies have shown that some sars-cov- t-cell epitopes are capable of cross-reacting with homologous peptides in other human coronaviruses ( , ) . we performed evolutionary analysis on these cross-reactive epitopes and developed a new metric that can distinguish cross-reactive epitopes with a high accuracy that outperforms a simple percent identity metric. this sumea/sum( -et ranking) metric was then used to suggest other potential sars-cov- cross-reactive t-cell epitopes. in general, cross-reactive epitopes have the potential of generating a pan-betacoronavirus immune response that can stimulate b cells to produce broadly neutralizing antibodies. although not directly addressed in this work, the sumea/sum( -et ranking) metric may also be able to identify epitopes that stimulate cytotoxic t-cells through presentation on mhc- molecules. several groups are at the preclinical stage in multi-epitope vaccine development (milkeninstitute.org) but the specific epitopes are not publicly available, and it is unknown whether or not they include any that are cross-reactive. the ability to identify cross-reactive epitopes could inform a multi-epitope vaccine strategy that is specifically designed to inoculate a susceptible population to a wide range of extant and undiscovered betacoronaviruses.
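as a rough illustration of how the sumea/sum( -et ranking) metric could be computed for one epitope and one homolog, consider the sketch below. it is not the authors' code. the constant inside sum( -et ranking) is missing from the extracted text, so it is exposed here as a hypothetical parameter `et_max`; the per-position input lists and the function name are likewise assumptions. the text above does not state a decision threshold, so none is applied.

```python
from typing import Sequence


def cross_reactivity_score(ea_scores: Sequence[float],
                           et_ranks: Sequence[float],
                           et_max: float = 1.0) -> float:
    """Return sumEA / sum(et_max - ET ranking) for one epitope/homolog pair.

    ea_scores: predicted impact of each substitution between the epitope and
               its homolog, one value per epitope position (0 where identical).
    et_ranks:  ET ranking of each epitope position, on the same scale as et_max.
    et_max:    hypothetical stand-in for the constant elided in the text.
    """
    if len(ea_scores) != len(et_ranks):
        raise ValueError("need one EA score and one ET rank per position")
    importance = sum(et_max - rank for rank in et_ranks)
    if importance <= 0:
        raise ValueError("summed evolutionary importance must be positive")
    return sum(ea_scores) / importance


# Illustrative numbers only: a 4-residue epitope with two substitutions.
score = cross_reactivity_score([0, 30, 0, 50], [0.25, 0.5, 0.75, 0.5])
print(score)  # 80 / 2.0 = 40.0
```

dividing by the summed importance normalizes the mutational impact by how evolutionarily constrained the epitope is overall, which is the adjustment the text describes.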
this study was motivated by the current pandemic and uses evolutionary sequence information to guide the development of therapeutics for covid- . although we are presently in the grip of covid- , this pandemic was preceded by the sars and mers outbreaks and it should be anticipated that related coronaviruses will cause future outbreaks. and while this study is focused upon sars-cov- , it draws upon pieces of sequence information taken from the whole of the coronavirus family and thereby the findings are extendable to other coronavirus species, including those that have not yet been encountered. indeed, the tools we present could be applied to any family of pathogen. putting a pandemic virus into the evolutionary context of related viruses can expose a path to managing a recovery and may offer therapeutics that cover future outbreaks. a brief description of the methods can be found here; for a more in-depth description of specific methods please see the supplementary text. in order to map functional determinants in sars-cov- proteins we applied the evolutionary trace (et) approach ( , ) . this method ranks each amino acid position from most to least important during evolution by tracking how they vary along the coronavirus phylogenetic tree. these rankings vary based on the precise choice of multiple sequence alignment (msa). in order to produce robust et rankings three separate alignments were generated for each protein in the sars-cov- wuhan-hu- reference genome (nc_ . ) ( ), by querying three protein databases (uniref , uniref , and ncbi nr) for sequences with identity between % and %, thus filtering out those that were either overly distant or redundant. only two proteins had too few matches for et, nsp and orf , both of which have unknown function and have very short reference sequences ( and amino acids, respectively, figures , dataset s ).
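the identity window used to build each alignment can be illustrated with a small sketch. the identity bounds are missing from the extracted text, so they are left as required arguments rather than guessed. the definition of percent identity (matches over columns where neither sequence has a gap) is one common convention assumed here, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length,
    counted over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_homologs(query: str, hits: dict, low: float, high: float) -> list:
    """Return the names of hits whose identity to the query lies in [low, high].

    Hits below `low` are treated as overly distant and hits above `high`
    as redundant; both bounds must be supplied by the caller.
    """
    return [name for name, seq in hits.items()
            if low <= percent_identity(query, seq) <= high]
```

for example, with a query `"ACDE"` and bounds of 30 and 95, a hit `"ACDF"` (75% identical) is kept, while an identical hit and an unrelated hit are both dropped.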
the et scores for all other proteins for each alignment and for the average scores across alignments were evaluated with the previously presented selection cluster weighting (scw) z-score ( , ( ) ( ) ( ) . the z-scores for each structure were then ranked - in order to determine if et scores from one database or the average of the three consistently outperforms the others. et scores from each of the three databases performed similarly well but the average et of the three provided better z-scores in most cases ( figure s c ). et rankings were further investigated by comparing the highest scoring regions with known functional sites. therapeutic sites were predicted based on both the linear sequence as well as structural constraints. residues were nominated as members of potential therapeutic sites based on their et rankings, lack of variants as found in sars-cov- sequences retrieved from gisaid ( ) and the china national center for bioinformation ( )(cncb), as well as surface accessibility, and structural proximity. structurally identified therapeutic sites were compared to drug binding sites for agents known to bind to sars-cov- proteins. to generalize this approach to proteins without structure, linear sites were predicted based on et rankings, current mutational profile and linear connectivity. structural and linear predicted sites were compared to one another using jaccard similarity and fisher's exact test, to determine the usefulness of this method in the absence of a protein structure. several et metrics were also interrogated to determine their ability to highlight potential cross-reactive immunogenic epitopes ( ) . the best metric, sumea/sum( -et ranking), was used to predict cross-reactive t-cell epitopes which are good potential therapeutic sites. an interactive web-based dashboard to track covid- in real time the effect of human mobility and control measures on the covid- epidemic in china. science ( -.
) changes in contact patterns shape the dynamics of the covid- outbreak in china genomic characterization of the novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting wuhan genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding nsp of coronaviruses: structures and functions of a large multidomain protein activity profiling and structures of inhibitor-bound sars-cov- -plpro protease provides a framework for anti-covid- drug design. biorxiv prepr tipiracil binds to uridine site and inhibits nsp endoribonuclease nendou from sars-cov- . biorxiv covid- ): current status and future perspectives chadox ncov- vaccine prevents sars-cov- pneumonia in rhesus macaques two linear epitopes on the sars-cov- spike protein that elicit neutralising antibodies in covid- patients covid- vaccine development pipeline gears up covid- , an emerging coronavirus infection: advances and prospects in designing and developing vaccines, immunotherapeutics, and therapeutics recent discovery and development of inhibitors targeting coronaviruses novel coronavirus: genome structure, clinical trials, and outstanding questions discovery of sars-cov- antiviral drugs through large-scale compound repurposing chloroquine does not inhibit infection of human lung cells with sars-cov- the autodock suite at identification of potential natural inhibitors of sars-cov main protease by molecular docking and simulation studies class a g protein-coupled receptor antagonist famotidine as a therapeutic alternative against sars-cov : an in silico analysis docking techniques in pharmacology: how much promising?
targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals a sequence homology and bioinformatic approach can predict candidate targets for immune responses to sars-cov- preliminary identification of potential vaccine targets for the covid- coronavirus (sars-cov- ) based on sars-cov immunological studies development of epitope-based peptide vaccine against novel coronavirus (sars-cov- ): immunoinformatics approach sars-cov- -specific t cell immunity in cases of covid- and sars, and uninfected controls a large-scale database of t-cell receptor beta (tcrβ) sequences and binding associations from natural and synthetic exposure to sars-cov- . mark klinger adaptive biotechnologies jennifer n selective and cross-reactive sars-cov- t cell epitopes in unexposed humans single-cell transcriptomic analysis of sars-cov- reactive cd + t cells a family of evolution-entropy hybrid methods for ranking protein residues by importance an evolutionary trace method defines binding surfaces common to protein families receptor and βγ binding sites in the α subunit of the retinal g protein transducin. science ( -.
) β-arrestin-dependent, g protein-independent erk / activation by the β adrenergic receptor elucidation of g-protein and β-arrestin functional selectivity at the dopamine d receptor prediction and experimental validation of enzyme substrate specificity in protein structures a formal perturbation equation between genotype and phenotype determines the evolutionary action of protein-coding variations on fitness objective assessment of the evolutionary action equation for the fitness effect of missense mutations across cagi-blinded contests cagi : objective performance assessments of predictions based on the evolutionary action equation background frequencies for residue variability estimates: blosum revisited sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation accounting for epistatic interactions improves the functional analysis of protein structures evolution-guided discovery and recoding of allosteric pathway specificity determinants in psychoactive bioamine receptors recent advances suggest increased influence of selective pressure in allostery nidovirus ribonucleases: structures and functions in viral replication crystal structure and functional analysis of the sars-coronavirus rna cap ′-o-methyltransferase nsp /nsp complex sars-cov- rdrp/rna complex the crystal structure of nsp -nsp heterodimer from sars-cov- in complex with s-adenosylmethionine. biorxiv characterization of a highly conserved domain within the severe acute respiratory syndrome coronavirus spike protein s domain with characteristics of a viral fusion peptide structural basis of rna recognition by the sars-cov- nucleocapsid phosphoprotein remdesivir for the treatment of covid- -preliminary report crystal structure of sars-cov- main protease provides a basis for design of improved α-ketoamide inhibitors structural basis for inhibition of the rna-dependent rna polymerase from sars-cov- by remdesivir rcsb pdb - wkq: .
angstrom resolution crystal structure of nsp -nsp heterodimer from sars-cov- in complex with sinefungin evolutionary traces of functional surfaces along g protein signaling pathway identification of a candidate therapeutic autophagy-inducing peptide evolutionary trace-based peptides identify a novel asymmetric interaction that mediates oligomerization in nuclear receptors a new coronavirus associated with human respiratory disease in china therapeutic strategies against covid- and structural characterization of sars-cov- : a pharmacologic treatments for coronavirus disease (covid- ): a review structural plasticity of sars-cov- cl mpro active site cavity revealed by room temperature x-ray crystallography genetic recombination, and pathogenesis of coronaviruses in silico molecular docking analysis for repurposing therapeutics against multiple proteins from sars-cov- remdesivir: a review of its discovery and development leading to emergency use authorization for treatment of covid- prophylactic and therapeutic remdesivir (gs- ) treatment in the rhesus macaque model of mers-cov infection targeting deparylation for cancer therapy ready, set, fuse! the coronavirus spike protein and acquisition of fusion competence gisaid: global initiative on sharing all influenza data -from vision to reality the novel coronavirus resource structural basis for translational shutdown and immune evasion by the nsp protein of sars-cov- . science ( -. ) the sars-cov- conserved macrodomain is a highly efficient adpribosylhydrolase.
biorxiv structure of papain-like protease from sars-cov- and its complexes with noncovalent inhibitors the swiss-model repository-new features and functionality swiss-model: homology modelling of protein structures and complexes qmeandisco-distance constraints applied on model quality estimation non-structural protein (nsp ) | p dtd pro_ | models rcsb pdb - yb : sars-cov- main protease with unliganded active site ( -ncov crystal structure of the sars-cov- non-structural protein structural basis for helicase-polymerase coupling in the sars-cov- replication-transcription complex rcsb pdb - zsl: crystal structure of the sars-cov- helicase at . angstrom resolution structural basis and functional analysis of the sars coronavirus nsp -nsp complex crystal structure of nsp endoribonuclease from sars cov- in the complex with uridine- '-monophosphate rcsb pdb - w h: . angstrom resolution crystal structure of nsp -nsp complex from sars-cov developing a fully glycosylated full-length sars-cov- spike protein model in a viral membrane cryo-em structure of the sars-cov- a ion channel in lipid nanodiscs structural model of the sars coronavirus e channel in lmpg micelles center for structural genomics of infectious diseases (csgid), rcsb pdb - w : structure of the sars-cov- orf a encoded accessory protein crystal structure of rna binding domain of nucleocapsid phosphoprotein from sars coronavirus rcsb pdb - zco: crystal structure of c-terminal dimerization domain of nucleocapsid phosphoprotein from sars-cov- , crystal form ii the authors of this text have no conflicts of interest to report. key: cord- -vy qgtll authors: nan title: proteases date: - - journal: febs j doi: . /j. - . . _ .x sha: doc_id: cord_uid: vy qgtll nan the incretin hormones glp- and gip are released from the gut during meals, and serve as enhancers of glucose stimulated insulin release from the beta cells.
furthermore, glp- also stimulates beta cell growth and insulin biosynthesis, inhibits glucagon secretion, reduces free fatty acids and delays gastric emptying. glp- has therefore been suggested as a potentially new treatment for type diabetes. however, glp- is very rapidly degraded in the bloodstream by the enzyme dipeptidyl peptidase iv (dpp-iv; ec . . . ). a very promising approach to harvest the beneficial effect of glp- in the treatment of diabetes is to inhibit the dpp-iv enzyme, thereby enhancing the levels of endogenously intact circulating glp- . the three dimensional structure of human dpp-iv in complex with various inhibitors creates a better understanding of the specificity and selectivity of this enzyme and allows for further exploration and design of new therapeutic inhibitors. the majority of the currently known dpp-iv inhibitors consist of an alpha amino acid pyrrolidine core, to which substituents have been added to optimize affinity, potency, enzyme selectivity, oral bioavailability, and duration of action. various compound series and their sar relative to alpha amino acids will be presented. memapsin (β-secretase, bace ) is the membrane-anchored aspartic protease that initiates the cleavage of β-amyloid precursor protein (app) leading to the production of amyloid-β (aβ), a major factor in the pathogenesis of alzheimer's disease (ad). since memapsin is a major target for the development of inhibitor drugs for the treatment of ad, its structure and physiological functions are currently topics of intense research interest. here we discuss the structural features of memapsin and how they contribute to the activity and inhibition of the protease. structural and kinetic evidence support the presence of subsites for substrate or inhibitor binding in the active-site cleft of memapsin . subsites p to p ' are most useful in the design of transition-state analogue inhibitors.
recent data indicated that subsites p , p and p have a strong influence on hydrolytic rate or inhibition potency. these subsites are, however, too far from the transition-state isostere for the design of drug-like transition-state inhibitors but can be utilized for the design of non-transition-state inhibitors that compete for substrate binding. besides carrying out proteolytic activity, the ectodomain of memapsin also interacts with app, leading to the endocytosis of both proteins into the endosomes where app is hydrolyzed by memapsin to produce aβ. a phosphorylated motif in the cytosolic domain of memapsin is responsible for the recognition of gga proteins as part of the recycling mechanism that transports memapsin from endosomes to the trans-golgi and then back to the cell surface. these interactions may also be considered for the design of small-molecule compounds that interfere with memapsin trafficking and thus reduce the production of aβ. identification of human carnosinase -a brain-specific metalloprotease m. teufel biochemistry, exploratory research, sanofi aventis, strasbourg, france. e-mail: michael.teufel@sanofi-synthelabo.com metalloproteases form a large and diverse family of proteases and are molecular targets that represent an opportunity for therapeutic intervention. in particular, the development of potent inhibitors has made progress for the family of matrix metalloproteases (mmp). the sequencing of the human genome revealed that a significant percentage of the druggable genome is represented by proteases, many of them still with unknown function. in this presentation, data will be presented on the deorphanization of two previously unknown genes by means of bioinformatics and classical biochemistry. this work led to the identification of human carnosinase, a dipeptidase specifically expressed in the human brain, and a ubiquitously expressed close homologue, characterized to be a non-specific dipeptidase.
stimulating serpins with synthetic tailor-made oligosaccharides: a new generation of antithrombotics m. petitou thrombosis & angiogenesis, sanofi-aventis, toulouse, france. e-mail: maurice.petitou@sanofi-aventis.com we will discuss our research on synthetic oligosaccharides able to selectively activate the inhibitory activity of antithrombin towards various serine proteinases. we first synthesized pentasaccharides closely related to the antithrombin binding domain of heparin [ ] (the active site), as well as analogues displaying different pharmacokinetic profiles. selective inhibitors of coagulation factor xa were thus obtained that represent a new class of antithrombotic [ ] drugs currently being evaluated worldwide. we then designed larger oligosaccharides [ ] that inhibit both factor xa and thrombin in the presence of antithrombin. they are devoid of undesired nonspecific interactions with blood proteins, particularly with platelet factor . clinical trials are ongoing to prove the therapeutic benefits of this new type of coagulation inhibitor. slow tight binding inhibitors in drug discovery: in the case of dppiv and elastase inhibitors z. kapui, e. boronkay, i. bata, m. varga, e. mikus, k. urban-szabo, s. bátori and p. arányi discovery research, chinoin member of sanofi-aventis group, budapest, hungary. e-mail: zoltan.kapui@sanofi-aventis.com some enzyme inhibitors are extremely potent, causing significant inhibition at very low concentrations that may be comparable to the concentration of the target enzyme. when this inhibition is studied in vitro, complexities arise because the concentration of the inhibitor is so low that it is altered significantly as a result of combination with the enzyme. this situation is referred to as tight-binding inhibition. partly as a result of their low concentrations, tight-binding inhibitors often show slow-binding characteristics.
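the depletion effect described above is commonly handled with the morrison quadratic equation, which solves the binding equilibrium without assuming that free inhibitor equals total inhibitor. the abstract does not name this equation; the sketch below is a standard textbook treatment offered for illustration, and the function name and example concentrations are assumptions.

```python
from math import sqrt


def fractional_activity(e_total: float, i_total: float, ki_app: float) -> float:
    """Morrison equation: remaining enzyme activity v_i / v_0 at equilibrium.

    All three arguments are concentrations in the same unit (for example nM).
    The enzyme-inhibitor complex concentration is the smaller root of
    [EI]^2 - (E + I + Ki)[EI] + E*I = 0.
    """
    s = e_total + i_total + ki_app
    ei = (s - sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - ei / e_total


def classical_activity(i_total: float, ki_app: float) -> float:
    """Conventional approximation that ignores inhibitor depletion."""
    return 1.0 / (1.0 + i_total / ki_app)


# With enzyme, inhibitor and Ki all at 1 nM the two models disagree clearly.
print(round(fractional_activity(1.0, 1.0, 1.0), 3))  # 0.618
print(classical_activity(1.0, 1.0))                  # 0.5
```

the gap between the two values shows why the conventional treatment overestimates inhibition when the inhibitor concentration is comparable to that of the enzyme.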
unlike conventional inhibitors that act almost instantaneously (or at least within the ms time scale), slow-binding inhibitors may take several seconds, minutes or even hours for their effect to be fully exhibited. this association between slow-binding and tight-binding is relatively common and slow tight-binding inhibitors are extremely potent and specific. proteolytic enzymes are involved in a multitude of important physiological processes. their intrinsic properties and activities are the focus of wide-ranging research and they have a valuable role in experimental and therapeutic applications. serine proteases are attractive targets for the design of enzyme inhibitors since they are involved in the etiology of several diseases. within the class of serine proteases, human leukocyte elastase (hle) is one of the most destructive enzymes in the body. the enzyme dipeptidyl peptidase iv (dppiv) is a serine exopeptidase that cleaves xaa-pro dipeptides from the n-terminus of oligo- and polypeptides. inhibitors of dpp iv are of increasing interest to the pharmaceutical industry, as they may become established as the next member of the oral antidiabetic class of therapeutic agents. the objective of our work was to develop reversible, slow, tight-binding inhibitors against these serine proteases. ssr is a potent inhibitor of hle, the inhibition constant (k i ) and the constant for inactivation process (k on ) being . ± . nm. this inhibitor is a reversible, slow, tight-binding inhibitor with k on = . ± . /ms, and k off = . ± . ) /s. ssr inhibits the solubilization of elastin by hle with nm of ic value. this inhibitor is one of the most effective inhibitors of a serine proteinase yet described. ssr is a potent, competitive and slow tight binding type inhibitor of the human dipeptidyl peptidase-iv enzyme (k i = nm, t½ = h). on the basis of kinetic properties, ssr forms a stable enzyme-inhibitor complex.
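the kinetic constants quoted above are related in a simple way if binding is assumed to follow a one-step reversible mechanism (e + i ⇌ ei), an assumption the abstract does not confirm: k i equals k off divided by k on, and the half-life of the complex is ln 2 divided by k off. the measured values are missing from the extracted text, so the numbers below are purely hypothetical.

```python
from math import log


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_i in M for one-step binding, from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


def complex_half_life(k_off: float) -> float:
    """Half-life in seconds of the enzyme-inhibitor complex: ln(2) / k_off."""
    return log(2) / k_off


# Hypothetical rate constants, not the values reported in the abstract.
k_on, k_off = 1.0e6, 1.0e-4
print(dissociation_constant(k_on, k_off))          # about 1e-10 M, i.e. 0.1 nM
print(round(complex_half_life(k_off) / 3600, 2))   # about 1.93 hours
```

a small k off therefore gives both a low k i and a long-lived complex, which matches the link the abstract draws between tight binding and a long-lasting effect.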
these slow tight-binding inhibitors have unique inhibitory properties: they are extremely active and selective and form stable enzyme-inhibitor complexes, and therefore they have a long-lasting effect. their oral activity and long-lasting in vivo biological potency agreed very well with the formation of a stable enzyme-inhibitor complex. the advantages in drug discovery of slow tight-binding inhibitors are discussed in this presentation. enzyme inhibition trend analysis -a new method for drug design m. shokhen, n. khazanov and a. albeck the julius spokojny bioorganic chemistry laboratory, chemistry, bar ilan, ramat gan, israel. e-mail: albecka@mail.biu.ac.il many of the drugs that are currently in use or at different stages of development are enzyme inhibitors. therefore, enzyme mechanism-based inhibitors could be developed into highly selective drugs. our novel enzyme inhibition trend analysis method combines experimental enzyme kinetics data and high level quantum mechanical modeling of enzyme-inhibitor chemical interactions. the method utilizes the principal catalytic reaction scheme of the target enzyme and does not require its d structure (a ligand based approach). the method is valid for the prediction of the trend in binding affinity of inhibitors not only for the specific enzyme for which the qsar model was optimized, but also for the whole enzyme family. the methodology would contribute significantly to overcoming the problem of fast mutational resistance developed by pathogens in response to pharmaceutical treatment. it can be used as a computational tool for expert analysis of various hypotheses about structure-activity relationships formulated for the design of new inhibitors. angiotensin-converting enzyme (ace, ec . . . ) is a key enzyme for blood pressure control and water-electrolyte homeostasis. a large number of highly potent and specific ace inhibitors are used as oral drugs in the treatment of hypertension and congestive heart failure.
somatic ace consists of two homologous domains (n- and c-) within a single polypeptide chain, each one containing a catalytic site. the two catalytic sites within the somatic ace molecule were long considered to function independently. however, recent investigations indicate the existence of negative cooperativity between ace active sites. we studied the properties of bovine ace active centers by use of the separate ace n-domain (n-ace), obtained by limited proteolysis of the parent somatic enzyme, and testicular ace, which represents the c-domain. these results were compared with the data obtained for full-length somatic ace from bovine lungs. the results obtained demonstrate a strongly dependent mechanism of action of ace active centers in the hydrolysis of tripeptide substrates. however, the hydrolysis of the decapeptide angiotensin i proceeds independently on the n- and c-domains. the mechanism of inhibition of ace activity is also dependent on the length of the inhibitor: (i) random binding of a "short" inhibitor molecule (such as captopril, lisinopril) to one of the active sites dramatically decreases binding of another inhibitor molecule to the second site; (ii) the "long" nonapeptide teprotid binds to both active sites without any difficulties. since the main physiological ace substrates in the organism are the "long" peptides angiotensin i and bradykinin, the development of a new class of inhibitors with prolonged structure would be beneficial for abolishing ace activity. synthetic peptide studies on severe acute respiratory syndrome coronavirus (sars-cov) extensive proteolytic processing of the replicase polyproteins, pp a ( kda) and pp ab ( kda), by the sars-cov c-like protease ( cl pro ). besides, the structural spike protein of sars-cov contains two heptad repeat regions (hr and hr ) that form coiled-coil structures, which play an important role in mediating the membrane fusion process. in this study, we focused on both cl pro and the hr regions of sars.
previous studies demonstrated that the coronavirus cl pro cleaves the replicase polyproteins at no < conserved cleavage sites, preferentially at the lq sequence. the reported crystal structure of sars-cov cl pro provides insights into the rational design of anti-sars drugs. in order to understand the molecular basis of the enzyme-substrate binding mechanism, we employ synthetic peptide and mass spectrometry-based approaches to investigate the significance of selected amino acid residues that flank both sides of the sars-cov cl pro cleavage site. in addition, previous studies indicated that the relatively deep hydrophobic coiled-coil grooves on the surface of the sars-cov spike protein heptad repeat regions (hr and hr ) may be a good target site for the design of viral fusion inhibitors. we have designed and synthesized five truncated peptide analogs derived from hr and hr peptides based on both bioinformatics and structural analysis. the biological activities of these truncated analogs will be studied using circular dichroism spectroscopy, multidimensional chromatography, protein cross-linking and mass spectrometry-based approaches. the above investigation will broaden our knowledge of sars and will reveal the feasibility of rational design of synthetic peptide-based drugs for combating sars. ras-transfection-associated invasion: involvement of matrix metalloproteinase(s) confirmed using a chicken embryo model and real time pcr during metastasis tumorigenic cells leave the primary tumour and intravasate into the blood/lymphatic system, exiting at a secondary site to establish a secondary tumour. ras-transfection of a parental, non-invasive mcf- a cell line, established from a patient suffering with benign fibrocystic disease, gave rise to an invasive derivative cell line (mcf- a-neot) exhibiting the phenotype of a pre-malignant, invasive tumour.
invasion and metastasis are protease-assisted processes, with proteases being secreted either by the tumour or by the stromal cells under the influence of the tumour. here we demonstrate the involvement of matrix metalloproteinase(s) in the invasion of the ras-transfected mcf- a cell line. tumour cells were inoculated onto the damaged surface of the upper chorioallantoic membrane (cam) of a vascularized -day-old chick embryo. the tumour cells were allowed to invade, and the number of invading cells was quantified using real-time pcr. inhibitors specific for various proteases were applied to the upper cam to block invasion and hence identify the proteinases involved. the number of tumour cells invading the vascular system was established by sampling the lower cam and quantifying the number of alu sequences (present only in human cells) in the dna isolated from the embryonic tissue, using real-time pcr. using this method, the key role of an mmp was demonstrated.

spectrum ptk inhibitor, genistein ( lm) abolished the release of neutrophil mmp- , in the presence and absence of extracellular calcium, and reduced the release of timp- . both pp ( lm), a src family ptk inhibitor, and piceatannol ( lg/ml), a syk family ptk inhibitor, reduced mmp- release substantially, indicating that multiple ptk families might be involved in mmp- release. inhibition of either syk or src ptks by piceatannol or pp did not appear to influence timp- release. low levels of wortmannin ( nm, inhibition of pi k) abolished the release of mmp- in the absence of calcium, and reduced mmp- release in the presence of calcium. investigations into the signaling pathways involved in timp- release are continuing. we conclude that mmp- release induced by extracellular calcium may be mediated through pi k and multiple tyrosine kinases, including src and syk family ptks. timp- granule release may also be mediated by tyrosine kinases, although src and syk family ptks do not appear to be involved.
thermodynamic and structural analysis of cruzain/cruzipain complexed with e- by molecular modeling and dynamics simulations

peptidases represent one of the most relevant enzyme classes targeted by therapeutic intervention. to contribute to the assignment of a physiological role to genomic-derived peptidases and to make them more accessible to the drug discovery process, we have undertaken a program consisting of mrna expression profiling, full-length recombinant expression in insect cells, purification and determination of the catalytic activity of the human proteolytic enzymes. a milestone in the process was the construction of a non-redundant, comprehensive database of all human peptidases comprising unique annotated entries, built by assembling and filtering public domain information and in-house generated data. to obtain an informative picture of their expression profiles, a transcriptome database for human peptidases was created using the microarray (affymetrix™) and taqman® (applied biosystems) technologies. in parallel, we have set up the procedure for pcr amplification and cloning of the peptidase genes in mtp format and have already created a repository of full-length human cdnas encoding peptidases. in addition, the conditions for miniaturized insect cell cultures have been established. experimental trials have defined a validated, reliable and fully automated robotic procedure for the purification of recombinantly expressed peptidases in mtp format. in a pilot study using the high-throughput approach, % of the chosen reference hydrolases ( ) were secreted into the insect cell medium. of these, % have been proven to be catalytically active using fluorescent homogeneous assays in well format compatible with high-throughput screening criteria. the application of this procedure to genomic-predicted peptidases is discussed.
comparison of putative glutamate racemases from bacillus species
glutamate racemase catalyzes the interconversion between l- and d-glutamic acid and is the cell's source of d-glutamate, a key component in the synthesis of both the bacterial cell wall and the glutamyl capsule. bacillus subtilis has two glutamate racemases in its genome, race and yrpc, while b. cereus and b. anthracis have two race genes, race and race . interestingly, race in b. subtilis is the isoform that is essential and has the greater catalytic efficiency, but both race and race have higher sequence homology to race, and % respectively and share less homology with the yrpc isoform, both at %. we have cloned, overexpressed and purified the two putative glutamate racemases, race and race from b. cereus and b. anthracis, are characterizing their kinetic and biophysical properties, and will use this information to design inhibitors that may result in a novel antibiotic. although these two isoforms share a high sequence similarity, their properties are unique. kinetic data indicate a fivefold difference in catalytic efficiency of race compared to that of race in the l- to d-glutamate reaction. also, the absence or presence of substrate has an effect on the oligomerization state, details of which will be reported. finally, our collaborators have demonstrated through genetic knockout experiments that only one of the race isoforms is essential for the growth of b. anthracis. we have crystallized the race isozyme and x-ray data have been collected to . Å . we are currently solving the structure via heavy-atom derivatives. acknowledgment: this research was funded by nih grant u ai .

anti-inflammatory effects of methionine aminopeptidase inhibition on human b lymphocytes
e. janas , r. priest , s. ratcliffe and r. malhotra rheumatoid arthritis biology, glaxosmithkline, stevenage, uk, high throughput chemistry, glaxosmithkline, stevenage, uk.
e-mail: eva.x.janas@gsk.com
processing of n-terminal methionine is an essential post-translational modification in both prokaryotes and eukaryotes, regulating the subcellular localization, stability and degradation of proteins. the cleavage of the initiator methionine is catalysed by a highly conserved family of metalloproteases, methionine-aminopeptidase and (met-ap ). human met-ap is the molecular target of fumagillin, a natural product with antiangiogenic properties, which covalently binds to his in the catalytic site of met-ap . although fumagillin has been observed to inhibit proliferation and to cause cell cycle arrest in endothelial cells, the mechanism of inhibition is still poorly understood. recent studies describe high expression of met-ap in germinal centre b lymphocytes. here, we investigate the effect of the met-ap inhibitor fumagillin on b lymphocyte proliferation and cell cycle progression and compare these results to those observed in huvec. in addition, our work sheds light on the mechanistic aspects of met-ap inhibition by fumagillin and its derivatives.

effect of distal mutations on the molecular dynamics of the hiv- protease
l i and l m are the most common distal mutations found in the protease gene of drug-resistant hiv- strains. these mutations do not confer resistance by themselves, but induce a large synergistic effect when added to active site mutations. understanding the impact of the l m and l i mutations on the hiv- protease resistance profile is still a challenge. assuming that their contribution to the resistance profile could be mediated by conformational dynamics, we have modeled the l i, l m and l i/l m mutants of hiv- protease. these unbound mutated and wild-type proteases were subjected to ns molecular dynamics simulations and compared using an essential dynamics (ed) analysis protocol. the first eigenvector of the native protease describes the flap opening motion.
the following eigenvectors describe "the catalytic assisting motions" (cam) of the protease that become dominant upon complex formation with a substrate (piana s et al. j mol biol ; ( ): - ). mutation of leucine to methionine at position perturbs the protein packing at the dimerization domain. such perturbations affect the dimerization domain motions, which correlate with flap opening and the cam. as a result, the first eigenvector corresponds to the rotation of one subunit relative to the other along the axis connecting residues and '. in other words, the l m mutation mistunes essential motions of the enzyme while retaining its flexibility. this could be the cause of the reduced structural stability of the l m mutant. in contrast, the l i mutation causes only a redistribution of the amplitudes of the correlated motions. the catalytic assisting motion becomes the most influential, which results in stabilization of the closed conformation. in turn, the flap opening motions are reduced in the l i mutant. the essential dynamics of the double mutant l i/l m can be described in the following terms. a strong propagation of the cam induced by the l i mutation is coupled with the altered conformational space caused by the l m mutation. as a result, the double mutant prefers cam motions that are close to those of the native protease but also account for the perturbed packing within the dimerization domain. the results presented may help in understanding hiv- protease resistance pathways and in developing more efficient inhibitors of known drug-resistant mutants.

glutamate carboxypeptidase ii as a cancer marker and therapeutic target: two faces of an enzyme
glutamate carboxypeptidase ii is a membrane-bound metallopeptidase expressed in a number of tissues such as jejunum, kidney, prostate and brain. the brain form of gcpii (also known as naaladase) is expressed in astrocytes and cleaves n-acetylaspartyl glutamate, an abundant neurotransmitter, to yield free glutamate.
gcpii thus represents an important target for the treatment of neuronal damage caused by excess glutamate. animal model experiments suggest that specific inhibitors of gcpii could be useful for the treatment of several neuropathic conditions, such as brain stroke, chronic neuropathic pain or amyotrophic lateral sclerosis. at the same time, the enzyme is known as prostate-specific membrane antigen since it is upregulated in prostate cancer. it is used for the diagnosis and experimental therapy of prostate cancer using monoclonal antibodies and specific inhibitors. in order to analyze this important pharmaceutical target, we established an expression system based on drosophila schneider's cells. we have also cloned, expressed and characterized its human homolog gcpiii and homologous carboxypeptidases from pig and rat. using specific monoclonal antibodies, we have been able to study the expression of gcpii in various healthy and malignant tissues. we analyzed the substrate specificity of the enzyme using peptide libraries and identified two novel peptide substrates. the availability of a recombinant protein enabled us to introduce a simple fluorescent activity assay and to test specific inhibitors. furthermore, we have biochemically characterized the recombinant protein in terms of pharmacologic properties, oligomeric status, ph dependence and activity modulation by metal ions. we have shown that glycosylation is indispensable for gcpii carboxypeptidase activity and analyzed the role of each specific n-glycosylation site in gcpii activity and folding. using site-directed mutagenesis, we are able to identify the domains sufficient and necessary for gcpii activity and also to suggest a structural explanation for the substrate specificity of the enzyme.

dozens of chemicals inhibit the proteolytically important tyrosine residue of the s proteasome by forming a covalent bond to its hydroxyl group, which abolishes its catalytic function.
in contrast, the approach we use here is based on hydrogen-bonding and hydrophobic interactions that reversibly inactivate all three sites of s complexes. we performed flexible docking studies of analogues of the natural product tmc- a, using the jd crystal structure to describe the active site of the protein and the position of the ligand. the search yielded several amide-like derivatives that were screened for superimposition with tmc- a. a few of them revealed a similar orientation of propylene groups towards the active site of s. a second screen was performed to reveal the chemicals with the strongest hydrogen bonding of the ligand to the protein backbone of the receptor. this screen resulted in two chemicals that had strong h-contacts with tyr , ser and, importantly, with the proteolytically active tyr residue. to assess the validity of the predicted chemicals, we undertook in vitro studies measuring the hydrolysis of a fluorogenic substrate by the sds-activated s proteasome isolated from hela cells. we obtained more than % inhibition of s proteasome activity upon incubation of the above chemicals ( . lg/ml) with proteasomes. we then demonstrated the effectiveness of the obtained chemicals in stabilizing the level of oncosuppressors, including p , in benign (mcf a) and highly metastatic (mda ) cell lines. treatment with these compounds greatly restored the level of p in cancer cells. finally, we performed a proliferation assay and showed that adding these artificially synthesized chemicals to the mda cell line significantly reduced the level of proliferation, whereas mcf a cells treated under similar conditions did not show any abnormal reduction of proliferation below the control level. thus, we report a strategy to predict highly suitable proteasome inhibitors that act via inhibition of protease activity and may lead to the creation of a new class of drugs for cancer therapy.
localization and trafficking of prostate specific membrane antigen (psma) and its variant form psm'
glutamate carboxypeptidase ii, also known as prostate specific membrane antigen (psma), is a transmembrane glycoprotein highly expressed in malignant prostate tissues. it has been shown to be a very useful diagnostic marker and also a potential therapeutic target for prostate cancer. two forms of the enzyme have been identified in the prostate: a full-length transmembrane form consisting of amino acids and a truncated form (called psm'), believed to represent a splice variant of psma. the cdnas of both forms are identical except for a -nucleotide region near the ' end of psma that is absent in psm'. this deleted region codes for the signal peptide as well as for the intracellular and transmembrane domains. we are able to detect two protein forms in prostate cancer model cells (lncap cells) and we also show that both forms are glycosylated, suggesting that the truncated form might originate from the processing of full-length transmembrane psma. a number of methods, including differential centrifugation, pulse-chase experiments, immunochemistry and gfp-fusion protein analysis, were used to analyze the origin, cell localization and trafficking of psma and psm' in mammalian cells.

we have investigated the substrate specificity of the ns b(h)-ns pro protease by using internally quenched synthetic peptides representing both natural cleavage sequences and their recombinant chimeras. synthetic peptides incorporating the o-aminobenzoic acid/ -nitro-l-tyrosine fluorescence donor-quencher pair were used to analyze the minimum substrate length requirement, residue preferences and the contribution of prime side residues to enzymatic cleavage by the ns protease. a series of peptides derived from the ns /ns a cleavage site was designed for the substrate length mapping study.
amino acid truncations in the non-prime and prime side regions differently affected the rates of substrate hydrolysis and binding, as shown by their km and kcat values. the optimal substrate identified was a heptapeptide spanning p -p '. chimeric substrates with all possible combinations of non-prime and prime side sequences derived from polyprotein cleavage sites (c, a/ b, b/ , / a and b/ ) were assayed for reactivity with the ns protease. kinetic parameters revealed a strong impact of the non-prime side residues on km, whereas variations in the prime side region had a greater effect on kcat. the fluorogenic derivative of the tetrabasic peptide rrrr/gtgn (c/ns ) demonstrated the highest affinity, whereas the peptide kkqr/sagm ( b/c) had the highest turnover number. the one with the greatest catalytic efficiency was identified as rrrr/sltl (c/ a). in addition, we have shown that ser is the most preferred residue at p '. the discovery of ns substrates with maximized reactivity will be useful for inhibitor development in sensitive high-throughput assays.

inhibiting the mtor pathway with cci- results in decreased production of vascular endothelial growth factor in a head and neck squamous cell cancer cell line
c.-a. o. nathan , , n. amirghahari , , x. rong , and y. sun , nathan, otolaryngology, otolaryngology/head and neck surgery, louisiana state university health sciences center, shreveport, la, usa, nathan, cancer center, medicine, feist-weiller cancer center, shreveport, la, usa. e-mail: cnatha@lsuhsc.edu
introduction: overexpression of the proto-oncogene eif e in surgical margins of head and neck squamous cell cancer (hnscc) patients is an independent predictor of recurrence and is associated with an increase in vascular endothelial growth factor (vegf) expression. activation of eif e in margins through the mtor pathway has led us to determine that cci- , an mtor inhibitor, has both in vitro and in vivo growth inhibitory effects in hnscc cell lines.
we wanted to determine whether these effects were associated with a decrease in vegf production. material and methods: the hnscc cell line fadu was treated with and ng/ml of cci- (previously established ic = ng/ml). elisa was used to determine vegf protein levels in conditioned medium at ', , , , , and h after treatment with the drug, and these were compared with control cells treated with the diluent at each of the time points. results: a significant decrease in vegf production of % was noted at h and maintained at h in treated cells when compared to control cells at the same time points. the decrease in vegf levels ( - %) was noted within h of treatment with the drug. the percent decrease in vegf protein levels was the same for both doses of cci- . conclusions: overexpression of eif e in hnscc increases translation of mrnas with long 'utrs, one of which encodes the important angiogenic factor vegf. inhibiting the mtor pathway with cci- can potentially decrease vegf production. this has future clinical implications for arresting tumor progression in hnscc patients with molecular positive margins identified by cells overexpressing eif e, also known as minimal residual disease.

proteases from cell culture of jacaratia mexicana
m. c. oliver-salvador , g. barrera
plant proteases are important in the food industry and food technology. the latex of jacaratia mexicana (caricaceae) fruits contains a high level of cysteine proteases. in this work, a cell suspension culture of j. mexicana was established. callus culture was initiated from stem explants of j. mexicana on medium consisting of ¼-strength and full-strength ms mineral salts (murashige and skoog, ), full-strength ms organics and g/l agar supplemented with cytokinins: -benzylaminopurine (bap) at . mg/l and -furfurylaminopurine (kinetin) at . mg/l and various concentrations ( . , . and .
mg/l) of auxins: , -dichlorophenoxyacetic acid ( , -d), -amino- , , -trichloropyridine- -carboxylic acid (picloram), indoleacetic acid (iaa) and a-naphthaleneacetic acid (naa). all of the treatments induced callus except for iaa, naa and the control without added phytohormones. the best auxin concentration for callus development was determined to be . mg/l, and the best medium for callus development and proteolytic activity of callus was determined to be . mg/l , -d + . mg/l bap. cysteine proteases were produced in callus culture of j. mexicana and released into the medium. these enzymes were also secreted in the cell suspension culture. our results support the possibility of synthesizing proteases in in vitro culture of j. mexicana. since protease is a primary metabolite, further improvement in enzyme production is possible by increasing the growth rate and yield of cell culture of j. mexicana.

and arginine, from peptides and proteins at neutral ph. it is known to play an important role in the control of peptide hormones, growth factor activity at the cell surface, and in the membrane-localized degradation of extracellular proteins. therefore, the present work was carried out to clone and express carboxypeptidase m in pichia pastoris, with the aim of developing specific inhibitors and evaluating the importance of the enzyme in different physiological and pathological processes. for this purpose, the enzyme's cdna was amplified from total placental rna by rt-pcr and cloned in the vector ppic , which uses the methanol oxidase promoter and drives the expression of high levels of heterologous proteins in pichia pastoris. the results show that the cpm gene, after cloning and transfection, integrated into the yeast genome, and the yeast started to produce the active glycosylated protein. the recombinant protein was secreted into the medium and the enzymatic activity was measured with the fluorescent substrate dansyl-ala-arg.
the enzyme was purified by a two-step protocol including gel filtration and ion-exchange chromatography, resulting in a -fold purified active protein at a concentration of mg/l of fermentation medium. sds-page showed that recombinant cpm migrated as a single band with a molecular weight similar to that of the native placental enzyme ( kda). these results demonstrate for the first time the establishment of a method using pichia pastoris to express human carboxypeptidase m.

mutational analysis of active site of glutamate carboxypeptidase ii
human glutamate carboxypeptidase ii (gcp ii) is a membrane metallopeptidase expressed predominantly in the nervous system, prostate and small intestine. in the brain, gcp ii catalyzes cleavage of the abundant neuropeptide n-acetyl-l-aspartyl-l-glutamate (naag) to n-acetylaspartate and glutamate. gcp ii is a type ii transmembrane glycoprotein with a short cytoplasmic n-terminal region (amino acids - ), a transmembrane domain (amino acids - ) and a large extracellular domain (amino acids - ) where the active site of the enzyme is situated. gcp ii, as a cocatalytic zinc metallopeptidase, has two zn + ions in the active site which are necessary for its enzymatic activity. recently, the crystal structure of gcp ii was determined in our laboratory and amino acids arg , asn , lys and tyr were proposed to bind the c-terminal glutamate of naag (mesters et al., manuscript in preparation). in the present study, we carried out site-directed mutagenesis to assess the influence of these amino acid residues on the activity of gcp ii. in addition, the glutamic acid in position , which is proposed to be involved in proton shift during the catalytic hydrolysis of the peptide bond, was mutated to alanine. all the mutant proteins were expressed in insect cells, purified to near homogeneity and enzymatically characterized. it was shown that a mutation in any of these positions leads to significantly reduced naag-hydrolyzing activity.
the substitution of glu almost completely abolished the enzymatic activity, thus suggesting that glu is crucial for the enzymatic activity of gcp ii. kinetic characterizations of the mutant proteins and their substrate specificities will be presented in comparison with wild-type gcp ii.

comparative study of mammalian homologues of human glutamate carboxypeptidase ii
glutamate carboxypeptidase ii (gcpii) is a membrane-bound metallopeptidase. in homo sapiens, gcpii was shown to be expressed in various tissues, mostly in the central nervous system, small intestine and prostate. in the brain it hydrolyses n-acetylaspartylglutamate (naag), which is the most prevalent peptide neurotransmitter in the mammalian nervous system, to form glutamate and n-acetylaspartate. in the small intestine gcpii plays an important role in folate absorption. in the prostate its function is still unknown. it was shown that inhibition of gcpii is neuroprotective in many neurodegenerative states. according to current knowledge of this enzyme, its role may also be important in prostate (and possibly other) cancers, where its expression is dramatically changed in comparison with healthy tissue. gcpii is thus becoming an important therapeutic target and diagnostic molecule. in order to analyze structure-activity relationships in related glutamate carboxypeptidases, we set out to study the mammalian homologues of human gcpii: gcpii of rattus norvegicus, sus scrofa and mus musculus, which have approximately % dna sequence similarity to human gcpii. information on the biochemical properties, expression pattern and structural similarity is crucial, e.g. for testing gcpii inhibitors in animal models. we have cloned and expressed recombinant gcpii of r. norvegicus and s. scrofa in insect cells with the aim of obtaining pure recombinant protein in quantities sufficient for structural analysis. data on the biochemical comparison of rat, pig and human gcpii forms will be presented and interpreted in the light of the gcpii structure.
structural analysis of pla protein from y. pestis: docking and molecular dynamics of interactions with mammalian plasminogen systemz e. ruback and p. g. pascutti
laboratório de modelagem e dinâmica molecular, departamento de biofísica, universidade federal do rio de janeiro, rio de janeiro, r.j. brazil. e-mail: eruback@biof.ufrj.br
the plasminogen (plg) system is an important mechanism for cell migration through tissues in mammalian organisms. some bacterial agents can activate this system with proteases and lead to uncontrolled degradation of extracellular matrix (ecm) components, giving these infections an invasive character. the y. pestis protein pla is a plasmid-encoded outer membrane protein with aspartic-protease activity and is closely related to the proteolytic activation of plg to the serine-protease form called plasmin. exactly how pla activates plg to plasmin remains unclear. in this work, we predicted the interaction between plg and the pla protein by rigid-body docking with hex and evaluated the stability of the complex by molecular dynamics (md) using gromacs. to evaluate the docking accuracy, we used the crystal structure of the plg-streptokinase complex. the md results show more stability in the docked plg-streptokinase complex than in the crystal complex, as observed from the rmsd and rmsf calculations after ns in the simulation box. the pla model was constructed with spdb-viewer using the pdb structure of ompt as a template, and the quality of the model was evaluated with procheck. the docked plg-pla complex shows the same interaction site predicted in mutagenesis studies. after ns of md ( atoms in the box), we observed relaxation of the beta-barrel structure of pla and a progressive approach and stabilization of the cleavage site of plg within the extracellular loops of pla, followed by an increase in the number of hydrogen bonds. in this study we report the amino acids that may participate in the active site and the interaction subsites.
a full understanding of these interactions can be an important tool for drug design against bacterial proteases.

glutamate carboxypeptidase ii (gcpii), also known as naaladase i, folylpolyglutamate hydrolase (folh) or prostate specific membrane antigen (psma), is localized in a number of tissues. in brain astrocytes, it regulates neurotransmission by cleaving the neurotransmitter n-acetylaspartylglutamate (naag) into n-acetylaspartate and the most common excitatory neurotransmitter, glutamate. inhibition of gcpii activity protects against cell death after brain stroke. in animal models it has also been shown that specific inhibitors of gcpii could be useful for the treatment of chronic neuropathic pain, amyotrophic lateral sclerosis and other pathologic situations in which excess glutamate is neurotoxic. gcpii is identical to prostate-specific membrane antigen (psma), a tumor marker in prostate cancer. gcpii is also found in the membrane brush border of the small intestine, where it acts as a folate hydrolase. this reaction expedites intestinal uptake of folate through hydrolysis of folylpoly-gamma-glutamates to monoglutamyl folates. gcpii inhibitors might thus be useful in the imaging and treatment of tumors that require folate for their growth. therefore it was of interest to investigate whether gcpii might be upregulated in brain tumors as well. in order to analyze this possibility, we took samples from patients with brain tumors treated in faculty hospital motol during - and determined the expression and activity of gcpii by western blots and immunohistochemistry using monoclonal and polyclonal antibodies developed against extracellular epitopes of gcpii. moreover, we characterized the enzymatic activity of the enzyme in human samples and correlated the expression of gcpii with the type and grade of the tumor.

search for optimal isosteres in beta-secretase peptidic inhibitors
alzheimer's disease is a widespread, neurodegenerative, dementia-inducing disorder.
it is ascribed to the presence of a lesion in several brain regions, the neuritic plaques, which are extraneuronal accumulations of b-amyloid protein (ab), a -aa insoluble peptide which, mixed with axons and dendrites of neurons, interrupts the synaptic process and causes neuronal death. the peptide ab is a product derived by proteolytic cleavage from a larger transmembrane cell protein termed amyloid precursor protein, app. two enzymes are involved in this cleavage: b-secretase and a-secretase. the first one cuts app between met and asp of app to generate the n-terminus of ab in the rate-limiting step of the process, while the second one cleaves at various places within a sequence between amino acids and to generate the respective c-terminus. using a combination of molecular modeling techniques, we have designed a set of novel b-secretase peptidic inhibitors with a variety of isosteres, starting from the available crystallographic structure of this enzyme bound to the inhibitor om - . some of the resulting ligands are predicted to have higher affinity for this enzyme than the starting compound. these inhibitors have been synthesized, their b-secretase affinity has been tested, and cell assays have been performed to determine their ability to preclude the formation of ab peptides in cell cultures.

schizophrenia and bipolar affective disorder (bd) are two neuropsychiatric diseases with high social and economic costs. in spite of the prevalence of these diseases, no effective long-term treatments are currently available. the enzyme prolyl oligopeptidase (pop) shows increased activity in both illnesses. this serine protease hydrolyzes peptide hormones and neuropeptides at the carboxyl end of proline residues. because of the relevance of pop as a therapeutic target, many specific inhibitors of this protein have been developed in recent years. the inhibitors ono- , jtp- and s- - are currently in clinical trial phase.
s- - has been administered safely to humans and has been proposed as a potential treatment for cognitive disorders associated with cerebral aging. our aim is to develop new peptide inhibitors of human pop. to obtain the human brain pop required for our studies, the cdna corresponding to the enzyme was cloned and subsequently expressed in e. coli. pop activity was monitored by f-nmr using a newly synthesized pop substrate labeled with f. this substrate allowed us to perform the inhibition assay while avoiding the interference problems of colorimetric and fluorimetric assays, and was suitable for high-throughput screening of new pop inhibitors. different strategies were used to find putative human pop inhibitors: in silico screening and solid-phase synthesis of candidates, and screening of extracts of chinese medicinal plants. furthermore, nmr studies were performed with the purified human enzyme by labeling the protein isotopically with n and d o and by selective labeling of the residues methionine and tryptophan with c. nmr spectra of the labeled protein were obtained at mhz by applying trosy techniques. nmr will provide structural information for structure-based drug design of new pop inhibitors in the future, as well as for studying the interaction of the candidates with the active site of the enzyme.

the crucial regulatory function of the membrane type -matrix metalloproteinase (mt -mmp or mmp- ) in connective tissue metabolism, pericellular proteolysis of extracellular matrix (ecm) components, zymogen activation and angiogenesis was demonstrated by the severe phenotype of mt -mmp-deficient mice. this membrane-anchored enzyme is not only essential for the normal development of hard tissues, but is also highly expressed in different human cancers, where its level frequently correlates with malignant parameters.
in most cases the high level of mrna or elevated level of protein can be predictive for disease development but these parameters only partly reflect the expression and forms of mt -mmp in pathological conditions. biosynthesis, trafficking, intracellular activation, internalization, protein-protein interactions, and the level of physiological inhibitors (timps) strictly influence the activity of mt -mmp in cells and tissues. in our experimental system, we followed mt -mmp processing and shedding and characterized the cell-associated and released forms of the enzyme (jbc ; : - ; jbc ; : - and biochem j ; : - ). we found active and inactive truncated forms of mt -mmp as a result of treatments or experimentally generated imbalance with timps. we have also developed approaches to identify mt -mmp forms in tumor tissues. here we present and discuss different strategies to identify mmp- in diverse biological samples. because mt -mmp endows tumor cells with the ability to invade and metastasize, these strategies can provide valuable information on the role and function of this key protease. contribution of calpain to cellular damage in human retinal pigment epithelium cultured with zinc chelator y. tamada , t. nakajima , t. r. shearer and m. azuma , research laboratories, senju pharmaceutical co., ltd., kobe, hyogo japan, departments of integrative biosciences, oregon health & science university, portland, or, usa. e-mail: yoshiyuki-tamada@senju.co.jp purpose: we previously showed involvement of calcium-dependent cysteine proteases (calpains, ec . . . ) in neural retina degeneration induced by hypoxia and ischemia-reperfusion. age-related macular degeneration (amd) is one of the leading causes of loss of vision. amd shows degeneration of the neural retina due to dysfunction and degeneration of the retinal pigment epithelium (rpe). rpe performs critical functions for the neural retina, such as phagocytosis of shed rod outer segments.
the purpose of the present study was to determine the contribution of calpain-induced proteolysis to damage in human rpe. the zinc chelator tpen was used to induce cellular damage since zinc deficiency is a suspected risk factor for amd. methods: third- to fifth-passage cells from human rpe were cultured with tpen. leakage of ldh into the medium was measured as a marker of rpe cell damage. activity of calpains was assessed by casein zymography, and proteolysis of calpain substrates was detected by immunoblotting. to confirm calpain-induced proteolysis, calpain in homogenized rpe was also activated by addition of calcium. results: tpen caused ldh to leak into the medium from rpe cells, and calpain inhibitor sja inhibited the leakage. casein zymography and immunoblotting for calpain and a-spectrin showed activation of calpain in rpe cultured with tpen. proteolysis by activated calpain was confirmed by addition of calcium to homogenized rpe. conclusion: these results suggested that activation of calpain contributed to rpe damage induced by tpen in vitro. acknowledgments: dr shearer has substantial financial interest (research contract and consulting fee) in senju pharmaceutical co., ltd., and dr azuma is an employee of senju pharmaceutical co., ltd., a company that may have commercial interest in the results of this research and technology. this potential conflict of interest has been reviewed and managed by the ohsu conflict of interest in research committee. in vivo and molecular risk factors of chloroquine or pyrimethamine-sulfadoxine treatment failure in children with acute uncomplicated falciparum malaria the risk factors associated with chloroquine (cq) or pyrimethamine-sulfadoxine (ps) treatment failure were evaluated in children enrolled prospectively in six antimalarial drug trials between july and july in a hyperendemic area of southwestern nigeria. following treatment, ( %) of children given cq and ( %) of children given ps failed treatment by day or .
in a multiple regression model, four factors were found to be independent risk factors for cq treatment failure at enrolment: age < years [adjusted odds ratio (aor) = . , % confidence interval (ci) . - . , p = . ], asexual parasitaemia > /µl (aor = . , % ci . - . , p = . ), presence of gametocytaemia (aor = . , % ci . - . , p = . ) and enrolment after years of commencement of the study, that is, after (aor = . , % ci . - . , p = . ). following treatment with cq, two factors were independent risk factors for failure of treatment: delay in parasite clearance > days (aor = . , % ci . - . , p = . ) and presence of gametocytaemia on day or (aor = . , % ci . - . , p = . ). in those treated with ps, two factors were found to be independent risk factors for ps treatment failure at enrolment: age < . years (aor = . , % ci . - . , p = . ) and presence of fever (aor = . , % ci . - . , p = . ). following treatment with ps, delay in parasite clearance > days (aor = . , % ci . - . , p = . ) was an independent risk factor for failure of treatment. the quintuple mutants made up of triple dhfr (asn- , arg- and ile- ) mutant alleles and double dhps (gly- and glu- ) mutant alleles were found in isolates obtained from % of patients and were significantly associated with ps treatment failure (p = . ), while pfcrt and pfmdr- mutant genes did not significantly predict cq treatment failure in these patients. these findings may have implications for malaria control efforts in sub-saharan africa where control of the disease depends almost entirely on antimalarial monotherapy. development of high-throughput assay of lethal factor using native substrate m.-y. yoon department of chemistry, hanyang university, seoul, south korea. e-mail: myyoon@hanyang.ac.kr the design of inhibitors for anthrax lethal factor (lf) is currently of interest as an approach for the treatment of anthrax because lf plays major roles in cytotoxicity of target cells.
lf is a zinc-dependent metalloprotease that specifically cleaves the mitogen-activated protein kinase kinase (mapkk) family. current assay systems for the screening of lf inhibitors use the optimized synthetic peptide coupled with various kinds of fluorophores, which enables fast, sensitive, and robust assays suited to high-throughput screening. however, lines of evidence suggest that the regions beside the cleavage site are also involved in specificity and proteolytic activity of lf. in the present study, we tried to develop a high-throughput assay for lf activity based on the native substrate, mek . the assay system relies on the ecl signal resulting from a specific antibody against the c-terminal region of the native substrate. a glutathione-coated multiwell plate was used as a solid support to immobilize the native substrate by its n-terminal gst-moiety. the immobilized substrate increases the specificity and sensitivity of lf-catalyzed substrate hydrolysis compared to the solution phase assay. this assay system would be expected to discover a wide spectrum of anthrax inhibitors. while significant progress has been made over the past decade in elucidating the structure and enzymatic mechanism of the s proteasome, our understanding of its assembly pathway and the role of the propeptides in the maturation process is still substantially incomplete. similarly, the mechanisms involved in the translocation of substrates into the central nanocompartment are only dimly understood at present. we have used the rhodococcus proteasome to dissect the assembly pathway, combining mutagenesis and crystallographic studies. for the thermoplasma proteasome we have established a ''host-guest'' interaction system which allows us to follow the translocation of specific substrates into the interior of the proteasome by electron microscopy, mass spectroscopy and x-ray crystallography. transferring substrates to the s proteasome in the fission yeast schizosaccharomyces pombe c.
gordon mrc human genetics unit, western general hospital, edinburgh, uk. e-mail: colin.gordon@hgu.mrc.ac.uk the ubiquitin pathway is found in all eukaryotes. in this pathway, target proteins are covalently modified by the addition of ubiquitin, a amino acid protein, to specific lysine residues. the ability of multi-ubiquitin chains to function as a signal to target proteins for degradation by the s proteasome is well documented. a key question is how the multi-ubiquitin chain is recognized as a signal. fission yeast rhp /rad and pus / rpn represent two families of multi-ubiquitin chain binding proteins that can associate with the proteasome as well as some e ubiquitin ligases. they seem to provide a link to shuttle ubiquitinated substrates from the e ubiquitin ligases to the s proteasome. a detailed characterization of their proteasome binding will be presented along with their potential role in ubiquitin conjugate dynamics. finally, data will be presented indicating that an additional substrate presentation pathway exists in fission yeast which is also conserved in higher eukaryotes. non-proteasomal rpn raises the threshold for association of a ubiquitin-binding protein with the proteasome the ubiquitin proteasome pathway is responsible for the removal of the vast majority of short-lived proteins in the cell. in order to be degraded, a protein substrate is tagged with polyubiquitin and delivered to the proteasome where it is proteolysed. a slew of shuttle proteins is thought to mediate the delivery of polyubiquitinated substrates, although the mechanism remains elusive. one such family of proteins is comprised of rad , dsk and ddi , which all bind polyubiquitinated substrates through a ubiquitin-associated domain (uba) as well as the proteasome through their ubiquitin-like domain (ubl). another potential shuttle structurally unrelated to the ubl-uba family is rpn . rpn is found as an integral subunit of the proteasome as well as in an unincorporated pool.
we characterized the interactions of these proteins with individual proteasomal subunits, as well as between themselves. we find unique relationships between the putative shuttle proteins and the proteasome, pointing to functional dissimilarity among them. strikingly, unincorporated rpn interferes with binding of dsk to the proteasome. thus, we propose that rpn might play a negative role in proteolysis through its action on dsk . proteins modified by multi-ubiquitin chains are usually targeted for degradation by the proteasome. in other cases, ubiquitylation mediates protein sorting or regulates other functions. a striking example for a non-proteolytic role of ubiquitin is the rad dna damage bypass at stalled replication forks. key elements of this pathway are two ubiquitin-conjugating enzymes, rad and the mms /ubc heterodimer, which are recruited to chromatin by the ring-finger ubiquitin ligases, rad and rad , respectively. moreover, the sumo-conjugating enzyme ubc is also affiliated with the pathway and we discovered that proliferating cell nuclear antigen (pcna), a dna-polymerase sliding clamp involved in dna synthesis and repair, is a substrate. pcna is (i) mono-ubiquitylated by rad /rad , (ii) modified by lysine (k) -linked multi-ubiquitylation, which additionally requires mms /ubc / rad , and (iii) sumoylated by ubc . all three modifications affect the same lysine residue of pcna, indicating that they label pcna for alternative functions. indeed, we discovered that mono-ubiquitylation of pcna promotes an error-prone replication bypass, whereas k -linked multi-ubiquitylation mediates error-free replication across the lesions. in contrast, sumoylation, which occurs even in the absence of dna damage, prevents recombination between homologs at the replication fork. these findings indicate that mono-ubiquitin, k -linked multi-ubiquitin chains, and sumo are crucial for decision making at the replication fork.
ubiquitin-mediated proteolysis is the primary mechanism in eukaryotes for degrading unwanted and misfolded proteins. through the cascade of e , e and e enzymes, ubiquitin monomers are attached sequentially to the target proteins, which are then recognized and degraded by the s proteasome. the selection and specific timing of polyubiquitination of the target proteins are conferred by different e ubiquitin ligases. the anaphase-promoting complex (apc) is one of the most extensively studied e ubiquitin ligases and plays an essential role in the cell cycle and specific developmental processes. the core apc is composed of - subunits. except for apc and apc , relatively little is known about the role of the other apc subunits or the assembly of the complex. two wd -repeat activator proteins, cdc and cdh , determine stage-specific activation of the core apc as well as selection and binding of the apc substrates. in plants, the apc activators are present in multiple copies. arabidopsis contains cdc genes, cdh -type activators known as ccs a , ccs a and ccs b. our work has been focused on the function of apc activators in the cell cycle and plant development, identification of novel apc substrates and on the assembly of the apc complexes. apc activities, based on the expression profiles of the cdc and ccs genes, will be presented at organism level. by detailed protein interaction studies in the yeast two-hybrid system and arabidopsis protoplasts or transgenic plants, we shall demonstrate how the core apc interacts with the activators and substrates, and propose a model for apc assembly. characterization of substrate delivery to the saccharomyces cerevisiae proteasome by quantitative shotgun proteomics the proteasome is the central protein degradation machinery in the eucaryotic cell. in conjunction with the ubiquitin system, it is responsible for constitutive bulk protein turnover as well as the controlled degradation of regulatory proteins.
the system is very well characterized, but the mechanism by which poly-ubiquitinated substrates are delivered to the proteasome remains unclear. recently our lab has proposed a number of proteins to be proteasome-based receptors for poly-ubiquitinated substrates in s. cerevisiae (rpn p, rad p, dsk p; verma et al., ). others (e.g. richly et al. ) have put forward a complex model for the delivery of substrates from the ubiquitinating machinery to the proteasome involving the aaa atpase cdc p. by analyzing the composition of affinity purified proteasome complexes from s. cerevisiae cells lacking these factors and/or exposed to specific proteasome inhibition, we hope to further elucidate the substrate delivery pathway. ubiquitinated proteins recruited to the proteasome are identified utilizing capillary chromatography in-line to electrospray ion trap mass spectrometry (mudpit; link et al. ). using a reference strain grown in minimal medium solely providing heavy nitrogen ( n) as an internal standard, we are able to record even gradual fluctuations in sample composition. differences in the recruitment of substrates to the proteasome in varying mutant backgrounds will shed light on the specificity of proteasome substrate receptors and the topology of the substrate delivery mechanism. oxidative protein damage by reactive oxygen species (ros) produces cross-linking, fragmentation and biochemical modification of the amino acids resulting in biological dysfunctions. quercetin, a widely distributed bioactive plant flavonoid, possesses anti-cancer, antioxidant and free radical scavenging activities, and also binds dna, causing dna fragmentation. little is known about protein oxidative damage and its modification by antioxidants. therefore, the aim of the present work was to investigate the molecular mechanisms of antioxidant and prooxidant activities of quercetin toward proteins.
the antioxidant activities of quercetin, such as superoxide dismutase (sod)- and catalase (cat)-mimetic as well as hydroxyl radical (•oh) scavenging activities, were assessed. bovine serum albumin (bsa) was incubated with different concentrations of quercetin. quercetin has high sod- and cat-like and hydroxyl radical (•oh) scavenging activities. its activities are concentration dependent. quercetin fragmented bsa into specific fragments, which were detected by sds/polyacrylamide gel electrophoresis. oxidative protein damage was assessed as tryptophan oxidation and as generation of carbonyl, quinone and advanced oxidation protein products (aopp). the increase of protein oxidation products was concentration dependent. the carbonyl and quinone contents and aopp were highly significantly elevated in quercetin-treated proteins when compared with the control sample. the tryptophan fluorescence was markedly lower in treated protein than in the control sample. the mechanisms of antioxidant and pro-oxidant activities of quercetin have been discussed. these results demonstrate that the antioxidant quercetin may potentiate protein damage via oxygen free radical generation, particularly of •oh radicals. protein stability mediated by a hyaluronan-binding deubiquitinating enzyme is involved in cell viability protein degradation by the ubiquitin system plays a crucial role in numerous cellular signaling pathways. deubiquitination, a reversal of ubiquitination, has been recognized as an important regulatory step in the ubiquitin-dependent degradation pathway. we have identified three novel genes encoding a deubiquitinating enzyme, vdub , vdub , and vdub (villi deubiquitinating enzyme , , and ) from human chorionic villi by rt-pcr. their cdnas are , bp in length and encode an open-reading frame of amino acids with a molecular weight of approximately kda. expression analysis showed that vdub transcripts are highly expressed in the heart, liver, and pancreas.
in addition, they are expressed in various human cancerous cell lines. amino acid sequence analysis revealed that they contain the highly conserved cys, his, and asp domains, which are required for the formation of the active site of deubiquitinating enzymes. in vivo and in vitro deubiquitinating enzyme assays indicated that vdub , vdub , and vdub have deubiquitinating enzyme activity. here, we show that the overexpression of vdub proteins leads to irregular nuclear morphology and apoptosis, suggesting that these vdubs play an important role in regulating signal transduction involved in cell death. interestingly, the sequence analysis showed that vdub proteins contain the putative hyaluronan/mrna-binding motifs, and cetylpyridinium chloride-precipitation analysis confirmed the association between vdubs and intracellular hyaluronan and rna. chemical cleavage of peptide (amide) bonds usually requires harsh conditions. as a result of side reactions and the lack of specificity, chemical amide bond hydrolysis is not a preferred means of protein digestion. we have discovered selective cleavage of peptide bonds in proteins under milder circumstances than any previously reported chemical method. hydrolysis takes place in aqueous buffers in a ph range of , and occurs c-terminal to the proteinogenic non-natural amino acid azido-homoalanine (azhal), effected by a staudinger reaction after addition of the mild and biocompatible reagent tris(carboxyethyl)phosphine (tcep). a key feature of the suggested reaction mechanism is the unprecedented nucleophilic substitution of the resulting gamma-iminophosphorane by the flanking c-terminal backbone amide oxygen atom. after hydrolysis, the new c-terminal peptide is present as a homoserine lactone residue and the n-terminal peptide as its free amine. this new reaction may find application as a very mild and selective bio-orthogonal degradation pathway in biochemistry and biomaterials science.
overexpression of proteasome b subunit increases amount of assembled proteasome and confers ameliorated response to oxidative stress and higher survival rates the proteasome is the major cellular proteolytic machinery responsible for the degradation of both normal and damaged proteins. proteasomes play a fundamental role in retaining cellular homeostasis. alterations of proteasome function have been recorded in various biological phenomena including aging. we have recently shown that the decrease in proteasome activity in senescent human fibroblasts relates to the down-regulation of b-type subunits. in this study we have followed our preliminary observation by developing and further characterizing a number of different human cell lines overexpressing the b subunit. stable overexpression of the b subunit in wi /t and hl cells resulted in elevated levels of other b-type subunits and increased levels of all three proteasome activities. immunoprecipitation experiments have shown increased levels of assembled proteasomes in stable clones. analysis by gel filtration has revealed that the recorded higher level of proteasome assembly is directly linked to the efficient integration of ''free''/not integrated b-type subunits identified to accumulate in vector-transfected cells. in support of this, we have also found low pomp levels in b transfectants, thus revealing an increased rate/level of proteasome assembly in these cells as opposed to vector-transfected cells. functional studies have shown that b overexpressing cell lines confer enhanced survival following treatment with various oxidants. moreover we demonstrate that this increased rate of survival is due to higher degradation rates following oxidative stress. finally, as oxidation is considered to be a major factor that contributes to aging and senescence, we have overexpressed the b subunit in primary imr human fibroblasts and we have observed a delay of senescence by population doublings.
in summary, these data demonstrate the phenotypic effects following genetic up-regulation of the proteasome and provide insights towards a better understanding of proteasome regulation. expression levels of the components of the ubiquitin/proteasome pathway in pisum sativum seedlings under anoxia stress change in gene expression: proteins produced under aerobic conditions are no longer synthesized and are replaced by the so-called anaerobic peptides. among those proteins synthesized under o deficiency some enzymes of the glycolytic and fermentative pathways were identified in plants. upon reintroduction of air, the anaerobic mrnas disappear rapidly and the increased levels of those enzymes must return to the basal levels. the ubiquitin/proteasome system is a major pathway of proteolysis in eukaryotic cells and may contribute to controlling the intracellular levels of a variety of short-lived regulatory proteins. in this proteolytic pathway, proteins are covalently conjugated to ubiquitin, which flags them for rapid hydrolysis by the s proteasome. long polyubiquitin chains must be formed to target a protein for destruction by the proteasome. in plants, the ubiquitin-mediated proteolytic pathway is implicated in a variety of cellular processes, including stress responses. in this study, -day-old pisum sativum seedlings were subjected to: (i) h of anoxia stress; (ii) h of aerobic conditions after h of anoxia stress and (iii) h of aerobic conditions after h of anoxia stress. the levels of free and conjugated ubiquitin were detected by immunoblotting using anti-ubiquitin polyclonal antibodies. the changes in the mrna levels of some components of the ubiquitin/proteasome pathway in the seedlings were determined by relative semiquantitative rt-pcr. the results suggest an involvement of the ubiquitin-mediated proteolytic pathway in the anoxia stress response.
involvement of the anaphase promoting complex in plant development controlled degradation of short-lived proteins via ubiquitin-dependent proteolysis by the s proteasome is a key mechanism in eukaryotes that regulates nearly all fundamental cellular processes including cell cycle. polyubiquitination of the protein substrate is sufficient to target it for degradation by a large atp-dependent multicatalytic protease, the s proteasome. the selection and specific timing of ubiquitination of the target proteins are conferred by different e ubiquitin ligases. the anaphase promoting complex (apc) is one of the e ubiquitin ligases, which by ordered destruction of various cell cycle proteins has fundamental roles in the regulation of mitotic and endoreduplication cycles. the apc also functions outside the cell cycle. in post-mitotic cells, the cdh adaptor protein ensures stage specific activation and substrate selection of the apc. in plants, two classes of the cdh -type activators have been identified, ccs a and ccs b, that display differential regulation during the cell cycle and plant development as well as differences in their substrate-specificities. in arabidopsis, transient and complementary expression profiles of the atccs a , atccs a and atccs b genes indicate apc functions during flower development. to identify apc targets, yeast two hybrid screens were performed in the laboratory. out of about interacting proteins, several proteins were transcription factors including a key regulator of flower development. data on the interactions of the ccs proteins and transcription factors in arabidopsis protoplasts will be presented as well as a model for the apc regulated pathways. novel effects of ubiquitin system and chaperone proteins on the prion ''life cycle'' in yeast t. a. chernova , k. d. allen , e. p. tennant , k. d. wilkinson and y. o.
chernoff department of biochemistry, emory university, atlanta, ga, usa, school of biology and institute for bioengineering and bioscience, georgia institute of technology, atlanta, ga, usa. e-mail: tcherno@emory.edu yeast prion [psi + ], the self-propagated aggregated isoform of the translation termination factor sup , is used as a model system to study neural inclusion disorders. prion aggregates and other neural inclusions in mammals were previously reported to sequester ubiquitin (ub). proteasome inhibitors affected the turnover of mammalian prion proteins. however, the role of ub-dependent proteolysis in the prion ''life cycle'' has not been clearly defined. chaperone proteins, which are also implicated in ub-dependent proteolysis, have been shown to influence the formation and propagation of the prion aggregates. our results uncover the connection between alterations of the ub system and chaperone proteins in their effects on the maintenance of yeast prion. we have demonstrated that deletions of genes encoding deubiquitinating enzymes that are critical for ub regeneration at the proteasome (ubp ) or the vacuole (doa ) cause pleiotropic phenotypic effects that are primarily due to decreased levels of free ub in the yeast cells. these alterations, as well as deletion of the gene encoding the ub-conjugating enzyme ubc , decrease [psi + ] curing by the overproduced disaggregase hsp , suggesting that the ub system influences hsp -dependent clearance of prion aggregates. spontaneous [psi + ] formation was also increased in the ubc depleted cells. we previously demonstrated that excess of cytosolic chaperone ssa of hsp family increases de novo formation of [psi + ]. both in vivo and in vitro experiments uncover direct interactions between sup and hsp proteins. the amount of sup -bound to hsp -ssa was increased in ubc deletion strain. we propose a model to explain roles of hsp , hsp and ub system in the prion life cycle.
effects of parkinson's disease mimetics on proteasome activity and protein turnover in human sh-sy y neuroblastoma cells it has recently been suggested that impairment of the ubiquitin/ proteasomal system contributes to the degeneration of dopaminergic neurons (dn) and lewy body (lb) formation in parkinson's disease (pd). mitochondrial dysfunction is also a key factor in pd and agents such as mpp + and dopamine, which inhibit mitochondrial electron transport, produce selective degeneration of dn in animal models. in this study the effects of treating sh-sy y cells with mpp + or dopamine over h on proteasomal chymotrypsin-like activity (cla) were monitored. mpp + ( . mm) caused a sustained depletion of glutathione levels followed by a reduction in proteasomal activity. a reduction in atp levels, caused by higher levels of mpp + ( mm), exacerbated this effect. exposure to low dopamine concentrations ( . mm) led to large reductions in atp without affecting cla or glutathione levels, whilst higher concentrations ( mm) caused marked reductions in cla, glutathione and atp levels. these results suggest that, under oxidative stress, glutathione levels are important regulators of proteasomal activity in this cell line. our group has shown that mpp + can destabilize the neurofilament network in sh-sy y cells, partly due to changes in phosphorylation of neurofilament (nf) chains. as nfs are important components of lbs, and their mode of turnover is uncertain, we tested the effects of proteasome inhibitors on nf levels. treatment with these inhibitors led to nf accumulation, which was enhanced when glutathione levels were artificially depleted, suggesting that nfs can be degraded via the proteasomal pathway. the effects of proteasome impairment on protein accumulation will be discussed. mitochondria and the hypoxia-inducible factor (hif- ): regulation of hif- is independent of a functional mitochondrial respiratory chain k. doege, w. jelkmann and e.
metzen institute of physiology, university of luebeck, luebeck, germany. e-mail: doege@physio.uni-luebeck.de the hypoxia-inducible factor hif- is the ''master-regulator'' in adaptation to low oxygen concentration and induces the hypoxic expression of several target genes, e.g. erythropoietin and vascular endothelial growth factor (vegf). in normoxia hif- a is constantly produced but also degraded by oxygen-dependent prolyl-hydroxylation. mitochondria consume most of the oxygen delivered to cells and have been implicated in oxygen sensing. firstly, mitochondria have been proposed to stabilize hif- a by production of reactive oxygen species (ros) in hypoxia. secondly, inhibition of the respiratory chain, e.g. by nitric oxide, has been proposed to cause redistribution of intracellular oxygen followed by reactivation of the prolyl hydroxylases and inhibition of hif signalling. we have used cells depleted of mitochondrial dna (q ) and gas permeable cell culture dishes to eliminate all oxygen diffusion gradients affecting the cells. we show that these dishes neutralize all effects of mitochondrial inhibition. additionally, cellular hypoxia as assessed by pimonidazole staining has been evaluated in human osteosarcoma cells treated with inhibitors of the respiratory chain under hypoxia. these results demonstrate an elevated po under hypoxic conditions after treatment with mitochondrial inhibitors correlating with an intracellular oxygen concentration which reduces hif- activation. thus, neither the absence of ros nor the redistribution of intracellular oxygen supply leads to the destabilization of hif- a in hypoxia. our experiments provide evidence that an increased intracellular po evoked by the absence of mitochondrial oxygen consumption reactivates the prolyl hydroxylases and is therefore responsible for the degradation of hif- a under hypoxic conditions.
enzyme activity is generally higher in rhizosphere than in bulk soil, as a result of a greater microbial activity sustained by root exudates or due to the release of enzymes from roots. negative effects of heavy metals on soil microorganisms and enzyme activities have long been recognized. the aim of this study was to assess the stimulatory effects of different low molecular weight organic compounds commonly present in root exudates (mres) on microbial activity and protease activities and , and how high cd concentrations affect such stimulatory effects. soils (arenic udifluvent) were sampled from the agir long-term field trials, contaminated with cd nitrate at rates of (control soil), and mg cd per kg of soil. the mre solutions contained glucose, citric acid, oxalic acid, glutamic acid or a mixture of the four compounds, added to give a rate of mg of mre-c per kg of soil. the effects were measured at mm (bulk soil) distance from the mrs. protease activity was determined by hydrolysis of n-benzoylargininamide (baa). the results showed that different mres had different stimulatory effects on microbial growth and on the protease activities, mostly localized in the rhizosphere soil layer. in the control soil, the dsdna content was significantly increased by the addition of all mres in both rhizosphere and bulk soil layers. the and mg cd per kg of soil rates negatively affected protease activity. glucose, citric acid, oxalic acid, glutamic acid and the mre mix did not stimulate protease in either the rhizosphere or the bulk soil layer. microbial growth and protease activities were drastically reduced by high cd concentrations. participation of different digestive proteinases of the yellow mealworm, tenebrio molitor, in initial stages of hydrolysis of the main dietary protein insects generally have a wide spectrum of digestive proteinases.
knowledge of the contribution of different proteinases to the initial stages of hydrolysis of dietary proteins is essential for insect control by means of proteinase inhibitors and bacillus thuringiensis toxins. the larvae of a stored grain pest, the yellow mealworm tenebrio molitor, were reared on milled oat flakes. the main dietary protein for these larvae was s globulin, the main storage protein of oat seeds. to study the initial stages of s globulin hydrolysis in vitro, the reaction was performed under the physiological conditions of the anterior midgut (am) (ph . ) with purified enzyme preparations from am: two fractions of cysteine proteinases, cys ii and cys iii, and chymotrypsin- and trypsin-like proteinases. total hydrolysis of s globulin was observed with cys ii. hydrolysis by the chymotrypsin-like enzyme was slightly less effective. cys iii cysteine and trypsin-like proteinases produced only partial hydrolysis of seed globulin. in all cases, high molecular mass (mm) intermediate products were formed, indicating that hydrolysis of s globulin was sequential. incubation with both cysteine proteinase fractions led to formation of a kda product, while serine proteinases pro-

in contrast to "classical" bioregulator peptides, peptides can also be generated in the course of catabolic degradation of functional proteins. for years, we have been interested in one such group of peptides derived from blood hemoglobin, the hemorphins. hemorphins constitute a family of opioid receptor-binding peptides of to amino acids that are released by proteolytic processing from the ( - ) segment of the human hemoglobin beta-chain. they are prevalent throughout the peripheral and central nervous system and have been isolated in vivo from tissues or fluids.
many in vivo physiological effects have been reported (coronaro-constrictory, anti-tumorous and immunoregulatory activities), and several of the hemorphins interact at various levels of the renin-angiotensin system (ras) by inhibiting angiotensin-converting enzyme (ace), aminopeptidase n (apn) and dipeptidyl peptidase iv (dppiv) activities. in addition, some hemorphins, in particular lvv-hemorphin- (lvvypwtqrf), bind with high affinity to the brain (ic = . nm) and renal at angiotensin receptor subtype; lvv-hemorphin- is possibly the main endogenous ligand of this receptor. in an attempt to characterize the precise in vivo mechanisms of their release, our attention is focused on tumoral and central nervous system environments. the latter is particularly interesting, as all cellular components implicated in the release of hemorphins are present simultaneously: the haemoglobin precursor and localized brain proteases which might come into contact with blood haemoglobin. for this purpose, examining the potential of this tissue to generate "neuro"-hemorphins would be of interest, since the sources of hemorphins in the brain have not yet been definitively established.

later, we showed that sgt interacts with calcyclin (s a ) and other calcium-binding proteins of the s family (nowotny et al. j biol chem ). moreover, in collaboration with dr chazin's group, we found that in vitro sgt binds to hsp (lee y-t et al. j biol chem ). in this work we studied the expression and subcellular localization of sgt in mammalian cells by means of western and northern blots. among the different cell lines examined, human embryonic kidney hek and human glioma t g cells exhibit the highest expression of sgt protein. moreover, we found that in mouse and rat cells there is one isoform of sgt , while in human cells two isoforms of this protein were found. to study the subcellular localization of sgt we chose cells containing a moderate level of sgt , such as human epidermal hep- cells.
by applying immunocytochemistry we found that this protein is present not only in the cytoplasm but also in the nucleus. at present we are examining the effect of intracellular ca + concentration on the subcellular localization of sgt and on its co-localization with target proteins. acknowledgements: this work was supported by grants: kbn p a and firca/nih r tw .

combining reverse genetics, reverse chemogenomics and proteomics to assess the impact of protein n-terminal methionine excision in the cytosol of higher eukaryotes

in living organisms, whatever the cell compartment, proteins are always synthesized with methionine (met) as the first residue. however, this first met is specifically removed from most mature proteins. in the course of protein n-terminal met excision (nme), the free n-terminal met is removed by met aminopeptidase (map) cleavage. three enzymes (map a, map a and map b) have been identified in the cytoplasm of arabidopsis thaliana. by combining reverse genetics and reverse chemogenomics in transgenic plant lines, we have devised specific and reversible switches for investigating the role of cytoplasmic nme in a. thaliana and the respective contributions of the two types of cytoplasmic map throughout development. in the map a ko context (map a- ), modulating map activity by treatment with various concentrations of the specific drug fumagillin impaired plant development. hence, (i) cytoplasmic nme is essential in plants, (ii) plant map a and map s are functionally interchangeable, as a complete block of either map type activity does not cause any visible molecular or phenotypic effect, (iii) a minimal level of cytoplasmic map is required for normal development and (iv) the plant a. thaliana appears to be an excellent system in which to study nme and the associated role of anti-cancer agents like fumagillin. proteomics was used to assess the impact of nme blocking induced by fumagillin.
we used a wild-type plant and the map a- variant grown in the presence of nm fumagillin. the map a- variant showed a dwarf phenotype. we compared the patterns of each protein extract by d gel electrophoresis. protein spots were identified by tandem mass spectrometry. the data show that fumagillin induces many dedicated pathways, with a prevalence of those related to oxidative stress.

prolyl endopeptidases from the midgut of the yellow mealworm tenebrio molitor side of proline residues. these enzymes have been found in mammals, several higher plants, fungi and bacteria. it is suggested that the enzymes participate in the in vivo regulation of the action of biologically active peptides. we report for the first time two prolyl endopeptidases in the larval midgut of a stored product pest, the yellow mealworm tenebrio molitor, where they can participate in the proteolysis of one of the main dietary proteins of t. molitor larvae, the proline-rich prolamines. the characteristics of the two prolyl endopeptidases are significantly different. the optimum for hydrolysis of the substrate z-ala-ala-pro-pna (n-carbobenzoxy-l-alanyl-l-alanyl-l-prolyl-p-nitroanilide) by prolyl endopeptidase was at ph . , and by prolyl endopeptidase at ph . . prolyl endopeptidase displayed high ph stability in the ph range . - . , and the rate of hydrolysis increased in the presence of kcl and cacl . prolyl endopeptidase demonstrated low stability over the whole ph range; the rate of hydrolysis strongly decreased in the presence of the above-mentioned salts, but increased in the presence of high concentrations of edta.

the influence of cell growth media on the stability and antitumour activity of methionine enkephalin

studies with cultured tumour cell lines are widely used in vitro to evaluate peptide-induced cytotoxicity as well as molecular and biochemical interactions. the objectives of this study were to investigate the influence of the cell culture medium on peptide metabolic stability and in vitro antitumour activity.
the degradation kinetics of the model peptide methionine enkephalin (met-e, tyr-gly-gly-phe-met), recently demonstrated to play an important role in the rate of proliferation of tumour cells in vitro and in vivo, were investigated in cell culture systems containing different amounts of foetal bovine serum (fbs). the influence of enzyme inhibitors (bestatin, captopril, thiorphan) on met-e degradation was also investigated. the results obtained in dulbecco's modified eagle medium containing % fbs indicated a rapid degradation of met-e (t / = . h). pre-incubation of the medium with a mixture of peptidase inhibitors reduced the hydrolysis of met-e, as shown by an increase in half-life to h. the in vitro activity of met-e against poorly differentiated cells from lymph node metastasis of colon carcinoma (sw ) and human larynx carcinoma (hep- ) cells was determined. tumour cells were grown for weeks prior to the experiment in a medium supplemented with , or % fbs. statistically significant to mild or no suppression of cell proliferation was observed in all cultures. in both cell lines, a significant suppression of cell growth by a combination of peptidase inhibitors and met-e, compared with cells exposed to the peptide alone and cells grown in the absence of met-e, was observed. this study indicates that caution must be exercised in interpreting the antiproliferative effects of peptide compounds in conventional drug-response assays.

protein metabolism in whole body and skeletal muscle of laboratory rats treated by proteasome inhibitors

proteasome inhibitors are new agents which may be used in the treatment of cancer and other severe disorders. one of the possible side effects of their administration is a disturbance in protein metabolism, which may affect the outcome of the illness. two separate studies were performed using wistar rats. in the first study, m. soleus (sol) or m.
extensor digitorum longus (edl) were incubated in medium containing mmol/l mg or mmol/l adaahx l vs or without inhibitor (control). protein synthesis was evaluated using l-[ - c]leucine. proteolysis was determined according to the rate of tyrosine release into the medium during incubation. in the second study, the proteasome inhibitor mg diluted in dimethyl sulfoxide (dms) was administered intraperitoneally at a dose of mg/kg b.w. controls consisted of dms-treated animals. changes in protein and amino acid metabolism were estimated under steady-state conditions using continuous infusion of l-[ - c]leucine h later. the mann-whitney test (in vivo study) and paired t-test (in vitro study) were used for statistical analysis. in the in vitro study, both mg and adaahx l vs significantly decreased protein synthesis and proteolysis. however, in the in vivo study, a significant increase in whole-body protein synthesis and proteolysis was observed in mg-treated animals. acknowledgements: the study was supported by a grant of gacr no. / / .

bioinformatical evidence for a prokaryotic ubiquitin-like protein modification system

h. scheel, s. tomiuk and k. hofmann bioinformatics group, memorec biotec gmbh, köln, germany. e-mail: kay.hofmann@memorec.com

until recently, the ubiquitin system has been considered a purely eukaryotic invention. by now, the bacterial moad/moeb and this/thif systems are known to be prokaryotic versions of a rudimentary activation system for ubiquitin-like proteins. however, similarities to the ubiquitin system end after the activation step, as moad and this are not conjugated onto target proteins but rather have a role in the biosynthesis of molybdopterin and thiamin, respectively. the eukaryotic protein urm is the closest homolog of moad and this. unlike its bacterial cousins, urm is conjugated onto target proteins and thus can be considered the founding member of the diverse eukaryotic ubiquitin family.
by using a bioinformatics approach that integrates methods of sequence analysis, phylogenetics, phylogenomics and gene-order analysis, we were able to show that many bacteria possess a third ubiquitin-like activation system that is most likely used for protein modifications. the novel system uses a moad/this relative, which is more closely related to urm than the typical moad and this proteins. these bacterial urm (burm ) proteins typically require the proteolytic removal of a c-terminal extension, which masks the gg motif important for activation. many burm operons contain an mpn+/jamm domain protein (belonging to a bona fide ubiquitin-specific protease family), which is most likely responsible for this cleavage. as a third component, an e -like enzyme is also part of typical burm operons. the burm -associated e enzymes look more like uba (the eukaryotic urm -e ) than like the bacterial moeb/thif e enzymes. interestingly, the mpn+/jamm protease is also conserved in those bacteria whose burm end with gg, suggesting that burm removal is important not only for the activation step

b - p non-hypoxic induction of hypoxia-inducible factors by insulin and -deoxy-d-glucose

hypoxia-inducible factors (hifs) are key mediators of the cellular adaptation to hypoxia, but also respond to non-hypoxic stimuli like insulin. to clarify the involvement of all known hif subtypes in conditions resembling diabetes, we determined the distribution of mrnas and proteins in rats subjected to in vivo hypoglycemia and glucoprivation. wistar rats were infused with either saline, insulin, or -deoxy-d-glucose ( -dg) to provoke hypoglycemia or impaired glucose assimilation. using real-time qpcr, mrna levels of hif subunits a, a, a, b, and of the target gene glut- were determined in various organs. cellular distributions of hif-a proteins were examined by immunohistochemistry.
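as an illustration of how relative mrna levels from real-time qpcr are commonly quantified, the sketch below applies the widely used comparative ct (2^-ddct) calculation. the abstract does not state which quantification method or reference gene was used, so the method, the function name and all ct values here are assumptions chosen for illustration only.

```python
# hypothetical helper: relative expression by the comparative ct (2^-ddCt) method.
# this is an assumed, generic calculation; the abstract does not name its method.

def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Return the fold change of a target mRNA in treated vs. control samples.

    Each Ct is normalised to a reference gene measured in the same sample,
    then the treated sample is compared with the control sample.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # normalise treated
    d_ct_control = ct_target_control - ct_ref_control   # normalise control
    dd_ct = d_ct_treated - d_ct_control                 # treated vs. control
    return 2.0 ** (-dd_ct)


# made-up Ct values: the target crosses threshold two cycles earlier after
# treatment while the reference gene is unchanged, i.e. a four-fold increase.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

the calculation assumes roughly 100% amplification efficiency for both target and reference; where efficiencies differ, an efficiency-corrected model would be needed instead.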
treatments with insulin or -dg resulted in a widespread increase in hif- a mrna after h, whereas mrna expression of the other hif subunits remained unaffected, except for hif- a, which increased in lung and heart after -dg. in cerebral cortex and kidney, enhanced staining of all hif-a proteins was observed after insulin or -dg treatments. lung, heart and kidney showed enhanced levels of glut- mrna. both hypoglycemia and glucoprivation provoke functional activation of the hif system, with transcriptional up-regulation of hif- a representing a typical response. our data indicate an involvement of the hif system, and hif- a in particular, in the pathophysiology of diabetes.

fragments of human salivary statherin and pb peptide underlying a furin-like pro-protein convertase action in the pre-secretory salivary fragmentation pathway

the recent analysis of some derivatives of human salivary peptides and proteins [ ], such as acidic and basic proline-rich proteins (prp) and histatins, revealed in the pre-secretory salivary fragmentation pathway the action of a furin-like pro-protein convertase of the kexin-subtilisin family, often followed by a carboxypeptidase action. along the same lines, the present study was carried out to search human saliva for the fragments generated from statherin and pb peptide by the action of furin-like proteinases, utilizing a selected-ion monitoring strategy based on hplc-it ms. the fragments and post-translational derivatives detected with high frequency in multiple samples were the following: (i) statherin ( amu), des-phe- ( amu), des-thr- -phe- ( amu), des-asp- ( amu), mono-phosphor. ( amu), statherin sv (missing - residues; amu), fragm. - ( amu), fragm. - ( amu), fragm. - ( amu). moreover, the fragm. - ( amu) of pb peptide ( amu) was identified. the quantity of these fragments in salivary samples was usually < % of the parent peptide.
the identified fragments confirmed the action of a pro-protein convertase on furin-like consensus sequences, with cleavage at arg- (ekflr), arg- (lrr) and arg- (rrigr) for statherin, and at arg- (rgpr) for pb peptide. detection of statherin missing n- and c-terminal residues also indicated a pre-secretory exopeptidase action, already observed in other salivary peptides. the function of these statherin and pb derivatives in the oral cavity remains to be elucidated.

cloning and expression of a pepstatin insensitive acid protease from thermoplasma volcanium in e. coli

acid proteases, commonly known as aspartic proteases, are recognized by their specific inhibition by pepstatin. acid proteases are found in microorganisms both as intracellular and extracellular enzymes. there is a very limited number of thermostable, pepstatin-insensitive acid proteases isolated from bacterial sources. the only example of a purified and cloned acid protease from archaebacteria is thermopsin, produced by sulfolobus acidocaldarius. this thermophilic enzyme represents a new class of acid proteases. to extend our knowledge of the microbial acid proteases with thermostable properties, in this study we have undertaken the cloning and expression of a thermostable, pepstatin-insensitive acid protease from the thermoacidophilic archaeon thermoplasma volcanium. a primer set was designed based on the nucleotide sequence of the predicted thermopsin gene, and pcr amplification produced a bp fragment, which covered the complete thermopsin gene with some upstream and downstream sequences. the amplified thermopsin gene was cloned in e. coli, using the pdrive vector. the alignment of the amino acid sequences of thermopsins from various archaea revealed the highest homology ( %) between the tp. volcanium thermopsin and the putative tp. acidophilum enzyme, thermopsin . there was a low degree of similarity ( %) between the tp. volcanium thermopsin and thermopsin from sulfolobus acidocaldarius.
expression of the recombinant thermopsin was attempted using the qia expression kit, where the cloned gene was ligated to pqe expression vectors to be expressed under the control of the t promoter. in this system the protein was tagged with xhis residues at the n-terminal end so that it could be selectively isolated using ni-nta metal-affinity chromatography.

include various vital proteins with discrete functions in the propagation of apoptosis. our aim is to generate a caspase cleavage site predictor specific for each member of the caspase family in order to make subtype-specific predictions of new caspase substrates. we have used a set of experimentally verified proteins to generate sequence logos and train a neural network in order to predict caspase cleavage sites. machine learning techniques, such as artificial neural networks, are often well suited to integrating the subtleties of sequence variations. this approach also enables integration of structural information in the pattern recognition procedure, which could possibly increase the predictive performance of the neural network. the identification of new caspase substrates can lead to further elucidation of several cellular processes involving caspases, including apoptosis, cell cycle regulation, cellular differentiation, and pro-inflammatory responses. in addition, the generation of caspase inhibitors could be greatly aided by a caspase cleavage site predictor.

regulation of protein synthesis and autophagic-lysosomal protein degradation in isolated pancreatic acini

a. l. kovacs and e. papp cell physiology laboratory, department of general zoology, eotvos lorand university, budapest, hungary. e-mail: alkova@cerberus.elte.hu

a series of biologically active compounds (wortmannin, ly , -methyladenine, rapamycin, okadaic acid, theophylline, insulin, glucagon, cholecystokinin) influencing protein synthesis and autophagic-lysosomal protein degradation by interfering with important signalling pathways were investigated.
our results show that in exocrine pancreas cells phosphatidylinositol kinases (pi k-s) are activators, while the target of rapamycin protein (tor) is an inhibitor, of autophagy. camp is an inhibitor of lysosomal protein degradation that acts through members of the pi k family. okadaic acid inhibits lysosomal protein degradation without inhibiting the formation of autophagic vacuoles. the inhibition of pi k-s and tor diminishes protein synthesis; inhibitors of these kinases reduce the synthesis-stimulatory effect of insulin. cholecystokinin showed a biphasic stimulatory effect, while glucagon had no effect on protein synthesis. on the basis of these results, a possible signalling pathway is suggested for autophagic segregation and lysosomal protein degradation in pancreatic acinar cells.

purification and characterization of a bifunctional protease from vibrio vulnificus

in this study, we purified and characterized an extracellular protease showing dual functions as prothrombin activator and fibrinolytic enzyme from vibrio vulnificus atcc . the purified enzyme had broad substrate specificity towards various blood-clotting-associated proteins such as prothrombin, plasminogen, fibrinogen and factor xa. the cleavage of these proteins could be stimulated by addition of mm mn + . the protease could activate prothrombin to active thrombin. however, the thrombin activity generated from prothrombin activation by the protease seemed to be transient, with further cleavage resulting in a loss of activity. interestingly, the enzyme could enhance the activity of thrombin during the initial rate of fibrin formation when purified fibrinogen was used as substrate. it could also actively digest fibrin polymer as well as cross-linked fibrin. these results suggest that the secreted protease functions as a prothrombin activator and a fibrinolytic enzyme to interfere with blood clotting as part of the mechanism associated with its pathogenicity in humans.
tumor invasion and metastasis are the major causes of treatment failure and death in cancer patients. one requisite for neoplastic cell invasion during tumorigenic processes is the remodeling that occurs within the stroma or extracellular matrix (ecm). cysteine cathepsins, most likely along with matrix metalloproteases and serine proteases, degrade the ecm, thereby facilitating growth and invasion into surrounding tissue and vasculature. clinically, the activity levels and localization of cysteine cathepsins and their endogenous inhibitors have been shown to be of diagnostic and prognostic value. the aim of our study was therefore twofold: to determine the prognostic and diagnostic value of cathepsins b, l and h from human tissue extracts (normal and tumor tissue) and extracellular fluids (such as plasma and urine), and of a -proteinase inhibitor (pi), in the pathogenesis of different types of human brain tumors; and to extract and purify endogenous inhibitors of cysteine cathepsins from normal and tumor brains and study their physicochemical properties. it was found that the increase in the activity levels of cysteine cathepsins b, l and h in brain tumor tissues depends on histostructure, histogenesis and tumor malignancy grade. an increase in cathepsin l and h activity levels was found in plasma and urine, depending on histogenesis. at the same time, a decrease in pi activity level was registered. in addition, the kinetic characteristics of the endogenous inhibitors of cysteine cathepsins extracted from normal brain were determined. the endogenous inhibitors extracted from tumor brain differed in physicochemical properties from those of normal brain. the data obtained contribute to understanding the participation of cysteine cathepsins and their inhibitors in the mechanisms of carcinogenesis, may prove useful for improving tumor therapy, and open the possibility of using their activity as diagnostic and prognostic markers.
protein hydrolysates of sea origin as components for microbiological culture media

dry hydrolysate was prepared from protein-containing waste of icelandic scallop chlamys islandicus processing (spw) by means of a proteinase complex from king red crab hepatopancreas. the enzyme preparation consists of the proteolytic enzyme complex from crab hepatopancreas, in which serine proteases dominate (collagenase, elastase and trypsin- and chymotrypsin-like proteinases). as proteinases from king red crab hepatopancreas have high enzyme-substrate affinity for icelandic scallop proteins, a high degree of proteolysis can be achieved. the composition and properties of the material were investigated on enzymatic protein hydrolysate from spw obtained under the most technologically suitable conditions: - °c, ph . , h, the ratio between the protein material and the enzyme preparation being : . for comparison, we examined the composition of commercial pancreatic hydrolysate from poor-quality fish species, mainly boreogadus and micromesistius. it was found that the hydrolysate from spw significantly exceeded the commercial analog in the mass percentage of the target product (free amino acids and oligopeptides). the resulting product contains not less than % free amino acids and oligopeptides. aspartic acid, leucine, isoleucine, arginine and lysine predominate, accounting for > % of the free amino acids. the potential use of the protein hydrolysate as a nutrient for microorganism cultivation was assessed. microbiological studies have demonstrated that the hydrolysate from spw can be used as a protein component in nutrient media. the tested microbial strains grew satisfactorily on the media.

the z variant alpha- proteinase inhibitor (a piz) misfolds in the endoplasmic reticulum (er) and is a substrate for er-associated protein degradation (erad).
we report here that a piz degradation is also dependent on vps /atg , a gene that encodes a component of two pi -kinase complexes that regulate membrane traffic; complex i is required for autophagy, complex ii is required for the cpy-to-vacuole pathway. to elucidate why vps p participates in a piz degradation, we tested the hypothesis that erad was saturated at elevated levels of a piz expression and that excess a piz was targeted to one of these alternative quality control pathways. overexpression of a piz led to vacuole-dependent degradation, and both complexes were required for delivery of the excess a piz to the vacuole. when the cpy-to-vacuole pathway was compromised, a piz was secreted and the distribution of soluble vs. aggregated forms of a piz was comparable with that of wild-type yeast. however, disruption of autophagy led to an increase in levels of aggregated a piz, suggesting that when erad is saturated the excess a piz is selectively targeted to the vacuole via the cpy-to-vacuole sorting pathway, while excess a piz that forms aggregates in the er is targeted to the vacuole via autophagy. together, these results reveal multiple pathways for recognition and removal of aberrant proteins and provide direct evidence that aggregated a piz is removed by autophagy. our findings may have application in the understanding of, and treatment for, individuals with liver disease caused by the accumulation of er aggregates of a piz. acknowledgements: the study was supported by national science foundation grants mcb- and mcb- .

yeast and lactobacillus association generates peptides from acid goat whey proteins fermentation

s. didelot, s. bordenave-juchereau, e. rosenfeld, l. murillo, j. m. piot and f. sannier laboratory of biotechnology and bioorganic chemistry, university of la rochelle, la rochelle, france. e-mail: lmurillo@univ-lr.fr

our goal was to produce peptides from fermentation of unsupplemented acid goat whey by dairy micro-organisms.
we used a lactobacillus, lactobacillus paracasei, and a yeast, candida parapsilosis, both previously isolated from a cheese microflora. when co-cultivated aerobically, both micro-organisms grew on unsupplemented goat whey and led to a medium acidification from to ph . . reversed phase (rp)-hplc analysis revealed total alpha-lactalbumin hydrolysis after h of fermentation, a modification of the beta-lactoglobulin elution peak, and a . -fold increase in peptide level compared with the non-fermented whey. in the absence of c. parapsilosis, l. paracasei grew poorly on whey and only a weak medium acidification from to . was observed after h of fermentation. rp-hplc analysis revealed a weak modification of the beta-lactoglobulin elution peak, a truncated form of alpha-lactalbumin and no peptide generation. c. parapsilosis was able to grow on unsupplemented goat whey without modifying the ph of the medium, but only % of proteins were hydrolysed (alpha-lactalbumin) or denatured (beta-lactoglobulin) and, again, no peptides were detected. these results suggest that (i) c. parapsilosis is required for l. paracasei growth and (ii) the co-culture of both micro-organisms is needed to generate peptides from alpha-lactalbumin hydrolysis. during co-culture on whey, the use of penicillin g and cycloheximide as bacterial and yeast growth inhibitors, respectively, revealed that l. paracasei growth was required for medium acidification to ph . and alpha-lactalbumin hydrolysis. however, we demonstrated that the protease(s) responsible for alpha-lactalbumin hydrolysis was (were) synthesized by c. parapsilosis during the first stage of fermentation and that medium acidification (obtained either by l. paracasei growth or chemically) was required for yeast protease(s) activity.

dengue virus causes widespread human diseases such as dengue fever, dengue hemorrhagic fever and dengue shock syndrome. the viral genome is a positive rna strand that encodes a single polypeptide precursor.
processing of the polyprotein precursor into mature proteins is carried out by the host signal peptidase and by ns serine protease. the three-dimensional structure of the ns protease domain ns pro has been elucidated [ ]. recently, a new construct of the recombinant form of ns pro was engineered [ ]. we have expressed in e. coli the his-tag-cf .gly.ns pro protein, a new construct of the recombinant form of ns pro linked to a -residue co-factor, corresponding to a part of ns b, via a non-cleavable, flexible nonapeptide (gly sergly ), and have now optimized the purification procedure. chemically optimized substrates, peptides and depsipeptides, were designed and tested to afford an efficient in vitro activity assay, using hplc and fret spectroscopy. the data suggest that the amino-terminal region of the -amino acid co-factor domain is involved in additional charged interactions with ns that are essential for activity, as previously described. this form showed catalytic activity, and spectroscopic studies were performed to characterize the folding of the protein. moreover, limited proteolysis experiments have been performed to identify the essential enzymatic domain of the protein and to establish the role of the cofactor in the activity and in the folding stabilization of the enzyme. after h of limited proteolysis with endoproteinase asp-n, the product was analyzed by sds-page and activity assay, showing a large reduction in molecular mass and a loss of activity of only %. cd and n- h-hsqc spectra of this protein fragment were recorded, and other functional and structural characterizations are in progress in our laboratory. it is intended to obtain the structure in solution of the essential active domain of the uniformly c, n-labeled cf .gly.ns pro by high-field d nmr spectroscopy.
the solution structure of the enzyme will be used to answer as yet unresolved questions about the mechanism of action, the role of its cofactor ns b, and the observed substrate specificity.

introduction: fish consumption is associated with nutritional benefits due to the presence of proteins of high biological value, minerals, vitamins and polyunsaturated fatty acids. most studies concerning the benefits of fish consumption for cancer prevention have focused on fish fatty acids, but little is known about the potential bioactivity of fish peptides. the present study was therefore designed to assess the antiproliferative activity of various fish protein hydrolysates, in order to further purify and characterize anticancer peptides. methods: twenty-one fish hydrolysates (from seven species) were produced within the framework of the european valbiomar programme. fish hydrolysate composition (protein, fat and salt content) was determined by standard methods (kjeldahl, soxhlet extraction and volhard, respectively). cytotoxic and antiproliferative activity were assayed in vitro on mcf- / and mda-mb- human breast adenocarcinoma cell lines, using a cell viability colorimetric assay (promega, france). the antiproliferative activity of fish hydrolysates was compared with that of reference anticancer molecules with various cellular targets, namely actinomycin d, cytosine-beta-d-arabinofuranoside, cyclophosphamide, etoposide, kenpaullone and roscovitine. results: composition analysis revealed that most hydrolysates contained more than % protein. three blue whiting hydrolysates containing % protein, . % lipid and . % salt induced strong growth inhibition of breast cancer cells when tested at g/l for h in cell culture medium. blue whiting hydrolysates , and , respectively, induced a growth inhibition of . , . and . % on mcf- / , and . , . and . % on mda-mb- .
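for readers unfamiliar with how a growth inhibition percentage is typically derived from a colorimetric viability assay, the following sketch shows one common calculation. the abstract does not describe how its percentages were computed, so the formula, the function name and the absorbance values below are assumptions for illustration only.

```python
# hypothetical helper: percent growth inhibition from colorimetric absorbances.
# the formula is a generic convention, not taken from the abstract.

def growth_inhibition_percent(a_treated, a_control, a_blank=0.0):
    """Return growth inhibition (%) of treated wells relative to untreated wells.

    a_treated: mean absorbance of wells exposed to the hydrolysate
    a_control: mean absorbance of untreated wells
    a_blank:   mean absorbance of cell-free medium (background)
    """
    control_signal = a_control - a_blank
    if control_signal <= 0:
        raise ValueError("control absorbance must exceed the blank")
    treated_signal = a_treated - a_blank
    return 100.0 * (1.0 - treated_signal / control_signal)


# made-up absorbances: the treated signal is half of the control signal
# after background subtraction.
print(growth_inhibition_percent(0.75, 1.25, a_blank=0.25))  # 50.0
```

a negative result from this calculation would indicate growth stimulation rather than inhibition.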
these in vitro antiproliferative activities are in the range of that observed when the two breast cancer cell lines are treated for h with kenpaullone, roscovitine or cytosine-beta-d-arabinofuranoside ) m. further studies are engaged to fractionate and characterize the antiproliferative peptides contained in blue whiting hydrolysates. during recent years, it has been established that intracellular proteolysis in eukaryotic cells is largely accomplished by a highly selective non-lysosomal pathway that requires atp and a large ( . mda) multisubunit complex known as the s proteasome. the proteasome-mediated pathway plays vital regulatory functions. it degrades many important proteins involved in cell cycle control, in signaling pathway, and in general metabolism, including transcription factors and key metabolic enzymes. another function of the proteasomal system is the removal of abnormal, misfolded and oxidized proteins generated under normal and, in particular, stress conditions. to date, proteasomes from other than animal or plant cells were studied only in yeast. recently, in our laboratory, the proteasome-mediated pathway was shown to be involved in the regulation of ligninolytic activities in the white rot fungi trametes versicolor and phlebia radiata upon nutrient starvation (staszczak, enzyme microb technol ; : - ). it was the first report on proteasomes in fungi representing basidiomycota. white rot fungi are able to degrade lignin by the action of secreted enzymes, the best characterized of which are laccases, lignin peroxidases, and manganese peroxidases. the subject of lignin biodegradation has commanded attention for a considerable period of time mainly because of its ecological significance and wide industrial applications of bioligninolytic systems. heavy metal ions are important environmental pollutants which affect biodegradation processes performed by white rot fungi. 
in the present study, we investigated whether the proteasomal degradation pathway might be involved in the regulation of laccase production by t. versicolor in response to cadmium exposure.
studies of cacybp/sip function using small interfering rna
cacybp/sip was discovered as a protein that binds calcyclin (s a ) in a calcium-dependent manner (filipek and wojda ; filipek and kuznicki, ), and its distribution and some biochemical properties have been studied. for instance, it has been shown that cacybp/sip binds calcyclin via its c-terminal fragment (nowotny et al. ) and that, besides calcyclin, it interacts with other calcium-binding proteins of the s family (filipek et al. ). originally, we identified cacybp/sip in ehrlich ascites tumour (eat) cells, but it is also present in other mammalian tissues and cells. in particular, high expression of cacybp/sip was found in neuronal cells of mouse and rat brain (jastrzebska et al. ). at present the distribution and structural properties of cacybp/sip are quite well described, but its function remains obscure. only one published paper concerns the possible involvement of cacybp/sip in β-catenin ubiquitination and degradation (matsuzawa and reed ). to elucidate the biological role of cacybp/sip we have designed and synthesized sirna (small interfering rna) against this protein. this sirna was then used to transfect neuroblastoma nb- a and embryonic kidney hek cells, which express high and low amounts of endogenous cacybp/sip, respectively. the level of cacybp/sip was monitored in cell extracts by western blotting. we found that the sirna we designed against cacybp/sip inhibited the expression of this protein, as its level in transfected cells was lower than in control cells. at present, we are examining the effect of diminished expression of cacybp/sip on β-catenin degradation and other cellular processes.
acknowledgements: this work was supported by grants: kbn
heavy metals are powerful poisons for living cells. it has been shown that exposure to arsenicals, either in vitro or in vivo, in a variety of model systems, causes the induction of a number of the major stress protein families, such as the heat shock proteins (hsp) (toxicol appl pharmacol ; : ). the reasons for heavy metal toxicity in vivo are not fully understood, but heavy metals are known to contribute to the accumulation of aberrant proteins (bba, , , ). in animal cells, arsenite has been reported to cause sulfhydryl depletion, to generate reactive oxygen species and to increase the level of high molecular mass ubiquitin-protein conjugates (toxicol appl pharmacol ; : ). in cells subjected to stress conditions, several components of the ubiquitin/proteasome pathway are activated. in this major eukaryotic proteolytic pathway, multiple ubiquitin molecules are enzymatically ligated to proteins destined for catabolism by an enzyme system composed of three types of enzymes, commonly referred to as e , e , and e . the large ubiquitin-protein conjugates thus formed are subsequently degraded by a very large protease complex, the s proteasome, in an atp-dependent process. the changes in free ubiquitin (ub) and ubiquitin-protein conjugate (ub-p) levels were followed by immunoblotting during the incubation of the higher plant lemna minor l. (duckweed) in the presence of arsenite (as), at concentrations known to confer thermotolerance to the plants. the observed increase in the amount of large molecular mass ubiquitin-protein conjugates is indicative of a role for the ubiquitin/proteasome pathway in the response of lemna to as stress.
this outcome is primarily attributed to an increased availability of protein substrates during as treatment, for three main reasons: an increase in protein carbonyl (a major marker for protein oxidation) content detected by immunoblotting; moderate increments (as determined by semi-quantitative rt-pcr) in the mrna levels of the coding sequences for the ubiquitin pathway components: ubiquitin, e , e and the b subunit and the atpase subunit of the s proteasome; and an identical pattern of variation for the large ubiquitin-protein conjugates observed in the simultaneous presence of as and cycloheximide, indicating that the observed increase in ubiquitin conjugates does not depend on de novo protein synthesis.
ageing and autophagy
y. stroikin and a. terman experimental pathology, linköping university, linköping, sweden. e-mail: yurst@inr.liu.se
life of aerobic cells is associated with continuous oxidative damage resulting in the formation of altered, non-functional macromolecules and organelles. intracellular accumulation of oxidized proteins, defective organelles and lipofuscin inclusions is a typical manifestation of ageing that preferentially affects long-lived post-mitotic or growth-arrested cultured cells. autophagy, an important biological mechanism for renewal of damaged intracellular structures, has been found to decrease in ageing. to learn more about the role of autophagy in ageing, we studied the effect of the inhibitor of autophagic sequestration -methyladenine ( -ma) on human diploid fibroblasts and astrocytes. inhibition of autophagy in growth-arrested (confluent) fibroblasts for weeks resulted in the accumulation of altered lysosomes displaying lipofuscin-like autofluorescence, especially when -ma exposure was combined with hyperoxia. the findings suggest that autophagy is indispensable for normal turnover of lysosomes, and that lysosomal components may be direct sources of lipofuscin.
the accumulation of oxidatively damaged intracellular structures (so-called biological ''garbage'') was associated with decreased cell viability. two-week inhibition of autophagy with -ma resulted in a significantly increased proportion of dying cells when compared with both untreated confluent cultures and dividing (subconfluent) cells exposed to -ma. similar results were obtained when autophagic degradation was suppressed by the protease inhibitor leupeptin. the results support the idea that biological ''garbage'' accumulation is essential for ageing and age-related death of post-mitotic cells, which can be prevented by cell division.
recently two family members of the tumour suppressor gene p have been described, p and p , which seem to be necessary for specific p -induced stress-response pathways. furthermore, p and p appear to be crucial in determining the cellular sensitivity to anticancer drugs, particularly in tumours lacking functional p . here, we show that p and p isoforms are also regulated by proteasomal degradation. we have identified several e -ubiquitin ligases responsible for the regulation of the stability of p and p . we found that the regulation of p and p is isoform-specific. furthermore, we demonstrate that ubiquitination of p influences the cellular localization of p and of the respective e -ubiquitin ligases. finally, we show that the expression of the various e -ubiquitin ligases can be differentially induced by p -isoforms. in addition, the e -ubiquitin ligases can influence the apoptotic function of p . our findings demonstrate that p and p are sent to degradation or stabilized by e -ubiquitin ligases in an isoform-specific manner, and we suggest a negative feedback loop between p , p and their regulators, as they also influence the function of p and p .
an increased level of metalloproteases was shown to accompany tumor angiogenesis and active invasion of adjacent tissue [ ] .
development of different types of tumors is often accompanied by increased protease activity in blood [ , ] . in the present study we compared the protease activity of plasma and of eluate from the surface of blood cells in healthy donors and patients with breast tumors. we have demonstrated recently that in the blood of healthy donors almost all circulating nucleic acids (cirna) are bound at the surface of blood cells. in patients with fibroadenoma cirna were found at the cell surface, whereas in breast cancer no cell-surface-bound cirna were detected in blood [ ] . conjugates of hydrophobic and hydrophilic peptides of cd receptor with biotin were incubated with avidin-coated -well eia microplates. the avidin-peptide complex was incubated with the samples under investigation and with serial dilutions of proteinase k solution, which was used for calibration of protease activity. undegraded peptides were visualized by incubation with goat anti-peptide antibodies followed by a conjugate of anti-goat immunoglobulins with peroxidase. blood plasma and eluate from the surface of blood cells of cancer patients demonstrated an increased level of anti-hydrophilic protease activity compared with healthy donors. the increase of protease activity against the hydrophilic peptide in blood correlates with the decrease of cell-surface-bound cirna, indicating that blood proteases can affect the concentration and distribution of circulating na.
identification of cleavage site and natural substrate specificity of prta, a serralysin-type metalloprotease from the entomopathogenic microorganism photorhabdus
prta, a secreted basic metalloprotease of photorhabdus, belongs to the m b (serralysin) family of proteases. the biological function of these enzymes is not known, but in some cases they are supposed to have a role in virulence. serralysins are generally assumed to have broad substrate side-chain specificity.
attempts toward the generation of a sensitive and specific substrate of these enzymes have had limited success, and no such substrate is available for prta. through mass spectrometric analysis of prta cleavage products of oxidized insulin a and b chain, we found that prta has a well-defined cleavage site preference. based on this, we developed a sensitive and highly specific oligopeptide substrate through optimization of the amino acid composition and length. the kinetic parameters of prta isolated from photorhabdus luminescens ssp. laumondii strain brecon were measured on the best substrate, dabcyl-glu-val-tyr-ala-val-glu-ser-edans, giving a km of . · ) , a kcat of . · ) /s and a kcat/km of . · . its poor hydrolysis by various proteases proved its specificity, while it was very sensitive in measuring prta activity in hemolymph samples from photorhabdus-infected galleria mellonella larvae. the substrate preference of prta was determined by in vivo digestion of hemolymph proteins from manduca sexta. six minor protein components were selectively cleaved, which were provisionally dis-
the epithelial sodium channel (enac) is an integral component of the pathway for na + absorption in epithelial cells. enac activity is mainly regulated by mechanisms that control its expression at the cell surface, such as ubiquitination. the ubiquitin ligases nedd and nedd - have both been shown to bind to enac and decrease its activity. conversely, the serum- and glucocorticoid-regulated kinase (sgk), a downstream mediator of aldosterone, is able to increase enac activity. this effect is at least partly mediated by direct interaction between sgk and nedd - . sgk binds both nedd and nedd - but it is only able to phosphorylate nedd - . phosphorylation of nedd - reduces its ability to bind to enac, and hence increases enac activity. the impact of the interaction between nedd and sgk remains unclear. nedd -like proteins interact with enac via their ww-domains.
these domains bind py-motifs (ppxy) present in enac subunits. nedd and nedd - both have four highly homologous ww-domains. previous studies have shown that interaction between nedd and enac is mainly mediated by ww-domain . sgk also has a py-motif; therefore we tested whether the ww domains of nedd and nedd - mediate binding to sgk. we show that single or tandem ww domains of nedd and nedd - mediate binding to sgk and that, despite their high homology, different ww domains of nedd and nedd - are involved. our data also suggest that ww domains and of nedd - mediate the interaction with sgk in a concerted manner, and that in vitro the phosphorylation of sgk at serine residue increases its affinity for the ww domains of nedd - . the stimulatory effect of sgk on enac activity is partly mediated via nedd - and will decrease if competition between nedd and nedd - for binding to sgk occurs. we show that nedd and nedd - are located in the same subcellular compartment and that they compete for binding to sgk in vitro.
the concerted or successive action of proteolytic enzymes has been described in a number of important biological processes in which proteins are degraded or matured, such as digestion, turnover (lysosomal, proteasomal...), blood coagulation, developmental remodeling or apoptosis, among others. the complementary action of proteases belonging to different families to achieve a more efficient or a better modulated hydrolytic mechanism is well documented. specific molecular associations or shared scaffolds between the involved proteases and/or protein inhibitors and defined three-dimensional structures have also been reported. however, only in a few cases did such structures involve metallocarboxypeptidases or their inhibitors [ ] . we shall review this subject and describe, in this context, a new model found in a marine invertebrate organism in which this occurs.
in particular, the characteristics of a novel bifunctional molecule displaying the functionalities and structures of serine- and metallocarboxypeptidases will be presented. its structure is entirely different from the ones previously reported by us and collaborating groups for metallocarboxypeptidase inhibitors [ ] [ ] [ ] .
regulating the activity of herpes virus proteases
c. s. craik departments of pharmaceutical chemistry, pharmacology, and biochemistry and biophysics, ucsf, san francisco, ca, usa. e-mail: craik@cgl.ucsf.edu
herpesviral proteases exist in a monomer-dimer equilibrium in solution. dimerization is required for activity, and a conformational change communicates the oligomerization state of the enzyme to the active site of each intact monomer. each monomer has an active site, which is spatially separate from the dimer interface. kaposi's sarcoma-associated herpesvirus (kshv) encodes a protease (kshv pr), which is necessary for the viral lytic cycle. like those of other herpesvirus proteases, the dimer interface of kshv pr is composed primarily of a helix near the c terminus of the protein. the helix of one monomer interacts with residues in the symmetrically related helix of the other monomer across the dimer interface as well as with neighboring helices. small molecule inhibitors, site-directed mutagenesis and d nmr spectroscopy were used to compare the monomeric and dimeric forms of kshv pr and to investigate the relationship of the active site and the dimer interface of the enzyme. active site inhibition was shown to strongly regulate the binding affinity of the monomer-dimer equilibrium of the protease, shifting the equilibrium completely to the dimeric form of the enzyme.
a previously undetermined conformational change provided insight into the regulation of protease activity by dimerization as well as an explanation for the weak dimerization of a family of enzymes with a disparately large dimer interface compared to their measured binding affinities. using this information as a guide, protein grafting of the interfacial helix onto a small stable protein, avian pancreatic polypeptide, generated a small macromolecular inhibitor that successfully disrupted the dimer interface and inhibited enzymatic activity. these results provide direct evidence that peptide bond hydrolysis is integrally linked to the quaternary structure of the enzyme, validate the protease as a therapeutic target and suggest the dimer interface may be an alternative site for antiviral design.
abteilung strukturforschung, max-planck-institut für biochemie, martinsried, germany. e-mail: huber@biochem.mpg.de
proteolytic enzymes catalyze a very simple chemical reaction, the hydrolytic cleavage of a peptide bond. nevertheless, they constitute most diverse and numerous lineages of proteins. the reason lies in their role as components of many regulatory physiological cascades in all organisms. to serve this purpose and to avoid unwanted destructive action, proteolytic activity must be strictly controlled. control is based on different mechanisms, which i will discuss and illustrate with examples of systems and structures determined in my laboratory.
the family of serine protease inhibitors known as the serpins is represented in all branches of life and predominates in the higher organisms, including man. they have evolved an extraordinary mechanism to inhibit proteases which distinguishes them from the other families of serine protease inhibitors, and renders them uniquely qualified to control the proteolytic pathways essential to life.
the mechanism is best described as a spring-loaded mousetrap, where nibbling of the peptide loop bait springs the trap and crushes the unsuspecting protease. as with a mousetrap, the active state of a serpin is metastable, and the energy released upon conversion to its more stable form is used to trap the protease. the complexity of the serpin mechanism provides many advantages over the simpler lock-and-key type mechanism, utilized by all other serine protease families. serpins provide stoichiometric, irreversible inhibition, and the dependence on serpin and protease conformational change is exploited for signaling and clearance. the potential for regulation is also an inherent part of such a complex mechanism, as illustrated by the heparin activation of serpins antithrombin and heparin cofactor ii. however, with complexity of mechanism also comes susceptibility to disease-causing mutations: both through loss-of-function, as with thrombosis caused by antithrombin deficiency; and gain-of-function, as with dementia caused by neuroserpin polymerization. many crystallographic structures of serpins have been solved over the past years, and we now have a frame-by-frame cinematic view of the intricate conformational rearrangements involved in protease inhibition, modulation of specificity, and molecular pathology of the remarkable shape-shifting serpins.
structural lessons of serine proteases: function and mechanism of the serine protease-like hgf as a growth factor in met signaling
hepatocyte growth factor (hgf), a plasminogen-related growth factor, is the ligand for met, a receptor tyrosine kinase implicated in development, tissue regeneration and invasive tumor growth. hgf acquires signaling activity only upon proteolytic cleavage of single-chain hgf into its a/b-heterodimer, similar to zymogen activation of structurally related serine proteases. although both chains are required for activation, only the a-chain binds met with high affinity.
recently, we reported that the protease-like hgf b-chain binds to met with low affinity. this suggests that additional allosterically linked regions may be involved in the signaling process. furthermore, antibodies directed toward the b-chain or the hgf a-chain result in inhibition of met phosphorylation in a cells. these antibodies also inhibit proliferation in bxpc cells and baf cells. implications for dimerization mechanisms of hgf-dependent met receptor activation and signaling are presented. in addition, mutagenesis of the hgf b active site region has been investigated with respect to imparting enzymatic activity. thus, while hgf has the function of a growth factor, the structural and receptor binding aspects of hgf are more akin to those of serine proteases.
trypsinogen with a amino acid leader peptide on its n-terminus is the predominant form of the enzyme in human brain
gene prss on chromosome of the human genome encodes, due to alternative splicing, both mesotrypsinogen and trypsinogen . mesotrypsinogen has long been known as a minor component of trypsinogens expressed in human pancreas, while the mrna for trypsinogen has recently been identified in brain and other human tissues. analysis of the gene encoding trypsinogen predicted two isoforms of the zymogen: isoform a may have a amino acid, and isoform b a amino acid, n-terminal leader sequence. the translation initiation site for isoform a is an atg codon, while the initiation site predicted for isoform b is a ctg codon. we measured the amount of trypsinogen mrna as well as the quantity of the protein in selected areas of the human brain. trypsinogen could be localized in glial and neuronal cells using immunohistochemical methods. we purified human trypsinogen by affinity chromatography. our results show that splice isoform b is the predominant if not the exclusive form of the zymogen in human brain. the n-terminal residue of the isolated protein was identified by amino acid sequencing as a leucine.
at the same time, the longest mrna we were able to isolate was barely longer than the one corresponding to splice isoform b. although the most trivial explanation of our results is that isoform a is proteolytically processed to result in isoform b, it cannot be excluded that leucine rather than methionine is used as the translation initiator amino acid.
search for endogenous substrates for prolyl oligopeptidase in porcine brain
prolyl oligopeptidase (po) is a serine protease present in most tissues, which preferentially cleaves the peptide bond at the carboxyl side of proline residues. the function of po is unknown, but it has been associated with several disorders of the central nervous system, such as depression and alzheimer disease. the purpose of this work was to look for endogenous substrates for the recombinant porcine po in porcine brain. we adapted a method to extract the proteins from the brain with special attention to the smaller polypeptides, since po is not known to cleave peptides larger than amino acids. subsequently we looked for a method to separate the protein mixture into less complex fractions. d-gel electrophoresis, commonly used in proteomics, is only suitable for proteins with a molecular weight between and kda and an isoelectric point between and . two-dimensional chromatography offers a suitable alternative for small peptides. we chose ion exchange chromatography as a first step and reversed-phase high-pressure liquid chromatography as a second step. the resulting fractions were divided into two parts. one part was incubated with the purified po; the other served as a control. by looking for shifts in the mass spectrum between the control sample and the incubated sample, we identified peptides cleaved by po. different methods, such as esi-qtof-ms and maldi-toftof-ms, were used to sequence cleaved peptides by ms/ms. these experiments allowed us to deduce the sequence requirements for po cleavage.
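the control-versus-incubated comparison described above can be illustrated with a short sketch. it is not the authors' pipeline: it assumes a simplified rule (cleavage after every proline that is not the c-terminal residue, with no size limit and no further sequence requirements), uses standard monoisotopic residue masses, and the function names peptide_mass and po_cleavage_products are hypothetical. for each candidate site it lists the two product peptides whose masses would appear in the incubated sample while the parent mass disappears.

```python
# Standard monoisotopic residue masses in daltons for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to a free (uncharged) peptide


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified linear peptide (one-letter codes)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def po_cleavage_products(seq):
    """Hypothetical helper: single-cut products for cleavage after each proline.

    Assumed rule (a simplification): the bond on the carboxyl side of any
    proline that is not the last residue can be cut. Returns a list of
    (n_fragment, n_mass, c_fragment, c_mass) tuples, one per candidate site.
    """
    products = []
    for i, aa in enumerate(seq[:-1]):  # skip the C-terminal residue
        if aa == "P":
            n_frag, c_frag = seq[: i + 1], seq[i + 1:]
            products.append(
                (n_frag, peptide_mass(n_frag), c_frag, peptide_mass(c_frag))
            )
    return products


if __name__ == "__main__":
    parent = "GAPLK"  # made-up example peptide, not taken from the study
    print(parent, round(peptide_mass(parent), 4))
    for n_frag, n_mass, c_frag, c_mass in po_cleavage_products(parent):
        print(n_frag, round(n_mass, 4), c_frag, round(c_mass, 4))
```

because hydrolysis adds one water molecule, the two product masses sum to the parent mass plus the mass of water; a matched pair of new peaks satisfying that relation is the kind of shift the comparison looks for.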
serine protease subtilisin immobilized on novel mesoporous materials serine proteinases are widely used in protein mapping and peptide or ester bond formation. fixation of enzyme on solid support has many advantages, such as high stability, possibility of recovering and low product contamination by enzyme. subtilisin carlsberg, a protease from bacillus licheniformis, was immobilized on mesoporous silica (sba- ) and several organosilica supports via physical adsorption. the bifunctional mesoporous organosilicas containing ch -ch or ch=ch bridges in combination with organic tethers bearing amino or hydroxyl functionalities were synthesized using supramolecular templating in the presence of non-ionic triblock copolymers and exhibited high surface area and large pore diameters in the range of - Å suitable for the incorporation of subtilisin. the kinetics of immobilization was examined for six different carriers. it was shown that enzyme retained hydrolytic activity after the immobilization. the dependence of subtilisin loading on the starting concentration of the enzyme during adsorption shows the maximum loading ( mg protein/g support) at [e] = mg/ml. the ph dependences of loading and activity of immobilized biocatalysts were bell-shaped. for the organosilica support containing amino and hydroxyl groups the ph-dependence was shifted to the alkaline ph by in comparison with the support containing ch -ch bridges. the adsorbed subtilisin desorbs easily in aqueous media, while no leaching of the enzyme was observed in acetonitrile and dmf/acetonitrile mixture ( / ). the immobilized biocatalyst shows high hydrolytic activity after incubation in non-aqueous acetonitrile for week and after h incubation in % dmf/acetonitrile mixture. these data indicate a possible application of the obtained biocatalysts in low water media. 
purification, structural and biological characterization of protease inhibitors from acacia plumose seeds
protease inhibitors have been used in many current medicines. therefore, there is considerable interest within the pharmaceutical industry in discovering new compounds and mechanisms of protease inhibition, since these investments have led, for example, to new anti-hiv therapeutic tests, treatments for coagulation diseases and tests with anti-carcinogenic drugs. serine protease inhibitors are found in all plant tissues, mostly in the seeds of the leguminosae subfamilies: mimosoideae, caesalpinoideae and papilionoideae. the acacia genus is one of the most important members of mimosoideae, and the presence of protease inhibitors in this genus has been described in only three species, none of which were structurally characterized. we are therefore studying three new protease inhibitors from a. plumose seeds. the inhibitors were purified from a saline extract of triturated mature seeds and presented anticoagulant activity, serine protease inhibitory activity and action on the growth of phytopathogenic microorganisms in vitro. the purification steps included size exclusion chromatography on a superdex- column, equilibrated and eluted with pbs, and ion exchange chromatography on a mono-s (hr / ) column, equilibrated with sodium acetate buffer mm (ph . ) and eluted with the same buffer in a gradient of - . m of nacl. three fractions (eluted around . , . and . m of nacl) that presented anticoagulant activity and serine protease inhibition were separated and denoted apia, apib and apic. their apparent mws were around kda, by sds-page in the absence of reducing agents. in the presence of reducing agents they showed two bands: between - , and - kda. the n-terminal sequences of the higher mw chains were tyafl (apia); kellvdne (apib) and telhdd (apic).
the circular dichroism spectra of these inhibitors were very similar, presenting a maximum around nm and a minimum at nm, compatible with the presence of unordered and beta elements of secondary structure. their n-terminal sequences, cd spectra and two polypeptide chains linked by a covalent bond are compatible with kunitz-type inhibitors. these inhibitors are probably three different isoforms that differ in their degree of inhibition specificity toward the serine protease family. the ki for different serine proteases (trypsin, plasma kallikrein, elastase, chymotrypsin) and the specificity toward the phytopathogenic fungus are being investigated.
although the proteases were initially described as enzymes involved in the non-specific degradation of dietary proteins, today it is known that they can also act as highly specific enzymes that perform selective cleavage of specific substrates. thus, alterations in the structure, regulation or function of this type of enzyme underlie serious human disorders including cancer. to date, more than protease and protease homologs are annotated in the human, mouse, and rat genomes (www.uniovi.es/degradome). the increasing complexity of the proteolytic systems has led to the introduction of global concepts such as the term degradome, which defines the complete set of proteases that are produced at a specific moment by a cell, tissue or organism. as part of our studies focused on the characterization of the mammalian degradomes, we have identified and cloned unusual mosaic proteases containing serine protease domains in tandem. the first, called polyserase- , is synthesized as a transmembrane protein that undergoes post-translational events to generate three independent serine protease domains. the second polyprotease is polyserase- , a secreted protein that remains an integral part of the initial protein product.
to date, it is difficult to understand the putative functional advantages derived from the complex polyproteases and, albeit extremely unusual, it is not an unprecedented situation. thus, the amphibian ovochymase and oviductin are polyserine proteases that contain three serine protease domains in tandem. in humans, angiotensin-converting enzyme and carboxypeptidase d are polymetalloproteases that exhibit some similarities to the polyserases. all these polyproteases constitute examples that illustrate an additional strategy for increasing the complexity of the degradomes.
evolution of a genetic locus, expressing several protease inhibitors with homology to whey acidic protein (wap)
a. clauss and Å. lundwall department of laboratory medicine, lund university, malmö, sweden. e-mail: adam.clauss@klkemi.mas.lu.se
we have previously described a locus on human chromosome that gives rise to proteins containing wap four disulphide core (wfdc) domains. among them are the elastase inhibitors elafin and secretory leukocyte proteinase inhibitor (slpi). both slpi and elafin are also known to be important components of the innate immune defence by displaying anti-microbial properties. in order to gain a deeper understanding of the biological role of the locus, we have now extended our investigations of its organization and evolution into non-human mammals. homologous loci were identified on mouse chromosome , rat chromosome and dog chromosome . transcript sequences were generated by race technology or retrieved from the est databases. as in humans, the murine and canine loci are divided into two sub-loci separated by approximately kb. the majority of genes are conserved in all species, but the comparison also showed gain and loss of genes, e.g. two human pseudogenes were identified due to the discovery of functional rodent genes, and in the rat several duplications have yielded four slpi genes. a most interesting finding was that there is no murine elafin gene.
the different wfdc domains showed highly variable conservation between species. this was particularly striking in proteins containing multiple domains, where the aminoterminal wfdc generally displayed low conservation, whereas the opposite was true for the carboxyterminal wfdc. the difference could be due to the potential targets of the inhibitors, which might be either highly variable exogenous microbial proteases or conserved endogenous proteases.

signaling mechanism of thrombin-induced human gingival fibroblast contraction
thrombin is activated during gingival tissue injury and inflammation. thrombin and other bacterial proteases also affect the functions of adjacent periodontal cells via stimulation of protease-activated receptors (pars). we noted that thrombin and par- agonist peptide ( µm) induced contraction of gingival fibroblast (gf)-populated collagen gels within -h of exposure. however, par- and par- agonist peptides (< µm) showed little effect on collagen gel contraction. u (phospholipase c inhibitor) and -apb (ip antagonist) were effective in inhibiting gf contraction. thrombin-induced gf contraction was inhibited by mm egta (an extracellular calcium chelator) and verapamil (an l-type calcium channel blocker). in addition, w ( and µm, a calcium/calmodulin inhibitor), ml- ( µm, a myosin light chain kinase, mlck, inhibitor), and ha ( µm, a rho kinase inhibitor) completely inhibited the thrombin-induced collagen gel contraction. thrombin also induced the phosphorylation of erk /erk in gf. however, u only partially inhibited the thrombin-induced gf contraction. similarly, wortmannin ( µm) and ly ( µm) (two pi k inhibitors) and genistein also showed partial inhibition. moreover, nac was not able to suppress the gf contraction, as supported by the slight decrease in reactive oxygen species production in gf exposed to thrombin. these results indicate that thrombin is crucial in periodontal inflammation and wound healing by promoting gf contraction.
this event is mainly mediated via par- activation, plc activation, extracellular calcium influx via l-type calcium channels, and the calcium/calmodulin-mlck and rho kinase activation pathways.

survival of the anticarcinogenic bowman-birk inhibitor from soybean at the terminal ileum of cannulated pigs
plant protease inhibitors (pi) of the bowman-birk class, a major pi class in legume seeds, have emerged as highly promising cancer chemopreventive agents, being capable of preventing or suppressing carcinogenic processes in a wide variety of in vitro and in vivo animal model systems. in order to exert their chemopreventive properties in vivo, plant pi have to resist and survive, at least to some extent, degradation by acidic conditions and digestive enzymes during gut passage. in this study, we have evaluated the survival rate of the bowman-birk inhibitor (bbi) in the terminal ileum of cannulated pigs fed defatted soybean. two different quantitative approaches were used: firstly, a competitive indirect elisa assay using an antiserum capable of detecting bbi, free and/or in complex with digestive proteases; secondly, spectrophotometric measurements of trypsin and chymotrypsin inhibitory activities in ileal samples, in which the presence of bbi metabolites and/or single active loops can be detected. according to the elisa method, ileal apparent digestibility of bbi was %, which resulted in a recovery of . mg out of . mg/kg feed ingested. significantly higher ileal digestibility values ( %) were found when trypsin and chymotrypsin inhibitor activities were evaluated. the results suggest that the immunoassay may be overestimating the presence of functional pi by detecting inactive bbi, but also that the presence of bbi complexed with digestive proteases, even if protein extraction was carried out under acidic conditions, could make bbi undetectable in activity assays. studies are in progress to overcome these drawbacks.
the resistance of bbi to the acidic conditions and digestive enzymes of the upper gastrointestinal tract makes these proteins very interesting candidates for evaluation as chemopreventive agents, in modulating cell viability and tumor progression within the gastrointestinal tract.

a single amino acid change in a chymotrypsin prevents plant proteinase inhibitor binding
plants have evolved economical strategies to combat insects, which involve on the one hand the production of multi-domain pis that can target multiple enzymes with different specificities and on the other pis that belong to structurally distinct families. solanaceous plants produce both type i and type ii families of pis, which specifically target serine peptidases. this study showed that type i pis are better inhibitors of a particular class of chymotrypsins within the gut of helicoverpa species that is otherwise unaffected by the type ii class of inhibitors. homology models were used to identify a single amino acid substitution in the helicoverpa chymotrypsin that was likely to confer resistance to the type ii inhibitor. our hypothesis was further supported by recombinant expression and mutagenesis of this single amino acid in the type ii inhibitor-resistant chymotrypsin. we therefore propose that both type i and ii inhibitors are required to protect plants against lepidopteran insects.

mobility of protamine sulphate/low molecular weight heparin complexes in an electrical field
the glycosaminoglycan low molecular weight heparin (lmwh) activates plasma serine protease inhibitors. serine proteases play an important role in thrombogenesis, the process that leads to blood clotting and to conditions such as heart attack, stroke and other cardiovascular disorders. lmwh is used to temporarily render the blood incoagulable during prophylaxis or treatment of thrombosis; it sometimes results in serious bleeding, and protamine sulphate is used to neutralize the anticoagulant activity of heparin.
we investigated the relationship between the anticoagulant activities of new lmwh-sk derivatives (generated through the controlled cleavage of porcine intestinal mucosa heparin with a chitinolytic complex mixture from streptomyces kurssanovii) and the mobility of lmwh-sk complexes with protamine sulphate in an electrical field. for this purpose, biospecific electrophoresis in % agarose containing protamine sulphate was used. precipitation zones (zones of equivalence) in the form of ''rockets'' were generated. the scanned image was saved in jpg format. the ''rocket'' areas were estimated with the bandscan program. results: lmwh-sk with molecular mass (mm) . ; . ; . ; . ; . ; . kd demonstrated antithrombin activities (aiia) of - iu/mg, activities against factor xa (axa) of - iu/mg, and an axa/aiia ratio of ( . - . ). correlation coefficients between mm and precipitation zone heights or areas were . - . (p < . ), and between axa activities and precipitation zone heights or areas were . - . (p < . ). conclusion: lmwh-sk obtained by hydrolysis with the chitinolytic complex has an axa/aiia ratio of , , as required for antithrombotic preparations. as mm decreases, axa activity increases and the precipitation zone heights or areas of the lmwh-sk complexes with protamine sulphate decrease.

the role of extracellular proteases in supplying filamentous fungi with nutrient compounds is well understood and experimentally documented. however, there is no definite answer to the question of the need for, and role of, these proteases in pathogenesis. a study of differences in the spectra of extracellular enzymes of saprotrophic and pathogenic fungi, performed on fusarium species, revealed that the activity of secreted serine proteinases of a pathogenic f. culmorum strain was much higher (up to -fold) than that of a saprotrophic strain. the use of f.
culmorum strains differing in pathogenicity (strongly and weakly pathogenic) demonstrated that the activity of secreted serine proteases of the strongly pathogenic strain was significantly higher ( . - -fold) than that of the weakly pathogenic strain. this tendency was preserved when activity was calculated relative to protein content and to dry weight of mycelium, indicating purposeful synthesis and secretion of extracellular proteases by strains with high pathogenicity. moreover, these differences were much greater when the substrate for trypsin-like proteinases, bz-arg-pna, was used than with the substrate for subtilisin-like proteinases, glp-ala-ala-leu-pna. according to the data obtained, it is proposed that the activity of trypsin-like proteinases secreted by the fungi correlates with the degree of their pathogenicity and apparently plays an important role in pathogenesis. acknowledgment: this work was supported by grants from the russian foundation for basic research.

conformational adaptation of a canonical protease inhibitor upon its binding to the target protease increases specificity
the atomic resolution crystal structure of sgti in complex with crayfish trypsin provided further data on the molecular basis of the inhibition mechanism of pacifastin-type inhibitors. in complex with crayfish trypsin, sgti exhibits more or less continuous contacts in an extended region (through sites p -p ' ) of the molecule. comparison of this complex with a simulated bovine trypsin-sgti complex shows that more than half of the interaction energy surplus originates from the extended region of binding. some of these contacts result from a conformational change of sgti induced by its binding to the enzyme, which is strongly supported by the critical comparison of the crystal structure of the crayfish trypsin-sgti complex with the free form of sgti.
alignment of the nmr structure ensemble with the x-ray structure of complexed sgti and a careful comparison of the backbone φ, ψ angles were carried out. additionally, noe-derived restraints and the corresponding distances in the complex were compared. the local conformation of both the p -p and p '-p ' regions of the inhibitor shows significant changes upon binding, suggesting that either or both of these regions may act as molecular recognition sites. this comprehensive analysis of the local backbone properties of sgti in the free and in the complexed form made it possible to identify conformational similarities and differences responsible for its efficient binding to the enzyme, and provides a good basis for further study of the structural aspects of protease inhibitor specificity.

like most serine proteases, enteropeptidase light chain contains four disulfide bonds and one unpaired cysteine at position (chymotrypsinogen-derived residue numbering), which forms the disulfide bond linking the pro- and catalytic domains. a cys ser mutant of human enteropeptidase light chain was constructed by site-directed mutagenesis. the recombinant wild type and mutant proteins were produced in escherichia coli bl (de ) with expression vector pet- a. the active proteins were obtained after solubilization and renaturation of the fusion protein thioredoxin/human enteropeptidase light chain from inclusion bodies. after autocatalytic cleavage of thioredoxin, the active enzyme was purified on agarose-linked soybean trypsin inhibitor. the yield of refolded active enzyme increased from . to . % in the case of the cys ser mutant. the wild type and c s mutant showed similar kinetic parameters for cleavage of the small synthetic substrate gly-asp-asp-asp-asp-lys-naphthylamide, the small ester thiobenzyl benzyloxycarbonyl-l-lysinate (z-lys-sbzl) and a fusion protein. both enzymes were inhibited by inhibitors of trypsin-like serine proteases but not by inhibitors of chymotrypsin-like, cysteine or metalloproteinases.
recombinant human enteropeptidase light chain and its mutant c s were active between ph and with a broad optimum at about ph . and demonstrated quite high stability to different denaturing agents. both enzymes demonstrated secondary specificity for the chromogenic substrate z-ala-phe-arg-na with km = . mm, kcat = s- .

proteinaceous low molecular serine protease inhibitors from wood rotting fungi
k. j. grzywnowicz and j. zuchowski, biochemistry department, maria curie-sklodowska university, lublin, poland. e-mail: grzyw@hermes.umcs.lublin.pl
proteolytic enzymes have been firmly established as main regulatory components in a number of cellular and physiological processes. the most important factors influencing the proteolytic enzymes are natural, proteinaceous protease inhibitors, which form complexes with target proteases. they have been extensively investigated with regard to their physiological functions, as tools for protease enzymology, as models for protein-protein interactions and for potential medical applications. there is growing interest in new inhibitors of proteases from various sources. known protease inhibitors from fungi include yeast inhibitors of proteinases a (aspartic protease) and b (serine protease), low molecular inhibitors of serine proteinases from fruiting bodies of the mushrooms pleurotus ostreatus and lentinus edodes, and some undefined proteinase inhibitory activities from water extracts of some species of basidiomycetes. in searching for new bioactive metabolites of basidiomycetous fungi, we recently isolated and characterized some low molecular, proteinaceous, natural inhibitors of serine proteases from mycelia of the wood rotting fungi trametes versicolor, abortiporus biennis and schizophyllum commune. isolation of inhibitors was achieved by ion exchange and size exclusion chromatography.
their inhibitory activity (against some serine proteases), ph and temperature optima of action, and molecular mass were preliminarily characterized by classical methods. analysis of n-terminal amino acid sequences of these inhibitors suggests a new family of serine protease inhibitors from fungi. more detailed characterization of the inhibitors (including molecular modeling) and preliminary experiments with laboratory animals and with human cell lines are in progress.

the role of serine proteases in the lectin pathway of complement activation
p. gál, g. ambrus, v. harmat, b. végh, g. náray-szabó, r. b. sim and p. závodszky, institute of enzymology, hungarian academy of sciences, budapest, hungary, protein modeling group, hungarian academy of sciences, budapest, hungary, department of biochemistry, university of oxford, oxford, uk. e-mail: gal@enzim.hu
the complement system is a cascade of serine proteases, and mediates essential functions during infection as a part of innate immunity. activation of the complement system culminates in the destruction and clearance of invading microorganisms and damaged or altered host cells. our view of the complement system has changed considerably in recent years, due to the discovery of a new activation pathway of complement: the lectin pathway. we have recombinantly expressed and characterized the mannose-binding lectin associated serine proteases masp- and masp- . these are related mosaic serine proteases with similar domain organization but with different enzymatic properties. we showed that masp- is capable of autoactivation and can cleave the c and c complement subcomponents. masp- , therefore, can initiate the complement cascade without the contribution of any other protease.
we demonstrated that the complement control protein (ccp) modules, which associate directly with the serine protease domain, stabilize the structure of the catalytic region of masp- and contain exosites for the large protein substrates. these results are in agreement with the crystal structures of the activated and zymogen forms of masp- . masp- is the most abundant mbl-associated serine protease but it cannot activate the complement system. we demonstrated that masp- has a more relaxed substrate specificity than masp- and that the activity of both proteases can be blocked by c -inhibitor. we concluded that the two mbl-associated serine proteases participate in evolutionarily and functionally different pathways.

comparative kinetic study on s ' trypsin variants
l. gombos, j. tóth, p. medveczky, a. málnási csizmadia and l. szilágyi, laboratory of enzymology, department of biochemistry, eötvös loránd university, budapest, . e-mail: gl@ludens.elte.hu
by far most serine proteases have a glycine in position , which is part of the s ' subsite (the second subsite on the enzyme surface c-terminal to the scissile bond of the substrate). in contrast, human trypsin , the trypsin isoform expressed in human brain, possesses an arginine in that position. the bulky side chain of this amino acid is responsible for the inhibitor resistance, the most striking feature of this isoform, as it interferes with the binding of polypeptide inhibitors to the enzyme surface. a chimpanzee trypsin also has an arginine , while rat trypsin v bears a tyrosine in that position. there is also a snake venom plasminogen activator, a trypsin-type serine protease, that contains an s ' phenylalanine. we created glycine, arginine, tyrosine and phenylalanine s ' variants of human and rat trypsins by site-directed mutagenesis in order to investigate the effect of these amino acids on the kinetic behaviour.
on small chromogenic substrates and synthetic inhibitors, which do not interact with the s ' residue, there is no significant difference between the various mutants in catalytic efficiency and inhibitory constants, respectively. however, on oligopeptide substrates the catalytic efficiency decreases - -fold in the non-glycine variants. this effect is even more dramatic with polypeptide partners: the catalytic efficiency drops - times while inhibitory constants increase by - orders of magnitude. we conclude that the catalytic mechanism is not fundamentally influenced by the substitution of residue , although this amino acid is part of the oxyanion hole. bulky residues in the s ' subsite mainly hinder binding to interaction partners.

structural studies on masp- : towards the understanding of the mechanism of autoactivation
mannose-binding lectin-associated serine protease (masp- ) is the key enzyme of the lectin activation pathway of complement, a major element of innate immunity. a dimer of masp- complexed with mannose-binding lectin (mbl) is able to perform its biological functions: upon recognition of the pathogen by mbl, masp- undergoes autoactivation and then initiates the complement cascade by cleaving c and c . masp- is a mosaic protein containing a chymotrypsin-like serine protease domain (sp) and further domains with binding sites for mbl or substrates. our present study focuses on the structural background of the ability of the zymogen form of masp- to undergo autoactivation. we solved the structures of the catalytic fragment of masp- in both its zymogen and activated forms. comparison of the two structures reveals characteristic conformational differences in the classical activation domain and in some other loops lining the substrate binding region. loop shows a unique conformation with arg blocking the s pocket. we docked the activation loop of masp- in the active site of the active enzyme and built a model of the complex of the active and zymogen forms.
the model reveals extended regions of molecular recognition. while this model represents the second step of autoactivation (active form cleaves zymogen), the first step (zymogen cleaves zymogen) requires the stabilization of the zymogen enzyme in an active-like conformation. we built a model of a zymogen-zymogen complex. favorable and unfavorable contacts of the two zymogen molecules help us to identify possible molecular switches, as well as contact regions stabilizing an active-like conformation of the zymogen enzyme in the complex.

the deg/htra proteases are atp-independent serine endopeptidases which are present in most organisms, including bacteria, humans and plants. previous work in our laboratory has shown that the deg protease of the model plant arabidopsis thaliana selectively degrades the photodamaged d protein in the reaction center of photosystem ii (psii) in vitro. therefore, deg is thought to catalyze the primary cleavage of photodamaged d protein, which is an important step of the repair mechanism that restores functional psii. our present studies aim to elucidate the regulation of the deg protease activity, especially with regard to its d degrading activity. we found deg associated with the stromal side of the thylakoid membranes and as a soluble protein in the chloroplast stroma. the amount and distribution of deg protein remained unchanged after exposure to different light intensities, which suggests either a substrate regulation or a posttranslational regulation of the d degrading activity of deg . recent advances on deg regulation and complex formation will be presented.

novel peptide inhibitors of human kallikrein (hk )
human kallikrein (hk ) is a serine protease produced by the secretory epithelial cells in the prostate. it activates several other proteases that may participate in the proteolytic cascade mediating metastasis of cancer. thus, modulation of hk activity is a potential way of preventing tumor growth and metastasis.
furthermore, specific ligands for hk may be useful for targeting and imaging of prostate cancer. we used enzymatically active recombinant hk , captured by a monoclonal antibody that leaves the active site of the enzyme exposed, to screen phage display peptide libraries. six different peptides binding to hk were identified using libraries expressing linear peptides or amino acids long. three of these peptides were specific and efficient inhibitors of the enzymatic activity of hk . alanine substitution analysis revealed that motifs of - amino acids determined the inhibitory activity of the peptides. the peptides are also of potential utility for the development of immunopeptidometric assays for hk , which is a promising marker for the diagnosis of prostate cancer. furthermore, these peptides are potentially useful for treatment and targeting of prostate cancer.

the mechanism of autoactivation of the zymogen masp- residues on the surface of pathogens. we managed to recombinantly express and purify two forms of zymogen masp- . one form is the wild type zymogen enzyme, which can be activated, while the other is a stable zymogen mutant form of masp- . we could prepare the zymogen form of wild type masp- under certain conditions, which enabled us to examine the kinetics of activation. we demonstrated that activation of masp- is a true autocatalytic activation without the involvement of any other protease. we characterized the enzymatic properties of zymogen masp- using the stable zymogen form. we demonstrated that zymogen masp- cannot cleave small synthetic substrates but can cleave a large protein substrate (c ). a molecular model for the interaction between zymogen and activated masp- during activation has also been built based on the available d structures of zymogen and activated masp- .
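
the autocatalytic activation described above can be illustrated with a minimal kinetic sketch. it is not the authors' model: it assumes the simplest possible scheme, in which already active enzyme e converts zymogen z in a single second-order step (z + e -> 2 e), so that de/dt = k * e * (total - e). the rate constant, the concentrations and the function names are hypothetical and chosen only for illustration; the initial zymogen-cleaves-zymogen step is not modelled and is represented by a small non-zero starting amount of active enzyme.

```python
import math


def active_enzyme(t, k, total, e0):
    """Closed-form solution of dE/dt = k * E * (total - E).

    t     : time (arbitrary units)
    k     : hypothetical second-order rate constant
    total : total enzyme (zymogen + active), constant over time
    e0    : active enzyme at t = 0; must be > 0, otherwise nothing happens
    """
    if not 0 < e0 <= total:
        raise ValueError("e0 must satisfy 0 < e0 <= total")
    ratio = (total - e0) / e0
    return total / (1.0 + ratio * math.exp(-k * total * t))


def simulate(k, total, e0, dt, steps):
    """Forward-Euler integration of the same rate law, as a cross-check."""
    e = e0
    trace = [e]
    for _ in range(steps):
        e += k * e * (total - e) * dt  # autocatalysis: rate depends on product
        trace.append(e)
    return trace


if __name__ == "__main__":
    # illustrative values only: 1 % of the enzyme is active at the start
    k, total, e0 = 0.5, 1.0, 0.01
    for t in (0, 5, 10, 15, 20):
        print(t, round(active_enzyme(t, k, total, e0), 4))
```

the sigmoidal time course (a lag phase followed by rapid acceleration) is the usual signature of autocatalysis under this assumed scheme; a zymogen that is activated by a separate protease at constant concentration would instead follow simple exponential kinetics.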
influence of streptokinase on the fibrinolytic system proteins
the present study investigates the effect of a protein of bacterial origin, streptokinase (sk), on the activity of fibrinolytic system proteins and on the mechanisms regulating their interaction. the study was carried out using the porcine haemostasis system, whose plasminogen is not activated by sk. we were especially interested in changes in fibrinolytic system parameters such as tissue-type plasminogen activator (t-pa), plasminogen activator inhibitor (pai- ), plasminogen and α -antiplasmin activities. the main parameters of the coagulation system, such as fibrinogen, soluble fibrin and fibrin degradation product levels and thrombin activity and quantity, were also studied. affinity chromatography, electrophoresis, western blotting, elisa and protein activity assays were used. plasminogen consumption was found to increase by % in h after streptokinase injection. the activity and concentration of t-pa increased significantly, by . times, in h. at later stages of the investigation these parameters tended towards normal. after sk injection the quantity of pai- increased twofold ( . ng/ml compared to the normal . ng/ml). an interesting finding was the activation of prothrombin by sk without activation of the coagulation system in vivo. the injection of sk causes a significant increase in t-pa activity and quantity, possibly due to a direct and/or indirect effect on endothelial cells. we conclude that sk causes pai- secretion through its effect on platelets, since % of pai- is stored in the α-granules of platelets. thus, analysis of the data revealed, besides the well-known function of sk, an influence of sk on the fibrinolytic system potential, possibly due to its effect on endothelial cells and platelets.
paracrystalline inclusions in the mitochondrial matrix or intermembrane compartment occur in several biochemically unrelated disorders such as myopathies, paragangliomas and steatohepatitis, and also in various cell types under normal conditions. however, little is known about the composition of the inclusions, the mechanism of their formation and their relation to disease processes. in this study we describe helix-shaped structures in the intracristal compartments of rat liver mitochondria that have undergone ca2+-induced permeability transition. the filaments are anchored in opposing parts of the mitochondrial membranes and appear to support the cristae mechanically. a protein that is apparently a component of these helical filaments has been identified as the serine protease lactb. this protein shows close sequence similarity to the class c bacterial beta-lactamases and is the only member of this class in animals. since lactb has not been studied previously, we cloned its cdna for expression in e. coli as a c-terminal his-tagged fusion protein. lactb underwent proteolytic processing both in e. coli and in isolated mitochondria, resulting in several protein fragments. this is likely to be due to autocleavage and may be an activation/maturation process. d blue native gel electrophoresis indicated that lactb was part of a > kda protein supercomplex. in summary, the presence of the serine protease motif in lactb and its supposed ability to form helical filaments suggest that lactb might function not only as a component of a 'mitoskeleton', maintaining and rearranging the mitochondrial ultrastructure under certain conditions, but might also take part in apoptotic processes.

novel psychrophilic trypsin-type protease from serratia proteomaculans
a proteinase with trypsin specificity from the psychrophilic microorganism serratia proteomaculans was partially purified.
it was shown that the properties of this enzyme (temperature and ph stability, efficiency of substrate hydrolysis) are consistent with its psychrophilic character. inhibitor analysis and the study of substrate specificity indicate that this enzyme is a serine trypsin-type protease. at the same time this enzyme is zinc-dependent. proteases of this type were unknown until now. the secondary specificity of the studied enzyme differs from that of bovine trypsin: this protease hydrolyses short substrates more efficiently. zinc, cadmium (ii) and copper (ii) ions in mmolar concentrations inhibit the enzyme activity. an unusual influence of calcium ions on substrate hydrolysis and on inhibition by the bovine pancreatic trypsin inhibitor (bpti) was observed for the studied enzyme.

in vitro by a neutral to basic ph change [ , ] . the kinetics of the activation process can be followed by stopped-flow fluorescence (sff) experiments, while the structural features of the transition can be explored by in silico molecular dynamics (md) and targeted molecular dynamics (tmd) [ ] simulations. to challenge the activation process, mutants were constructed and studied by sff measurements. subsequently, multiple md/tmd simulations were carried out on these mutants. our results indicate the existence of parallel activation pathways. they demonstrate the absolute necessity of multiple simulations and of proper statistics. they reveal the pros and cons of the tmd method.

a simple method for the purification of a novel serine endoprotease from wheat triticum aestivum (cv. giza ) has been developed. it consists of ion-exchange and gel filtration. the molecular mass of the enzyme was kda by sds/page under reducing conditions and kda by gel filtration on a sepharose b column. the enzyme had an isoelectric point and ph optimum at . and . , respectively.
the substrate specificity of the enzyme was studied using the synthetic and natural substrates azocasein, azoalbumin, hemoglobin, casein, gelatin and egg albumin. the enzyme appears to prefer azocasein, with km mg azocasein/ml. the enzyme had a temperature optimum at °c with heat stability up to °c. while co2+ and mg2+ accelerated the enzyme activity by and %, respectively, ca2+ and ni2+ had very little effect. the enzyme was strongly inhibited by phenylmethylsulphonyl fluoride (pmsf), but not by the other protease inhibitors, suggesting that the enzyme is a serine protease. from the characterization it can be concluded that the t. aestivum serine protease may be suitable for food processing.

in vitro effects of a potent, selective dipeptidyl peptidase ii (dppii) inhibitor in leukocytes and u -cells.
the compound was able to penetrate the cell membrane and proved efficacious without evidence of acute cellular toxicity. there was a dose-dependent inhibition of intracellular dppii activity without any effect on dppiv activity (maximal efficacy at nm). these properties make it possible to differentiate between dppii and dppiv in biological systems and allow further investigation of the physiological function of dppii. in a second step, we have been investigating the involvement of dppii in apoptosis in human leukocytes by using this compound. preliminary results based on annexin v-/pi-staining using up to µm inhibitor in u -cells and pbmc did not show signs of apoptosis, while dppii activity was inhibited by %.

the effect of calcium ions on the hydrolysis of peptide substrates of general formula a-(asp/glu)n-lys(arg)-b, catalyzed by enteropeptidase (ec . . . ), differs depending on substrate type. for specific enteropeptidase substrates (n = ), calcium ions promote hydrolysis by the natural two-chain enteropeptidase.
hydrolysis of atypical enteropeptidase substrates (n = - ) is as a rule less efficient; in addition, calcium ions have an inhibitory effect in this case. undesirable side hydrolysis during the processing of chimeric proteins by full-length enteropeptidase can therefore be regulated by means of calcium ions. in contrast, the hydrolysis of substrates of all types (n = - ) by enteropeptidase light chain, as well as by the enzyme containing the truncated heavy chain ( - or - fragments), is inhibited by calcium ions. hydrolysis of the natural enteropeptidase substrate, trypsinogen, is at least two orders of magnitude more efficient than the hydrolysis of any artificial substrate. we propose that this effect is caused by the participation of an additional secondary substrate-binding site and/or a calcium-binding site in the coordination of trypsinogen with the enzyme; both sites are located in the n-terminal half ( - ) of the enteropeptidase heavy chain. one more mechanism for the regulation of enteropeptidase activity by calcium ions is the unusual calcium-dependent autolysis of the enteropeptidase heavy chain, which leads to a drastic loss of its activity towards trypsinogen. autolysis of the enteropeptidase heavy chain was compared with the well-known autolysis of trypsin; the latter serves as a natural defense mechanism against the undesirable premature activation of proenzymes in the pancreas, which leads to pancreatitis. the corresponding enteropeptidase inactivation in a low ca2+ ion environment might be a component of the same protective mechanism.

b - p human trypsin selectively cleaves myelin basic protein: is this brain protease involved in the pathomechanism of multiple sclerosis?
demyelination, the breakdown of myelin, the major membrane protein of the central nervous system, is involved in many neurodegenerative diseases. proteases participating in this process are potential targets of therapy in neurodegenerative diseases.
in the present in vitro study the proteolytic actions of calpain, human trypsin and human trypsin (the product of gene prss ) were compared on lipid-bound and free human myelin basic protein as substrates. only the actions of calpain and human trypsin may be of some physiological or pathological relevance, since these two are expressed in human brain. the fragments formed were identified by n-terminal amino acid sequencing and mass spectrometry. the analysis of the degradation products showed that, of these three proteases, human trypsin cleaved myelin basic protein most specifically. it selectively cleaves the arg -thr and arg -thr peptide bonds in the lipid-bound form of human myelin basic protein. based on this information we synthesized region - of myelin basic protein, the peptide ivtprtpppsq, which contains the specific trypsin cleavage site arg -thr . in vitro studies on the hydrolysis of this synthetic peptide by trypsin confirmed our results with intact myelin basic protein. what lends some biological interest to the above finding is that the major autoantibodies found in patients with multiple sclerosis recognize sequence - of the protein. our results suggest that human trypsin may be one of the candidate proteases involved in the pathomechanism of multiple sclerosis. enteropeptidase is a heterodimeric serine protease of the intestinal brush border that activates trypsinogen by highly specific cleavage of its activation peptide following the sequence asp-asp-asp-asp-lys. its light chain alone is sufficient for effective cleavage of fusion proteins carrying a trypsinogen activation peptide analog. the human enzyme has a -fold specificity coefficient compared to the bovine one, and an explanation of this fact could contribute substantially to attempts to improve or modulate its enzymatic properties.
highly pure and active recombinant human enteropeptidase light chain (l-hep) was obtained by renaturation from inclusion bodies expressed in escherichia coli cells, and the active l-hep was purified on agarose-linked soybean trypsin inhibitor. the enzymatic activity of purified l-hep was studied through the cleavage of synthetic peptide substrates and several fusion proteins. l-hep associated slowly with soybean trypsin inhibitor, and z-lys-sbzl cleavage was inhibited with ki* = . nm. comparison of the inhibition of l-hep and bovine enteropeptidase by the bovine trypsin inhibitor aprotinin showed almost an order of magnitude difference in ki*. the ph dependence of the enzyme activity was measured and the ph optimum was found to be . . the enteropeptidase light chain amino acid sequence and crystal structure were analyzed for the presence of target regions for mono- and bivalent ions. unlike trypsin, with its predicted and experimentally proven calcium-binding sites, and sodium-activated thrombin, l-hep was predicted to lack any such sites, and the influence of these ions on the cleavage of different substrates was found to be confined primarily to substrate binding. as a continuation of our efforts to fully elucidate the antisnake venom properties of mucuna pruriens and to further understand the molecular changes that occur in the mouse plasma proteome as a result of an in vivo challenge test with venom and mucuna pruriens proteins (mpe), two-dimensional polyacrylamide gel electrophoresis was performed. plasma was pooled and gels were run in triplicate to eliminate both biological and experimental variations. analysis using imagemaster d platinum software and other statistical analysis tools showed significant differences in protein expression between all the treatments and the control group. some proteins were down-regulated, some were up-regulated and some disappeared completely, while new protein spots were identified.
the protein expression in plasma of mice immunized with mpe for weeks before challenge with a lethal dose of venom, and in that of mice injected with venom alone, was more complex. some venom proteins like ecarin are serine proteases that activate clotting factors like prothrombin, causing haemorrhage and disseminated intravascular coagulation; on the other hand, the protease inhibitors from mucuna pruriens must have acted to antagonize these effects by direct proteolysis (cleavage products/spots appearing in the protein map) or other immunological mechanisms. the results obtained represent the first proteomics approach to studying all the plasma proteins involved in this phenomenon. we have concentrated only on protein spots showing interesting variations with respect to control. it is also an important step in the identification of the affected proteins and of the kind of modifications/molecular mechanisms involved, which are likely the basis of the in vivo protection the plant extract showed against the venom. the use of enzymes at low temperatures has great potential in terms of lower energy costs, therapeutic applications and reduced microbial contamination in industrial processes. low temperature proteases (cryophilic or psychrophilic proteases) are of particular interest for detergents and as wound debriding agents. at present, we are studying cryophilic proteases from antarctic krill (euphausia superba), which normally lives in the sea at temperatures near °c. we have isolated several low temperature proteases by chromatography. enzyme activities and stability were characterized at low temperatures and as a function of ph to find optimum conditions for different applications. one enzyme, named kt , showed particularly high specific activity at °c, several times that of commercial preparations of proteases such as subtilisins. this protein showed a high degree of similarity with digestive trypsins isolated from various arthropoda species.
using mrna molecules obtained from abdominal sections of e. superba and subsequently subjected to a reverse-transcription reaction, we identified, isolated and sequenced a dna molecule that codes for an inactive zymogen of the enzyme. cloning of this dna sequence in escherichia coli strains allowed the recombinant expression of the zymogen, followed by its purification and activation, which led to an active cryophilic trypsin. we performed homology modeling to obtain a molecular model of the mature enzyme. the d model thus obtained was refined using energy minimization, hydrogen network optimization and residue-residue contact optimization techniques, leading to a reliable model of the enzyme. we used this model to identify many interesting and novel features of the enzyme molecule that could be related to its cryophilic character, and to propose site-directed mutagenesis strategies that could be used to improve the enzyme's performance at low temperatures, its ph-activity profile, specificity, inactivation resistance and recombinant expression. in addition, the d model allowed us to design and experimentally obtain mutants that are resistant to auto-degradation and more readily activated. molecular cloning and expression of lactb, a mitochondrial serine protease. mitochondria are thought to have originated from a symbiotic relationship between a bacterium able to perform aerobic metabolism and the ancestor of eukaryotic cells. lactb is the only mammalian protein showing sequence similarity to bacterial serine proteases and belongs to the class c β-lactamases. mouse lactb is amino acids long and comprises a predicted mitochondrial import sequence, a short putative transmembrane segment, a β-lactamase homology domain containing the serine protease motif, -sxxk-, and a c-terminal d-transpeptidase domain. the physiological role of mammalian lactb is unclear.
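the -sxxk- motif named above (serine, any two residues, lysine) can be located in a protein sequence with a simple pattern scan. the following is a minimal sketch, not the authors' method: the function name `find_sxxk` and the toy sequence are hypothetical, and the toy sequence is not the lactb sequence.

```python
import re

# -S-x-x-K- : serine, any two residues, lysine (one-letter amino acid codes).
# The lookahead lets overlapping occurrences be reported as well.
SXXK = re.compile(r"(?=(S..K))")


def find_sxxk(sequence):
    """Return (1-based position, matched tetrapeptide) for every S-x-x-K hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SXXK.finditer(seq)]


# Hypothetical toy sequence used only to illustrate the scan.
toy_sequence = "MASTEKGLSAAKPSLLKSK"
for position, motif in find_sxxk(toy_sequence):
    print(position, motif)
```

a hit from such a scan only marks a candidate site; whether a given s-x-x-k occurrence is catalytically relevant has to be judged from its sequence and structural context.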
therefore, the purpose of this research work was to clone the lactb gene for expression of lactb in e. coli for further biochemical and cell biological study. the full-length lactb gene was cloned into the entry plasmid pentr/sd/d-topo. expression clones were created by performing a recombination reaction between the entry clone and four destination vectors. expression constructs resulting in n- or c-terminal gst fusion proteins and in n- or c-terminal his -tag fusion proteins were transformed into bl (de ) competent cells, which are designed for use with bacteriophage t promoter based expression systems. when lactb was expressed as an n-terminal gst fusion protein, full-length lactb protein was recovered by glutathione-agarose affinity chromatography. expression of lactb as a c-terminal gst fusion protein or with either an n- or c-terminal his -tag resulted in proteolytic degradation of the protein, and we were not able to detect full-length lactb. these results show that the n-terminal gst fragment protects lactb from proteolytic processing and that lactb can undergo autoproteolysis, which may be part of a physiological maturation or activation process. design and synthesis of retro-binding peptide active site inhibitors of thrombin. thrombin is an important pharmaceutical target for the treatment and prevention of arterial and venous thrombosis. biologically active peptides are recognized to have significant therapeutic potential but serious limitations, especially for oral dosing. peptide stereomers can differ in forming productive complexes with an enzyme. moreover, the replacement of the l-amino acid residues forming the hydrolyzed p -p' bond by their enantiomers is known to result in either an uncleavable or a very slowly hydrolyzed analogue. this phenomenon is often exploited to synthesize peptide inhibitors that are stable to degradation by the organism's enzymes.
as peptides containing d-amino acids are not subject to enzymatic hydrolysis, the purpose of this research was the synthesis of retro-d-analogues of thrombin substrates constructed from d-amino acids. the di- and tripeptides of the general formula x-d-arg-d-phe-ome [where x = z, tos, ac h, and z-d-arg-d-ala-(d), l-phe-ome (otbu)] were synthesized by conventional methods of peptide synthesis in solution. special features of their interaction with thrombin were investigated. their inhibitory action on the cleavage of fibrinogen by thrombin and on the hydrolysis of baee by thrombin showed that their inactivating action depends on the substituent at the n-terminus of the dipeptides and on the configuration of phenylalanine in the tripeptides. the relationship between structure and inhibitory action of the synthesized peptides is discussed. the successful application of d-amino acids to the design of biologically active peptide analogues as potential medicinal agents resistant to enzymatic degradation is shown. substrate specificity of mannose lectin binding associated serine proteinase n. s. quinsey and r. n. pike department of biochemistry and molecular biology, monash university, melbourne, victoria, australia. e-mail: noelene.quinsey@med.monash.edu.au the innate complement system is involved in the neutralization of pathogenic microorganisms. it plays a role comparable to that of the classic immune complement cascade. in the innate complement system, the oligomers of mannose lectins are able to bind to microorganisms. these oligomers have been shown to have mannose lectin binding serine proteinases (masps) attached, which once activated lead to the activation of the c convertase complex, which finally leads to the formation of the membrane attack complex. three active masps have been identified in the human innate immune system: masp- , masp- and masp- .
there is high homology between these three serine proteases especially in the n-ter- serpins are protease inhibitors that present their reactive site loop (rsl) to target proteases, followed by drastic conformational changes that inactivate the protease. the sequence of the rsl of serpins determines the target specificity. the drosophila melanogaster gene spn encodes multiple serpin isoforms, each containing an individual rsl, thus enabling the attack of different proteases. variant spn a contains a consensus recognition/cleavage sequence of furin within its rsl and is equipped with a signal peptide and an endoplasmic reticulum (er) retrieval signal (hdel). this suggested that the protein resides in the secretory pathway, like furin, a proprotein convertase that activates many cellular proteins and pathogens. our experiments demonstrate that spn a forms sds-stable complexes with human furin, which is inhibited with a second order rate constant of . · /m/s. the rsl of spn a is cleaved c-terminally to arg-arg-lys-arg, in accord with the enzyme's cleavage site. furthermore, the serpin is retained in the er of transfected cos cells, as shown by immunofluorescence staining. an hdel deletion mutant was detected mainly in the medium of transfected cos cells, demonstrating the necessity of the hdel signal for the observed cellular localization. further experiments show that furin and of drosophila melanogaster are physiological targets for spn a, since secreted forms of both enzymes form stable complexes with the serpin. together, the results demonstrate that spn a is a potent inhibitor of furin that may meet the target at its natural location. experiments with the other rsl variants show that the spn gene represents a multipurpose weapon that is directed against different families of proteases. formation of the covalent tetrahedral complex (tc) with substrate is the first step of the catalytic process in the active site of serine proteases.
his (chymotrypsin numbering) plays the role of a general base catalyst, activating the ser nucleophile by abstraction of its proton. it was experimentally observed that the pka of his ne in the tc formed by serine proteases with transition state analog inhibitors is about units higher than the corresponding pka in the free enzyme. this work demonstrates that the environmental change of the his in the tc, induced by substrate binding in the enzyme active site, is the dominant factor in the pka increase of his ne, and triggers the enzymatic processing of the substrate. these results are based on quantum mechanical modeling of the active site of free chymotrypsin and of the tc complex of chymotrypsin with a trifluoromethyl ketone inhibitor at the dft b lyp/ - +g** level of theory. the polar environment of the enzyme active site is accounted for explicitly in the microscopic model. the combined environmental effects of the bulk water solvation and the rest of the protein are accounted for implicitly by our scrf(vs) continuous solvation approach. the role of local polar effects, such as the oxyanion and the asp -his hydrogen bond, on the pka of his ne in the tc is analyzed. genome-wide analysis of subtilase (subtilisin-like serine protease) genes in microbial genomes limited to regions surrounding the asp, his and ser catalytic residues. pattern-searching methods using hidden markov models, based on conserved sequences surrounding the catalytic residues, were used to search for subtilases encoded in > bacterial and archaeal genomes, representing species. more than subtilases were found to be encoded in genomes. subtilases are more commonly found in gram-positive bacteria than in archaea or gram-negative bacteria, and it is more common to have multiple subtilase-encoding genes than a single gene.
the majority of the subtilases have a predicted signal peptide for translocation across the cell membrane, and a sub-group of these secreted subtilases are predicted to have a carboxy-terminal cell-envelope anchor, mainly of the lpxtg type for covalent anchoring to peptidoglycan. the genomic context of the subtilase-encoding genes was analyzed to gain insight into putative functions for these proteolytic enzymes. by also taking into account the predicted intracellular or extracellular location of the encoded subtilases, it was possible to predict a function for many subtilases in either nutrition/growth, spore germination, surface protein processing/activation, bacteriocin/toxin processing, or sigma factor activation/regulation. poisoning by bothrops species produces a similar physiological picture; one of the systemic effects is blood coagulation through several mechanisms, such as direct action on fibrinogen, factor x activation or platelet activation by venom toxins. in recent years, serine proteases have been identified in bothropic venoms. these toxins are responsible for coagulant activity through direct action on fibrinogen. serine proteases are useful for studies of the hemostatic system and for therapeutic use. the search for new model molecules is very important for elucidating the mechanism of action and identifying the structural characteristics responsible for their activities. the objective of the present work was the purification and characterization of a coagulant factor (cf) from b. pirajai venom. purification was carried out by gel filtration, hydrophobic chromatography and affinity chromatography. gel filtration was performed on sephadex g- with ammonium bicarbonate buffer (ambic) . m ph . , yielding four fractions (p -p ); the coagulant fraction was named p . the p fraction was subjected to phenyl sepharose chromatography using tris buffer mm ph . in a decreasing gradient of nacl ( ; ; ; ; .
; m), and the chromatography was finished with distilled water, yielding six subfractions (fp -fp ); the coagulant subfraction was named fp . the fp subfraction was subjected to benzamidine sepharose chromatography and eluted with the following solutions: distilled water, yielding subfraction bfp ; sodium phosphate buffer mm ph . , yielding subfraction bfp ; and glycine buffer mm ph . , yielding subfraction bfp , which is the cf. the cf displayed a single band in sds-page ( %), indicating a pure protein; it has a molecular mass of kda, its minimum coagulant dose is . µg and it acts on the fibrinogen beta chain. the genome of arabidopsis thaliana encodes putative proteases from the deg/htra family. this group of atp-independent serine proteases has been well studied in other organisms, especially e. coli and humans, but only limited data are available for members of this protease family in plants. deg and deg have been shown to act as proteases in the chloroplast, but no deg/htra proteases from other compartments have been examined so far. the putative protease deg is predicted to be localized in the peroxisome. we cloned the gene encoding deg (at g ) in an overexpression vector for heterologous expression in e. coli. the tagged protein was purified by affinity chromatography and used to raise polyclonal antibodies. with these antibodies we investigated the intracellular localization of deg and the protein level under various stress conditions in order to evaluate the in planta function of this protein. the effect of site-directed mutagenesis on cold adaptation of vpr; a subtilisin-like serine proteinase from a psychrophilic vibrio-species. psychrophilic enzymes have d structures very similar to those of their homologous enzymes from mesophilic and thermophilic organisms. the main characteristics of enzymes from psychrophiles are their high catalytic efficiency (kcat/km values) and thermolability.
a subtilisin-like serine proteinase from a psychrophilic vibrio-species (vpr) shows these characteristics when compared to homologous enzymes from mesophilic and thermophilic organisms. the vpr gene was cloned, sequenced and expressed in e. coli, and recently the crystal structure was determined at . Å resolution [ ] . structural comparisons have been carried out which have led to hypotheses about some of the structural factors that may contribute to the cold adaptation of vpr. some of these hypotheses have been examined using site-directed mutagenesis. the specific residue exchanges were selected with the objective of incorporating into the cold-adapted enzyme stabilizing interactions that were deemed to be present in related thermostable homologues. these include incorporation of pro into loops, a new potential salt-bridge, as well as substitutions aimed at improving packing in the hydrophobic core and decreasing the apolar exposed surface. we have also introduced ser to ala substitutions at three different locations in the cold-adapted enzyme, as these were the most frequent amino acid exchanges observed in sequence comparisons of the enzyme to those of more thermostable homologues. here we report on the catalytic and stability characteristics of the selected mutants. engineering of gfp for the screening of serine protease inhibitors. site-specific proteolysis has been an attractive target for the development of antiviral therapies based on selective viral inhibitors. it has been previously demonstrated that reporter proteins like beta-galactosidase could be very useful for the high-throughput screening of hiv- protease inhibitors through the display of an accessible protease target site on the enzyme surface.
in this work, by using structural analysis, we have engineered the gfp protein from the jellyfish aequorea victoria to accommodate on its surface the hcv virus ns a- b protease cleavage site edvvccsmsytwtg, in such a manner that proper proteolysis results in a decrease in fluorescence. the three resulting gfp constructs, carrying the protease cleavage site in positions - , - and - , were expressed in soluble form in escherichia coli. moreover, the hcv ns cofactor residues - fused in frame via a short linker to the amino terminus of the hcv ns protease domain (residues - ) were also expressed in e. coli; under mm iptg induction, at least % of soluble protein was recovered and further purified via a histidine tag. the analysis of gfp proteolysis by recombinant hcv protease was performed with both crude bacterial extracts and purified proteins. the results presented here indicate that proper solvent exposure of target sites on the gfp carrier protein may be a critical factor for protease cleavage and for observing changes in fluorescence, an aspect of great relevance for the further design and implementation of new analytical tests. various kinds of stressors cause the group of metabolic changes defined as the general stress response, initiated by intracellular signals such as the production of abnormal or denatured proteins, enhanced generation of reactive oxygen species and others. proteolytic enzymes quickly modify proteins and as a consequence can regulate cellular metabolism. although stress defense mechanisms have been described very often in the recent literature, very few studies have assessed the stress response abilities of white-rot basidiomycetes, which produce two kinds of very important ligninolytic enzymes: laccase and peroxidases.
our previous results showed that the addition of menadione to abortiporus biennis idiophasic cultures caused a significant increase in extracellular laccase activity in comparison to the control. the aim of this study was to determine the activities of serine proteinases and natural serine proteinase inhibitors in idiophasic cultures of the basidiomycete a. biennis grown under menadione-mediated oxidative stress conditions. we investigated the changes in intracellular serine proteinase activities in the presence and absence of atp, using hemoglobin and fluorogenic substrates. the level of natural serine proteinase inhibitors in mycelia was also measured. a fungal inhibitor of trypsin was partially purified and used in in vitro experiments. interesting correlations between serine proteinase, serine proteinase inhibitor and laccase activities in prooxidant-treated cultures were also observed. this suggests that proteolytic modifications under oxidative stress conditions can act as a means of regulating laccase activity. serine proteinase, inhibitory and laccase activities were additionally analyzed by native page. calpain is a ca + -regulated cytosolic cysteine protease, functioning as a ''modulator protease'', i.e. regulating/modifying the functions/activities of substrates by limited proteolysis to modulate cellular functions. humans have calpain genes, and potential substrates extend to various cytosolic proteins such as kinases, transcription factors, and cytoskeletal and er proteins. in skeletal muscles, expression of p (also called calpain ) predominates, playing an indispensable role in muscle functions in cooperation with the ubiquitously expressed conventional calpains. indeed, a defect in p proteolytic activity originating from gene mutations causes muscular dystrophy. p localizes in myofibrils, binding to connectin/titin, a gigantic elastic muscle protein that connects the z- and m-lines of the sarcomere, the repetitive unit of the myofibril, with a single molecule.
in mdm (muscular dystrophy with myositis) mice, connectin/titin with a small deletion caused by a natural mutation of the connectin/titin gene is expressed, resulting in severe muscular dystrophy phenotypes such as a body weight less than half that of the wild type, severely affected limb muscles with impaired walking ability and a life span of only - months. the deletion in the mdm allele of the connectin/titin gene overlaps one of the binding sites of p in the n -line, another electron-microscopically visible line between the z- and m-lines of the sarcomere. the mdm phenotypes clearly indicate that connectin/titin or p or both are essential for proper muscle functions. to elucidate the physiological roles of connectin/titin and p , we analyzed mdm mice in relation to the calpain system. as a result, marps (muscle ankyrin repeat proteins) were shown to be up-regulated in mdm muscle. marps bind to the n - and z-line regions of connectin/titin and function as transcriptional regulators that translocate into the nuclei. the binding site of carp (cardiac ankyrin repeat protein), one of the marps, in the n -line region is close to the p binding site, suggesting an interaction between the two molecules. possible signal transduction systems that modulate muscle functions, as revealed by these analyses, will be discussed. inhibition and activation of calpain by its disordered endogenous inhibitor, calpastatin p. tompa , z. mucsi , o. györgy , c. szász and p. friedrich institute of enzymology, biological research center, budapest, hungary, research group of peptide chemistry, university of eötvös loránd, budapest, hungary. e-mail: tompa@enzim.hu calpains are a family of intracellular calcium-activated cysteine proteinases, implicated in the regulation of key cellular processes, such as cell division and programmed cell death.
their activity is under tight control by an intracellular protein inhibitor, calpastatin, an intrinsically unstructured protein that contains four equivalent inhibitory domains. each of these comprises three conserved subdomains, of which subdomains a and c anchor the inhibitor in a calcium-dependent manner, whereas subdomain b binds at the active site and inhibits the enzyme. in this work it is shown that the consequence of this mode of binding is that isolated a and c peptides promote calcium binding to calpain and thus activate the enzyme. this activation is manifest in a sensitization to calcium ions: the calcium required for half-maximal activity is lowered from . to . µm for µ-calpain and to µm for m-calpain. in the physiologically significant sub-micromolar and low micromolar calcium concentration range this sensitization leads to a more than tenfold activation, which is of potential physiological importance as isolated calpain requires high calcium concentrations never realized in vivo. here we suggest that calpastatin is degraded in vivo in a way that generates the activator peptides. given the structural disorder of calpastatin, this unprecedented mode of action raises intriguing questions with respect to the generality of this ambivalent behavior. to address this issue, we have collected extreme cases in which the same protein elicits opposing, inhibitory and activatory, responses within the same molecular setting: structural predictions show that these proteins are largely disordered. in conclusion, the possible general implications of this finding are discussed. meprins are oligomeric, brush border membrane or secreted zinc proteases that have unique and complex structures. they are composed of multidomain, highly glycosylated, evolutionarily related a and b subunits that form disulfide-linked homo- or heterooligomeric dimers.
the homooligomeric form of meprin a forms very high molecular mass multimers of - da, among the largest extracellular proteolytic complexes known. meprins cleave cytokines, growth factors, bioactive peptides and extracellular matrix proteins, important compounds in inflammatory intestinal disease and in cancer metastases. to investigate the role of meprins in intestinal immune responses, inflammation was induced in mice by oral administration of dextran sulfate sodium (dss). the results showed that wild-type mice (c bl/ · ) had a more severe reaction to dss than meprin b null mice on the same genetic background, as determined by body weight loss, intestinal bleeding and mortality. this implies that the presence of meprin b increases host damage caused by dss and that meprin b plays an active role in intestinal pathophysiology. meprins are also expressed in colon cancer cells (e.g. sw , sw , and caco- ). expression of meprin a appears to increase with increasing metastatic potential. in addition, meprin a is highly expressed in the human liver hepatoblastoma cell line hepg and abundantly secreted into culture media. examination of human tumor samples showed that meprin a is expressed in primary colon tumors and in tumors that have metastasized to the liver. this indicates that meprin a expression in gastrointestinal tumor cells contributes to the progression of the disease. biochemical pathways mediating necrotic cell death and neurodegeneration in caenorhabditis elegans n. tavernarakis, p. syntichaki, c. samara and k. troulinaki institute of molecular biology and biotechnology, foundation for research and technology, heraklion, crete greece. e-mail: tavernarakis@imbb.forth.gr necrotic cell death plays a central role in devastating human pathologies such as stroke and neurodegenerative diseases. 
elucidation of the molecular events that transpire during necrotic cell death in simple animal models should provide insights into the basic biology of inappropriate neuronal death, and facilitate the characterization of mechanisms underlying degeneration in numerous human disorders. various cellular insults, including hyperactivation of ion channels, expression of human beta-amyloid protein implicated in alzheimer's disease, constitutive activation of certain g proteins, hypoxia and possibly the ageing process, can trigger a degenerative, necrotic cell death in the nematode caenorhabditis elegans. we are genetically and molecularly deciphering the c. elegans necrotic death program. we have isolated mutations in several distinct genetic loci that block degenerative cell death initiated by various genetic and environmental insults. by characterizing such suppressors, we have discovered that neuronal degeneration inflicted by various genetic lesions in c. elegans requires the activity of specific calcium-regulated calpain proteases and acidic ph-dependent aspartyl proteases. although it is believed that these proteases become activated under conditions that inflict necrotic cell death, the factors that govern the erroneous activation of such otherwise benign enzymes are largely unknown. we identified novel factors that modulate cellular ph homeostasis and are required for necrosis, and showed that targeting these factors effectively protects from necrotic cell death in c. elegans. our findings demonstrate that two distinct classes of proteases are involved in necrotic cell death and suggest that perturbation of intracellular calcium levels may initiate neuronal degeneration by compromising ph homeostasis and deregulating proteolysis. search for regulatory proteins which are controlled by proteolysis in escherichia coli based on microarray analysis j. m. heuveling ag hengge, institute of microbiology, fu berlin, berlin, germany.
e-mail: joheuvel@zedat.fu-berlin.de the impact of controlled proteolysis on regulatory events in prokaryotes is increasingly recognized over the last decade. as in eukaryotic cells, proteolysis is more than just a garbage disposal but has been found to be implicated in the regulation of many vital functions of the bacterial cells, like cell cycle, stress responses and development (hengge r and bukau b. mol microbiol ) . conditional degradation of regulators shows a high potential of integrating a great variety of signals as is well studied for the degradation of sigma s. this sigma subunit of the rna polymerase, which triggers the general stress response in escherichia coli is digested rapidly by the clpxp protease in association with the phosphorylated response regulator rssb under non-stress conditions (stuedemann a. embo ). several other regulatory proteins have been found to be subjected to proteolysis, as lexa, a regulator of the sos response. also lon protease is involved for example in the degradation of rcsa, a regulator of the capsule biosynthesis and of sula, a cell division inhibitor (as a review: hengge-aronis r, jenal u. curr opin microbiol ). in order to find other regulatory processes in which proteolysis plays a role we pursued a global approach using the microarray technique. in mutants lacking functional clpp or lon proteases or either one of the clp recognition factors clpa and clpx, we searched for genes, which are differentially transcribed compared to the wildtype. we found some interesting groups of genes belonging to common regulons governed by known regulators -candidates for clp or lon mediated proteolysis. after confirmation of these results through lacz fusion studies of representative genes of these regulons, these regulators are presently examined in in vivo degradation studies using immunodetection methods. 
a distinct group of serine peptidases cannot hydrolyze proteins, but can readily cleave peptides that are up to about amino acid residues long. the representative member of the family, prolyl oligopeptidase is implicated in a variety of disorders of the central nervous system. the enzyme consists of a peptidase domain with an a/b-hydrolase fold and its catalytic triad is covered by the central tunnel of a seven-bladed b-propeller. this domain makes the enzyme an oligopeptidase by excluding large structured peptides from the active site. in most propeller domains the circular structure is ''velcroed'' together in a mixed blade, where both amino and carboxy terminus are involved to form a four stranded antiparallel b-sheet. non-velcroed or ''open topology'' propellers are rare, and prolyl oligopeptidase was the first protein structure exhibiting a domain of this nature. the apparently rigid crystal structure does not explain how the substrate can approach the catalytic groups. two possibilities of substrate access were investigated: either blades and of the propeller domain move apart or the peptidase and/or propeller domains move to create an entry site at the domain interface. engineering disulfide bridges to the expected oscillating structures prevented such movements, which destroyed the catalytic activity and precluded substrate binding. this indicated that concerted movements of the propeller and the peptidase domains are essential for the enzyme action. biochemical characterization of thermoplasma volcanium recombinant s proteasome and its regulatory subunit g. baydar and s. kocabiyik molecular genetics, biological sciences, middle east technical university, ankara, turkey. 
e-mail: gozde_baydar@hotmail.com proteasome-associated energy-dependent proteolysis is not only involved in the rapid turnover of specific proteins that could be important during periods of stress, but is also engaged in the turnover of the short-lived proteins that regulate a variety of cellular processes in both procaryotic and eucaryotic cells. the universal distribution of proteasome homologs in archaeal genomes provides insight into the vital role of archaeal proteasomes. the s catalytic core of archaeal proteasomes, in combination with various aaa atpases and membrane-associated lon proteases, may play a role in the stress response or in turnover of regulatory proteins. however, little is known about the potential physiological roles of archaeal proteasomes. this study presents data on the biochemical and biophysical features of the recombinant s proteasome of the thermoacidophilic archaeon thermoplasma volcanium (tpv). pcr was performed to amplify dna fragments containing the tpv genes encoding the a- and b-subunits of the proteasome from tpv genomic dna. the amplified a-gene (tpva) and b-gene (tpvb), together with their upstream sequences, were separately cloned and then combined in puc vector. the resulting recombinant puc-skba plasmid was used for heterologous production of in vivo assembled s proteasome in e. coli. the recombinant proteasome was purified by a combination of ammonium sulfate precipitation, gel filtration chromatography (sephacryl s- ) and ion-exchange chromatography (q sepharose). molecular masses of the purified protein subunits were estimated as . kda (b-subunit) and . kda (a-subunit). substantial post-glutamyl peptide hydrolyzing activity and chymotrypsin-like activity were associated with the recombinant proteasome. maximum chymotrypsin-like activity was measured at °c and ph . . crystallographic studies of the gtp-dependent transcriptional regulator cody from bacillus subtilis. 
cody is a gtp-dependent transcriptional regulator of early stationary phase and sporulation genes in bacillus subtilis. it is activated by gtp; during rapid cell growth it represses several genes whose products allow adaptation to nutrient depletion. when the cells pass from rapid growth to stationary phase, the intracellular concentration of gtp drops, thus releasing the repressed genes. cody is a -residue polypeptide containing a helix-turn-helix motif for binding to dna. it also has motifs in common with small gtpases, but cody has a much lower affinity for gtp. crystals of the full-length cody have been grown in the presence and absence of gtp from sodium citrate buffered solutions using lithium sulphate as a precipitant, and diffraction data have been collected to . Å resolution. attempts to solve the structure using anomalous data from the semet derivative crystals of cody have been hampered by the large number ( ) of methionines in the asymmetric unit and difficulties in reproducibility of angiotensin-converting enzyme (ace) is a zinc metallopeptidase critical for the generation of the vasoconstrictor peptide angiotensin ii. a homologue of ace, ace- , has recently been identified, which appears to play a counter-regulatory role to ace by inactivating angiotensin ii. like ace, ace- is a type i membrane protein with its active site contained within the extracellular domain. the expression of ace protein is normally low and restricted primarily to endothelial cells of the heart and kidney, kidney epithelium and testis. recent evidence from ourselves and others indicates that ace is significantly upregulated in a number of pathologies, such as myocardial infarction, renal disease and hepatitis c-induced cirrhosis. given that ace can be proteolytically released from the cell surface in culture, ace may likewise be shed into plasma or urine. 
detection of elevated levels of ace in plasma and urine may be a useful biomarker for the diagnosis of hepatic, renal and vascular disease. using a specific quenched fluorescent substrate, we have detected ace activity in human urine. in contrast, ace activity could not be detected in human plasma; interestingly, however, we noted that plasma markedly inhibited the activity of recombinant ace , thus compromising the possibility of measuring plasma enzyme activity. we are in the process of purifying this inhibitor, which preliminary results suggest is small and hydrophilic. we are also currently optimizing methods for its removal from plasma samples, thus allowing detection of low levels of soluble ace activity in normal human plasma. the identification of a potential endogenous inhibitor of ace , the first for this family of metallopeptidases, could have significant consequences for ace function in vivo and the regulation of angiotensin peptides. future studies will examine whether plasma or urinary levels of ace are elevated in cardiovascular, renal or liver disease. molecular determinants of proteolytic processing of non-structural polyprotein of semliki forest virus. institute of molecular and cell biology, university of tartu, tartu, estonia. e-mail: lulla@ut.ee semliki forest virus (sfv) is a positive-stranded rna virus. the replication of sfv is performed by the rna-dependent rna replicase complex (rc) and regulated by proteolytic processing. during the course of the infection template preference of rc changes from rna plus-strand to minus-strand. it has been known for several years that this preference switch is due to the proteolytic processing of sfv non-structural polyprotein p , mediated by viral cysteine protease located in the carboxy-terminal domain of the nsp protein. tight temporal regulation of this template specificity switch is crucial for the viral replication, but, nevertheless, its mechanism remains unsolved. 
therefore, the mapping of the essential molecular determinants of the site-specific cleavage consensuses may provide necessary information concerning the regulation of cleavage as well as of rna replication. the results of our studies indicate that as few as amino acid residues from the c terminus of nsp protein determine the specificity of the proteolytic cleavage of the nsp /nsp junction. at the same time, sequences lying downstream of the cleavage point (in nsp region) have only a minor effect on the cleavage efficiency. the exact region required for the cleavage of nsp /nsp junction is not yet known, but the sequences required from the c-terminal part of nsp protein are likely short as well. in contrast, sequence lying within - n-terminal amino acid residues of nsp is vital for cleavage of the nsp /nsp junction. this region may represent the cofactor of the nsp protease that activates processing at the nsp /nsp cleavage site. thus, as the result of current research, a principally new function, regulation of the proteolytic processing and rna replication, was mapped to the conserved n-terminal region of the nsp . this finding significantly improves our understanding of the role of nsp , which was enigmatic till now, in the virus life cycle. activation occurs within the specific dibasic motif hsiirrsl, suggesting the involvement of the proprotein convertases (pcs) in this process. this family of endoproteases is responsible for the activation of a large variety of regulatory proteins by cleavage at multi-basic recognition sites exhibiting the general motif (k/r)-(x)n-(k/r) (n = , , or ). cotransfection of the furin-deficient colon carcinoma cell line lovo with provegf-c and different pc members revealed that furin, pc and pc are vegf-c convertases. the processing of provegf-c is blocked by the inhibitory prosegments of furin, pc and pace , as well as by furin-motif variants of alpha -macroglobulin and alpha antitrypsin. 
accordingly, mutation of the vegf-c pc-site (hsiirrsl to hsiisssl) inhibited provegf-c processing. following zebrafish caudal fin amputation, the injection of control vector or vector containing wild-type vegf-c did not affect fin regeneration. in contrast, injection of mutated vegf-c (pro-vegf-c) inhibited fin regeneration. these data highlight the importance of vegf-c processing in zebrafish fin regeneration and suggest that zebrafish can be used as a simple and useful model for studying the role of protein maturation by the pcs in physiological processes. thimet oligopeptidase (top) hydrolyzes a variety of bioactive peptides and is implicated in the regulation of neurological and other physiological processes. top is composed of two ''clamshell'' domains, with the substrate-binding pocket and catalytic site lying between these domains. it is speculated that conformational changes in loops and coil regions connecting the domains lead to changes in substrate specificity. the loop region (residues - ) is close enough to the active site to interact with even the smallest substrate. it contains three glycine residues and is expected to be quite flexible. in an effort to trap intermediate conformations of the loop, we have replaced gly , , or with ala and have compared the activities of the three resulting protein constructs towards two quenched fluorescent substrates. all three enzymes had lower activity than wild type towards a bradykinin analog, with g a, the most active of the mutants, possessing / wild-type activity. however, utilizing a smaller substrate, g a was the most active, surpassing even wild type (fivefold increase in activity). g a had little activity towards either substrate. these results are consistent with data that revealed increased activity towards the larger substrate when the enzyme is partially denatured and presumably more flexible, and increased accessibility of the binding loop to proteolytic enzymes when partially denatured. 
acknowledgment: this work was supported by hhmi, nih-ns (mjg). the proteasome: paradigm of a self-compartmentalizing protease self-processing of subunits of the proteasome crystal structures of the rhodococcus proteasome with and without its pro-peptides: implications for the role of the pro-peptide in proteasome assembly handbook of metalloproteins bovine chymotrypsinogen-a x-ray crystal-structure analysis and refinement of a new crystal form at . a resolution equilibrium and rate constants for the interconversion of two conformations of a-chymotrypsin. the existence of a catalytically inactive conformation at neutral ph refolding transition of alpha-chymotrypsin -ph and salt dependence e-mail: saleh @hotmail.com reference . arnorsdottir j, kristjansson mm, ficner r. crystal structure of a subtilisin-like serine proteinase from a psychrotrophic vibrio species reveals structural aspects of cold adaptation (which is an excellent substrate for adam- ), we observed % increase in enzymatic activity in the media of mm -ht-treated cells compared to untreated cells. however, we did not see any increase in the fluorescence when we used ''cate '', an adam substrate, which does not recognize adam- . to further support a role for adam- /tace, we designed silencing rnas against the enzyme, which were introduced into the mesangial cells using lentiviral infection. successful silencing was confirmed by western blotting days after infection. control and tace silenced human mesangial cells were stimulated with - lm of serotonin for min, and erk activation was assessed by western blotting baron-ruppert and e. heymann department of physiological chemistry (fb /gw) e-mail: rebecca.lew@med.monash.edu.au b - p functional properties of p /calpain and connectin/titin in mdm mouse skeletal muscle y bunkyo-ku the lectin pathway of complement system is an important component of the innate immunity. 
it provides the first line of defence against infection, since it is activated on the surface of invading pathogens. the activation of the complement system results in the destruction and clearance of foreign microorganisms. mannose-binding lectin-associated serine protease- (masp- ) is the enzyme which is responsible for the initiation of the lectin pathway of complement activation. masp- is a multidomain serine protease, which is synthesized as an inactive zymogen and becomes activated when mbl binds to carbohydrate plant proteinase inhibitors are widely distributed among plant species and are a significant component of their defense systems. in some cases, significant diversity among proteins of the same structural family of inhibitors may be observed within a single species. the family of potato kunitz-type proteinase inhibitors (pkpis) exemplifies a group of proteins with diverse properties and may be divided into three major homology groups: a, b and c. many genes encoding different pkpi proteins of each group were found in various potato cultivars (solanum tuberosum l.). inhibitory activity against plant invertase and cysteine and serine proteinases was found in proteins of subgroup c. a set of gene copies was isolated by pcr from the genome of potato cv. istrinskii. dna sequencing analysis of these resulted in identification of different dna sequences with a high similarity to potato kunitz-type inhibitors of group c (pkpi-c). cluster analysis demonstrated that these clones represented multiple copies of six new genes denoted as pkpi-c , -c , -c , -c , -c and c . it can be supposed that at least two alleles containing pkpi-c genes are harbored in the tetraploid genome of potato. one of the new genes, namely pkpi-c , exhibited % identity with known invertase inhibitor cdna ( ) from cv. provita. another pkpi-c gene was similar ( % identical residues) to cdna (p ) from potato cv. bintje encoding a putative trypsin inhibitor. 
four other new genes demonstrated as much as - % identity with known pkpi-c proteins from other potato cultivars. the n-terminal sequence of the protein encoded by the pkpi-c gene was identical to the n-terminal sequence of specific subtilisin inhibitor pksi isolated from cv. istrinskii. b - p regional distribution of human trypsinogen in human brain determined at mrna and protein level j. tóth , l. gombos , e. siklódi , p. németh , m. palkovits , l. szilágyi and l. gráf laboratory of enzymology, department of biochemistry, eötvös loránd university, budapest, hungary, institute of immunology and biotechnology, university of pécs, pécs, hungary, laboratory of neuromorphology, department of anatomy, semmelweis university, budapest, hungary. e-mail: july@ludens.elte.hu proteases play an important role in many physiological and pathological processes in the central nervous system such as development, neurite outgrowth, neuronal plasticity and degeneration and cell signaling. a gene coding for such an enzyme might be prss on chromosome of the human genome. owing to alternative splicing, it encodes both mesotrypsinogen, which is expressed in pancreas, and trypsinogen whose mrna has been identified in different human tissues (initially in brain, recently in different epithelial cell lines from prostate, colon and airway). analysis of the gene prss predicted two isoforms of the zymogen: isoform a may have a amino acid, while isoform b a amino acid n-terminal leader sequence. in order to gain information on the possible role of human trypsinogen we have determined its amount at both the mrna and the protein level in selected brain areas using real-time quantitative pcr and elisa. the highest transcript levels could be detected in cerebellar cortex, while low amounts were found, e.g. in cerebellar white matter samples. the distribution of the mrna in different brain areas measured by real-time pcr is consistent with the protein levels detected with elisa. 
the use of different monoclonal antibodies specific for the amino acid leader sequence and the protease domain allowed the separate detection of the zymogen and the active enzyme. in the hypothalamus, for example, the zymogen is the dominant form, while a significant degree of activation was found in the cerebellar cortex. our data indicate that the extent of activation varies between brain areas. as human trypsinogen is ubiquitous in the brain we conclude that it might play a role in general neurological processes. serine proteases are enzymes involved in the maintenance of cell homeostasis. thus, this type of enzyme must be tightly regulated, and it has been widely reported that serine proteases are involved in the growth and expansion of different cancers. in this regard, the type ii transmembrane serine proteases (ttsps) constitute a subfamily of membrane-anchored serine proteases that are ideally positioned to carry out different interactions with other cell surface or extracellular proteins. among them, tmprss and tmprss proteins have been reported to be overexpressed in most prostate and ovarian cancers respectively, matriptase/mt-sp is expressed in a wide variety of benign and malignant tumors and hepsin is overexpressed in ovarian and renal cancers. desc- is a ttsp member found differentially expressed in squamous cell carcinoma (differentially expressed in squamous cell carcinoma gene ) and, unlike other ttsps, its expression at the rna level is reduced in tumor tissues relative to normal tissue in head and neck squamous cell carcinoma (hnscc), which suggests a possible tumor-protective function for desc- . in order to shed light on the role of desc- in these processes, we have carried out the molecular cloning of the human full-length cdna and expression of the recombinant protein to delineate the implication of this protease in hnscc. diffracting crystals. 
therefore we used limited proteolysis and mass-spectrometry analysis to identify the domain boundaries of the protein and were able to determine the sequence of two principal proteolytic fragments corresponding to the n- and c-terminal domains of cody. these individual domains were successfully cloned in escherichia coli, overexpressed as his-tagged proteins, isolated and purified. both domains have been crystallized. the crystals of the n-terminal domain grow from bis-tris buffered solutions at ph . containing polyethylene glycol and calcium acetate. the crystals of the c-terminal domain were obtained using ammonium sulphate as a precipitant. the crystals of the n-terminal domain diffract to at least . Å and crystals of the c-terminal domain to . Å using an in-house diffractometer with a mar research image-plate as a detector. progress towards the determination of cody structure will be presented. the gram-positive bacterium listeria monocytogenes is a facultative intracellular parasite. interactions of l. monocytogenes with the host cell are provided by a number of secreted and cell surface proteins. one of the most important virulence factors, actin-polymerizing protein acta, is surface attached via the hydrophobic c-tailed membrane anchor. despite the membrane anchor, acta was found in comparable amounts both on the cell surface and in the culture supernatant. the aim of the work was to investigate the mechanism of acta release and the role of this process in l. monocytogenes virulence. maldi-tof ms analysis of trypsin released acta suggested that release is due to proteolytic cleavage between histidine and threonine residues in the close vicinity of the membrane anchor predicted by the htmm analysis. the substitution of histidine with proline prevented acta release into the culture supernatant, although it did not disturb its surface presentation. in silico analysis of eight other l. 
monocytogenes membrane-anchored surface proteins suggested a role for asparagine and threonine residues in specific proteolysis. the prediction was experimentally tested by substitution of the residues with alanine. a spontaneous l. monocytogenes mutant strain, unable to release membrane-anchored proteins into the culture supernatant, was isolated. the mutation was mapped outside the acta gene and presumably affected the corresponding peptidase. the mutation impaired the invasion of l. monocytogenes into the human epithelial-like hela cells, which suggested an effect of the released proteins on signaling events that result in induced phagocytosis of the pathogen by normally non-phagocytic cells. angiotensin ii (ang ii) has been proposed to act as a regulatory peptide in the epidermal layer of human skin. while the expression of receptors and peptide precursors has been demonstrated in epidermis, the formation of ang ii and its inactivation have not been studied in detail. thus we have established a model system with cultured keratinocytes to examine the metabolism of ang i, ii and related peptides by intact epidermal cells. cultures were incubated with peptides in a minimal medium, which sustained cell viability for at least h, and the metabolism of peptides was monitored by chromatography (rp-hplc). with ang i as peptide substrate five major products were detected in keratinocyte culture media after h incubation. a half-life of about h was estimated for ang i and the slow degradation supports results of earlier studies revealing low activities of exopeptidases in a microsomal fraction from keratinocytes as compared to fibroblasts. the degradation of ang i was not affected by inhibitors of alanyl aminopeptidase, peptidyl dipeptidase a and neprilysin. 
since a peptide product formed from ang i in keratinocyte cultures resembled ang ii in hplc analysis, the activity of peptidyl dipeptidase a in these cells was assayed with hip-his-leu and the presence of the peptidase was confirmed by its sensitivity to captopril. further experiments showed that ang ii, iii and related peptides were degraded in keratinocyte cultures with rates similar to ang i and these reactions interfered severely with the formation of ang ii. immunohistochemical studies showed a strong positive staining for neprilysin and alanyl aminopeptidase in the dermal layer of human skin and at the epidermal-dermal junction confirming the results obtained with the cell cultures. soluble angiotensin converting enzyme- present in human plasma and urine p /calpain is the skeletal-muscle-specific calpain and is considered to be a modulator protease in various cellular processes. a defect in the p gene causes limb-girdle muscular dystrophy type a (lgmd a), suggesting that p functions are indispensable for proper muscle functions. in sarcomeres, p localizes at z-, n -and m line regions. although the binding partner for p at z-line has not been identified yet, n -and m-line localization of p are considered dependent on its interaction with the n a and m-line regions of connectin/titin, respectively. connectin is a gigantic sarcomeric protein playing an important role as a molecular template for sarcomeric organization, an elastic element generating passive tension, a platform for various protein ligands, etc. in this study, we focused on the molecular components associated with the n a region of connectin/titin to extend our understanding on p . intriguingly, a recessive mutation in the mouse connectin gene, mdm (muscular dystrophy with myositis), causes muscular dystrophy. there are two remarkable phenotypes consequential to mdm mutation. first, the mdm mutation abolishes p binding activity of connectin n a fragment. 
second, in skeletal muscle from mice homozygous for mdm mutation, upregulation of cardiac ankyrin repeat protein (carp) is observed. carp also binds to n a connectin at the n-terminal proximity of the region mutated by mdm. the effect of mdm mutation on p activity and the properties of n a connectin as well as carp were analyzed using both animal model and cell culture systems. semliki forest virus (sfv) is a well-known model virus, which has been studied for decades. the main topic of this research was characterization and analysis of the sfv replication machinery using an approach based on the use of conditional-lethal mutants of viruses. the direct aim of the present study was to sequence and functionally characterize a panel of independent sfv temperature-sensitive mutants. of all putative ts-mutations identified in this study, two were mapped to nsp protein, four were mapped to nsp protein and one was found in nsp region. a number of assays were used to verify phenotypic effects of the revealed mutations: titration of virus stocks at different temperatures, leak yield experiments, analysis of viral rna synthesis and viral polyprotein processing at different temperatures. nsp mutants had a clear viral protease defect and accumulated non-cleaved polyproteins at different stages. in addition, a biotechnological branch of our research is already under development. it includes improving the existing sfv-based expression vector system by using ts-mutations for the temperature regulation of foreign gene expression in mammalian cells. dipeptidyl peptidase iv activity and/or structure homologues (dash) in brain tumors pathogenesis of many diseases, including cancer, often involves improper proteolytic post-translational modification of biologically active peptides. 
association of dysregulated expression pattern of novel group of ''dipeptidyl peptidase (dpp)-iv activity and/or structure homologues'' (dash) with cancer development and progression has been suggested by several authors, including us [ ] . dpp-iv enzymatic action as a common attribute of most of dash members modifies signaling potential of their substrates, biologically active peptides, not only quantitatively, but due to the changes in their receptor preferences also qualitatively. in this study, we have investigated expression (by real time rt-pcr and immunohistochemistry) and enzymatic activity (by biochemical assays and enzyme histochemistry) of plasma membrane localized dash members, in particular dpp-iv, fibroblast activation protein-alpha (fap) and attractin in human gliomas. it was revealed that varying quantities of dpp-iv, fap and attractin mrnas and proteins were coexpressed in the studied tumors. the majority of dpp-iv-like activity in the glioma tissue could be attributed to the canonical dpp-iv. this activity, assayed biochemically and expressed per mg of protein, was increased in high grade gliomas. inhibition studies suggested lack of enzymatically active attractin in the examined glioma tissues. the results of our pilot study demonstrate for the first time that both enzymatically active and inactive dash molecules are coexpressed in gliomas and suggest prevailing association of increased dpp-iv activity with high grade tumors. acknowledgment: this work was supported by iga nr/ - and msmt vegf-c is involved in the neovascularization processes, steps essential for wound healing, cancer progression and many other physiological functions. zebrafish vegf-c processing and key: cord- -pygykil authors: rahaman, jordon; siltberg-liberles, jessica title: avoiding regions symptomatic of conformational and functional flexibility to identify antiviral targets in current and future coronaviruses date: - - journal: genome biol evol doi: . 
/gbe/evw sha: doc_id: cord_uid: pygykil within the last years, two related coronaviruses (severe acute respiratory syndrome [sars]-cov and middle east respiratory syndrome [mers]-cov) expanded their host range to include humans, with increased virulence in their new host. coronaviruses were recently found to have little intrinsic disorder compared with many other virus families. because intrinsically disordered regions have been proposed to be important for rewiring interactions between virus and host, we investigated the conservation of intrinsic disorder and secondary structure in coronaviruses in an evolutionary context. we found that regions of intrinsic disorder are rarely conserved among different coronavirus protein families, with the primary exception of the nucleocapsid. also, secondary structure predictions are only conserved across – % of sites for most protein families, with the implication that – % of sites do not have conserved secondary structure prediction. furthermore, nonconserved structure sites are significantly less constrained in sequence divergence than either sites conserved in the secondary structure or sites conserved in loop. when regions symptomatic of conformational flexibility, such as disordered sites and sites with nonconserved secondary structure, are avoided in order to identify potential broad-specificity antiviral targets, only one sequence motif (five residues or longer) remains from the > , starting sites across all coronaviruses in this study. the identified sequence motif is found within the nonstructural protein (nsp) and constitutes an antiviral target potentially effective against present-day and future coronaviruses. on shorter evolutionary timescales, the sars and mers clades have more sequence motifs fulfilling the criteria applied. interestingly, many motifs map to nsp , making this a prime target for coronavirus antivirals. 
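the filtering idea in the abstract above (discard flexible sites, then keep only stretches of five or more consecutive remaining residues) can be sketched in a few lines. this is a minimal illustration, not the authors' pipeline: the function names `rigid_site_mask` and `candidate_motifs`, the boolean per-site annotations, and the toy sequence are all hypothetical, and a real analysis would also require sequence conservation across the alignment before a site is kept.

```python
from typing import List, Tuple


def rigid_site_mask(disordered: List[bool], structure_conserved: List[bool]) -> List[bool]:
    """Keep a site only if it is ordered and its predicted structure is conserved.

    Both inputs are hypothetical per-site annotations (one boolean per residue).
    """
    if len(disordered) != len(structure_conserved):
        raise ValueError("per-site annotations must have equal length")
    return [(not d) and s for d, s in zip(disordered, structure_conserved)]


def candidate_motifs(sequence: str, keep: List[bool], min_len: int = 5) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for every maximal run of kept sites
    whose length is at least min_len."""
    if len(sequence) != len(keep):
        raise ValueError("sequence and mask must have equal length")
    motifs = []
    run_start = None
    # A trailing False closes a run that reaches the end of the sequence.
    for i, ok in enumerate(list(keep) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start >= min_len:
                motifs.append((run_start, sequence[run_start:i]))
            run_start = None
    return motifs
```

with a toy sequence `"ABCDEFGHIJ"` and a mask that keeps positions 0-1, 3-7 and 9, only the run at positions 3-7 is long enough to be reported. the threshold of five follows the "five residues or longer" criterion stated in the abstract; everything else is an assumption made for illustration.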
severe acute respiratory syndrome (sars)-cov and middle east respiratory syndrome (mers)-cov are two closely related zoonotic coronaviruses. both have successfully crossed the species barrier to allow animal-to-human transmission, and further to allow human-to-human transmission (song et al. ; reusken et al. ) . the sars outbreak in had a mortality rate of % (anderson et al. ) , and sars-cov was considered the most aggressive coronavirus compared to other human coronaviruses that commonly cause mild to moderate infection in their hosts (van der hoek ) . mers-cov is the cause of an ongoing outbreak of the respiratory illness mers (de groot et al. ) . at the time of writing, mers cases have been confirmed with a mortality rate of approximately % (world health organization ). both mers and sars have higher mortality rates in elderly and immunosuppressed populations (gralinski and baric ) . the host changes by mers-cov and sars-cov suggest that other coronaviruses can potentially cross the species barrier, become zoonotic, and enable human-to-human transmission, ultimately causing high morbidity and mortality. sars-cov and mers-cov exploited mechanistically different approaches to overcome the human species barrier, but these two viruses have a lot in common (lu et al. ) . here, we aim to identify the vulnerable regions in the proteomes of coronaviruses that neither sars-cov nor mers-cov nor their contemporary and forthcoming relatives can proliferate without, and address how to mobilize a defense against the present and future coronaviruses by targeting these regions. sars-cov and mers-cov are positive (+)-strand rna viruses encoding approximately protein products. the mers-cov proteome is primarily composed of two polyproteins, orf a and orf ab; the latter is generated by a - ribosomal slippage frameshift. these proteins are cleaved into nonstructural proteins (nsps). nsps - are products of both polyproteins, whereas nsps - are only yielded by orf ab. 
nsp is unique to orf a (van boheemen et al. ). structural proteins envelope (e), spike (s), membrane (m), and nucleocapsid (n) are elements of the physical structure that encloses the viral genome and come from distinct reading frames, unlike orf a and orf ab, which come from overlapping reading frames. additionally, the structural proteins are the product of subgenomic mrnas that are joined during discontinuous negative rna strand synthesis (van boheemen et al. ). finally, ns protein (ns ), ns a protein (ns a), ns b protein (ns b), ns protein (ns ), and orf b protein make up the remainder of the proteome and also arise from distinct reading frames (van boheemen et al. ). our approach utilizes genomic sequence data, which is readily available for viruses known to cause disease. however, because most viruses pose no major threat to their host, they pass by unnoticed, leaving the majority of virus genome space uncharted. with the availability of cost-efficient genome sequencing technology, and recent developments in the field of viral metagenomics, large-scale identification of viral genome space is on the rise (rosario and breitbart ; mokili et al. ). by exploring viral diversity, critical components constituting a viral genus' fitness can be evaluated. the common influenza virus illustrates how rapidly viral genes mutate: to maintain immune protection, an annual flu vaccination is recommended. ongoing efforts aim to generate broadly neutralizing vaccines whose design accounts for the genomic sequences of multiple types of influenza virus, to eliminate frequent re-vaccination against the flu (ross , ). development of broadly neutralizing vaccines often relies on the consensus or ancestral sequences of extant viral sequences in order to provide greater coverage for related viruses (kesturu et al. ). 
unfortunately, consensus sequences can be misleading, and ancestral sequence reconstruction is error-prone for quickly diverging sequences (mccloskey et al. ). in addition, viruses with compact genomes often express proteins with structural disorder that may undergo structural transformations. although these transformer proteins, like vp in ebola, are masters at changing their structure and thus expanding their functional repertoire as needed for the life cycle of the virus (bornholdt et al. ), flexible regions are also potentially important in rewiring protein-protein interactions between the virus and its host (le breton et al. ; ortiz et al. ; gitlin et al. ). the flexibility trait of many viral proteins is a complicating factor in vaccine development. for instance, dengue virus exhibits serotype-specific antibody affinity that causes antibody-dependent enhancement, an obstacle in the development of a dengue vaccine that protects against all four serotypes (flipse and smit ). to overcome the hurdle posed by structural flexibility, we propose an additional screening step in identifying potential vaccine or antiviral targets that considers the structural flexibility of the viral proteins. the structural genomics initiatives increased their success rate by excluding proteins predicted to be structurally disordered (slabinski et al. ). a similar approach can perhaps benefit vaccine development. furthermore, to make this approach robust to potential mutations, minimizing loss in efficacy or resistance, the evolutionary context of sequence and structure must be considered. thus, we suggest expanding the concept of broadly neutralizing vaccines/antivirals by increasing the diversity of viruses considered if possible. sites that are conserved in sequence and structure and have low disorder propensity among diverse virus protein homologs are very likely to be constrained both from changing sequence on evolutionary time scales and from undergoing real-time structural transitions. 
these sites have potential as targets for broad-specificity antivirals or vaccines because conservation makes them broad-specificity and low dynamics avoids targeting a conformational ensemble, which is not only difficult to target (yu et al. ) but may also change as the sequence diverges (siltberg-liberles et al. ). a recent large-scale study of structural disorder in > , viral genomes in viral families found that the amount of disorder varied between virus families, from . % to . % (pushker et al. ). it was reported that coronaviridae has very low disorder content (mean disorder . %) (pushker et al. ). coronaviridae contains two subfamilies: coronavirinae and torovirinae. sars-cov and mers-cov are part of the coronavirinae subfamily, from here on referred to as coronavirus (cov). the lack of disorder is intriguing because disorder may be important for rewiring interactions between viral proteins and host proteins (ortiz et al. ) and for providing opportunities to acquire novel functional sequence motifs (gitlin et al. ). structural disorder has also been proposed to be important for viral viability, enabling multifunctionality and vigor in response to changes in the environment (xue et al. ). given the low fraction of structural disorder reported across coronaviridae, we set out to investigate the conservation of structural disorder and secondary structure across cov. sites identified as conserved for structure and lacking disorder can be considered to be vulnerable and druggable in the proteomes of coronaviruses. the structural divergence capacity of these regions is limited, leaving a wider range of the present and emergent coronaviruses susceptible to the effects of potential broadly neutralizing anti-cov therapies targeting these sites. we will refer to these sites as target sites.

protein sequences were identified by individual blast searches with mers-cov (taxonomy id: ) proteins orf ab (yp_ . ; polyprotein), s protein (yp_ . ), m protein (yp_ . ), e protein (yp_ . 
), and n protein (yp_ . ) against coronaviruses. blast searches of the orf ab protein were performed, using start and end positions as detailed in the orf ab ncbi reference sequence file, against the refseq_protein database. the sequences retrieved from the blast output maintained the following cutoff: > % sequence identity and > % coverage relative to mers-cov sequence query. the % sequence identity and % query coverage cutoff strikes a balance between alignment quality and at least sequences for most protein families. nsp (yp_ . ; - ), nsp (yp_ . ; - ), ns (yp_ . ), ns a (yp_ . ), ns b (yp_ . ), ns (yp_ . ), orf b protein (yp_ . ), and nsp (yp_ . ; - ) are not included in this study due to < blast hits. multiple sequence alignments were constructed for the selected blast hits using mafft (katoh et al. ) . phylogenetic trees were constructed using mrbayes . . with a four category gamma distribution and the mixed model for amino acid substitution (huelsenbeck and ronquist ; ronquist and huelsenbeck ) . each tree ran for five million generations, with a sample frequency of . the final tree was constructed from the last % of samples, discarding the first % of samples as the default burnin, and using the half-compatible parameter, to avoid weakly supported nodes (i.e., with a posterior probability < . ). all trees were midpoint rooted. for every protein family, the amino acid substitution rate per site in its multiple sequence alignment was calculated using empirical bayesian estimation as implemented in rate site (mayrose et al. ) . substitution rates were calculated using gamma categories, the jtt substitution matrix (jones et al. ) , and the reconstructed phylogenies. the rates were normalized per protein family with an average across all sites equal to zero and sd equal to . this means that sites with a rate < are evolving slower than average, whereas sites with a rate > are evolving faster than average. 
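The per-family normalization described above can be sketched as a z-score transformation. This is a minimal illustration under stated assumptions, not the rate site implementation: the function names are hypothetical, the target standard deviation of 1 and the threshold of 0 are assumptions (both values are elided in the text), and the choice of the population rather than the sample standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def normalize_rates(rates):
    """Rescale per-site rates to mean 0 and standard deviation 1 (z-scores).

    Assumptions: unit SD and the population SD formula; the text elides both.
    """
    mu = mean(rates)
    sd = pstdev(rates)
    if sd == 0:
        raise ValueError("all rates are identical; cannot normalize")
    return [(r - mu) / sd for r in rates]


def label_site(z):
    """Label a normalized rate relative to the family average (assumed to be 0)."""
    if z < 0:
        return "slower"
    if z > 0:
        return "faster"
    return "average"


# Toy rates for four alignment sites (made-up values, not data from the study).
raw_rates = [0.2, 0.5, 1.1, 2.2]
z_scores = normalize_rates(raw_rates)
labels = [label_site(z) for z in z_scores]
```

Normalizing within each protein family makes "slower" and "faster" relative to that family's own average, so sites can be compared across families with very different absolute rates.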
intrinsic disorder propensity was inferred using two different predictors: iupred (default settings; "long" option) (dosztányi et al. a, b) and disopred (ward et al. ) for all proteins. for iupred, the site-specific continuous disorder propensities for each protein were mapped onto their corresponding position in the multiple sequence alignment as raw disorder propensities and as binary states, order or disorder, using two cutoffs of . and . . disorder propensities below the cutoff were assigned order and disorder propensities at the cutoff or above were assigned disorder. for the disopred predictions that were inferred using the nr database, the continuous disorder propensities for every site in a protein were mapped onto their corresponding position in the multiple sequence alignment as raw disorder propensities and as binary states, order or disorder, using a cutoff of . consequently, for every protein family (a multiple sequence alignment and its corresponding phylogenetic tree), two continuous matrices and three binary matrices resulted: iupred . , iupred . , and disopred . an additional matrix was generated to indicate sites where the binary order and disorder assignments differ between iupred . and disopred . a similar methodology was employed to analyze secondary structure predicted by psipred (mcguffin et al. ) and jpred (drozdetskiy et al. ). for both predictors, the uniref database was used and sites were classified as loops, alpha helices, or beta strands and mapped back onto their corresponding sites in the multiple sequence alignment. this resulted in two three-state matrices for each protein family alignment, one for each predictor, and two binary matrices displaying secondary structure elements (alpha helix and beta strand) or loops. an additional matrix was generated to indicate sites where the secondary structure assignments differ between psipred and jpred. 
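The mapping and binarization steps described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, gaps are assumed to be written as "-", and the cutoffs 0.4 and 0.5 are placeholders because the actual cutoff values are elided in the text.

```python
GAP = "-"


def map_to_alignment(aligned_seq, per_residue_values):
    """Place per-residue values onto alignment columns; gap columns get None."""
    n_residues = sum(1 for c in aligned_seq if c != GAP)
    if n_residues != len(per_residue_values):
        raise ValueError("number of values must match number of residues")
    values = iter(per_residue_values)
    return [None if c == GAP else next(values) for c in aligned_seq]


def binarize(row, cutoff):
    """1 = disorder (propensity at or above the cutoff), 0 = order, None = gap."""
    return [None if v is None else int(v >= cutoff) for v in row]


def disagreement(row_a, row_b):
    """1 where two binary assignments differ, 0 where they agree, None at gaps."""
    return [
        None if a is None or b is None else int(a != b)
        for a, b in zip(row_a, row_b)
    ]


# Toy example: one aligned sequence with a gap and four made-up propensities.
row = map_to_alignment("MK-TL", [0.1, 0.45, 0.5, 0.9])
strict = binarize(row, 0.5)   # placeholder cutoff
relaxed = binarize(row, 0.4)  # placeholder cutoff
diff = disagreement(strict, relaxed)
```

Applying `map_to_alignment` to every sequence yields one row per sequence, which stacks into the continuous matrix; `binarize` and `disagreement` then give the binary and difference matrices row by row.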
for every protein family, the binary matrices resulting from the different disorder predictions and from the different secondary structure predictions were analyzed in the corresponding evolutionary context using gloome. gloome (gain-loss mapping engine) analyzes binary presence and absence patterns in a phylogenetic context . in this study, the rate site option in gloome was used to analyze the binary matrices (iupred . , iupred . , disopred , psipred, and jpred) with the corresponding phylogenetic trees to map change of state across sites in each individual protein phylogeny . gloome was run with gamma categories and a substitution matrix set to equal rates within each state and transitions between states treated equally. from the binary disorder and order matrices, transition rates between disorder and order or vice versa (dot) were estimated. from the binary structure and loop matrices, transition rates between structure and loop or vice versa (slt) were estimated. similar to rate site, the rates were normalized per protein family with an average across all sites equal to zero and sd equal to . this means that sites with a rate < are evolving slower than average, while sites with a rate > are evolving faster than average. protein families were visualized in an integrative manner with a phylogenetic tree, any matrix (multiple sequence alignment or predictor based) displayed as a heatmap, and site-specific sequence transition rates using python packages ete (huerta-cepas et al. ) and matplotlib (hunter ) . amino acid evolutionary rates (seq) for all sites across all alignments were aggregated and binned into four possible categories characterized by the distribution of psipred predicted secondary structure at each site. sites predicted to have a loop across all sequences are "conserved loops; c(l)" and sites predicted to have a helix across all sequences or a strand across all sequences are "conserved helix-strand; c(hs)" (table ) . 
sites predicted to have all three states (helix, strand, and loop) or any combination of loop and one other state are "non-conserved helix, loop, strand; nc(hls)" and sites predicted to have a mixture of helix and strand are "nonconserved helix-strand; nc(hs)" (table ) . in all cases, gaps were ignored when classifying combinations of secondary structure at a site or if secondary structure conservation exists at a particular site. phylogenies were built for all protein products encoded in the mers-cov single-stranded rna genome, except for nsp , nsp , ns , ns a, ns b, orf b protein, and nsp , all of which had insufficient sequence data (< sequence hits with blast). nsp is often used as a measure for newly identified coronaviruses. according to the international committee of taxonomy of viruses, a major criterion in determining if a coronavirus is considered novel is pairwise sequence identity below % for nsp in all comparisons to previously known coronaviruses (bermingham et al. ) . four main clades, alphacoronavirus, betacoronavirus, gammacoronavirus, and deltacoronavirus ( fig. ) , are identified in agreement with the taxonomic classifications described by the ictv (international committee on taxonomy of viruses ). coronaviruses not listed by the ictv are assumed to be a part of the clade in which representatives with known classifications are situated in our nsp phylogeny. the mers clade and sars clade are sister clades in the nsp phylogeny. the hku clade and equ clade are also sister clades. together these four clades form the betacoronavirus clade, in accordance with the ictv classification (international committee on taxonomy of viruses ). betacoronavirus is represented in all phylogenies although the order of the individual subclades varies. alphacoronavirus is often found as the sister clade or outgroup to betacoronavirus. deltacoronavirus or gammacoronavirus are the most distantly related to the betacoronavirus. 
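The four site categories defined for the psipred-based comparison (c(l), c(hs), nc(hls), and nc(hs)) can be sketched as a small classifier over one alignment column of predicted states. This is a minimal sketch under assumptions: the function name is hypothetical, and the single-letter encoding (H for helix, E for strand, L for loop, "-" for a gap) is chosen here for illustration rather than taken from the text.

```python
def classify_site(column):
    """Assign one alignment column of predicted states to a conservation category.

    Gaps are ignored, as in the text. Returns None for an all-gap column.
    """
    states = {s for s in column if s != "-"}
    if not states <= {"H", "E", "L"}:
        raise ValueError("unexpected state symbol in column")
    if not states:
        return None
    if states == {"L"}:
        return "c(l)"      # loop in all sequences
    if len(states) == 1:
        return "c(hs)"     # helix in all sequences, or strand in all sequences
    if states == {"H", "E"}:
        return "nc(hs)"    # mixture of helix and strand, no loop
    return "nc(hls)"       # loop plus one or both structured states


# Toy columns (made-up predictions, one character per sequence).
examples = ["LL-L", "HHHH", "EE-E", "HEHE", "HLLH", "HEL-"]
categories = [classify_site(col) for col in examples]
```

Counting the returned labels over all columns of an alignment gives the per-family category sizes.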
in the nucleocapsid phylogeny, gammacoronavirus is the first outgroup clade to betacoronavirus, and alphacoronavirus is the most distant outgroup. most nsp trees exhibit some unresolved nodes at junctures immediately preceding terminal nodes. as an effect of the % majority rule, most of the resolved nodes are well supported with posterior probability > . for % and > . for % (supplementary fig. s , supplementary material online). most trees follow the nsp topology for the main clades, with minor clade rearrangements. it should be noted that for nsp , the entire alphacoronavirus clade is placed within the betacoronavirus clade, as a sister clade to the mers clade (supplementary fig. s , supplementary material online). this may be due to increased sequence divergence rates or due to recombination. recombination events are rather frequent in coronaviruses (su et al. ), and the mers clade potentially underwent multiple recombination events as part of the host change (zhang et al. ). the phylogenies for membrane protein, spike protein, nsp , and nsp -nsp demonstrate (with the given blast cutoffs) recoverable protein homologs such that all coronaviruses are represented (i.e., all coronaviruses represented in the nsp phylogeny). nucleocapsid, nsp , and nsp have recoverable homologs in all clades except deltacoronavirus. nsp and nsp homologs are too divergent in deltacoronavirus and/or gammacoronavirus relative to mers-cov. envelope appears specific to betacoronavirus (fig. ), but it is a short protein that has been found to diverge rapidly and is likely present outside betacoronavirus (fehr and perlman ). because different protein families yield slightly different phylogenies, for the remaining evolutionary analyses, every protein family was analyzed in the context of its own phylogeny. for all protein families, structural disorder propensities were predicted using iupred (dosztányi et al. a, b) and disopred (ward et al. ). 
to verify the robustness of the binary iupred and disopred predictions, the binary assignments were compared on a site-by-site basis (table ). when converted to binary (i.e., two states per site, disordered or ordered), iupred . and iupred . are in good agreement, with the larger differences seen for nsp , nsp , and nucleocapsid ( . %, . %, and . %, respectively) (table ). comparing iupred . or iupred . to disopred , large differences are seen in particular for nucleocapsid ( . % and . %, respectively) and nsp ( . % and . %, respectively) (table ). for nucleocapsid, regions that are found to be disordered by iupred . are found to be ordered by iupred . and disopred (fig. and supplementary fig. s , supplementary material online). for nsp , in regions that are only slightly disordered in a few sequences according to iupred . and iupred . , disopred predicts disorder to be conserved across all sequences (fig. ). to quantify the fraction of disordered sites per protein family, we report the iupred . results only, for simplicity (table ). in general, iupred . predicts more disorder than disopred , but several protein families have almost no disordered sites. nsp and nsp - have some variation in disorder content for different viruses. based on the fraction of disorder, nucleocapsid is the only highly disordered protein among the covs in this study, even if nsps - have outliers that are > % disordered. to compare the disorder-to-order transition rates (dot) for all protein families where the binary matrices of disorder and order include both states, the quadrant count ratio (qcr) was estimated as a measure of association in assigning slower than average vs. faster than average transition rates. for iupred . vs. iupred . , for iupred . vs. disopred , and for iupred . vs. disopred , the qcrs are given (table ). for nucleocapsid and nsp , the positive associations are weaker, suggesting that many sites have iupred disorder propensity in the . to . 
range and large differences between iupred and disopred , in accordance with the large disagreement between the binary assignment of these predictors (tables and ). for all protein families, secondary structure elements were predicted using psipred (mcguffin et al. ) and jpred (drozdetskiy et al. ). for most protein families, the disagreement between secondary structure predictors is greater than for the disorder predictors (table ). in fact, of the protein families compared disagree at more than % of alignment sites, and two of these disagree at more than % of sites. to compare the binary structure-to-loop transitions (slt), qcr was estimated as a measure of association for slt based on the different predictors. in general, there is a moderate positive association between slt for psipred vs. slt for jpred that is weaker than for the different dot comparisons (table ). it should be noted that slt does not differentiate between alpha helix and beta strand, but considers both as "structure." this is a valid assumption if protein structure is conserved and consistently predicted, but for some protein families that is not the case. four protein families (nsp , nsp , nsp , and spike) have more than % of their sites found within the nc(hls) category with non-conserved helix, strand, and loop (two or three states present at the same site) (table ). for nsp , jpred predicts % of all sites to be a mixture of helix, strand, and loop, or any combination of loop and one other structural element (fig. ). envelope and nsp have % and % of their respective sites in the nc(hs) category. considering only the psipred predictions, the nc(hs) category has sites across all protein families. that is one-tenth the size of the next smallest set, which is c(hs) with sites. next, c(l) has sites, and the largest category is nc(hls) with sites. 
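The quadrant count ratio used above to compare transition rates from two predictors can be sketched as follows. This is a minimal sketch of the textbook QCR definition, not the authors' script: the function name is hypothetical, quadrants are assumed to be centered at 0 (the average of the normalized rates), and sites lying exactly on an axis are assumed to count toward the total but toward neither group.

```python
def quadrant_count_ratio(x, y, x_center=0.0, y_center=0.0):
    """QCR = (concordant - discordant) / total.

    Concordant sites are above center in both series or below center in both;
    discordant sites are above center in one series and below in the other.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    concordant = discordant = 0
    for a, b in zip(x, y):
        product = (a - x_center) * (b - y_center)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(x)


# Toy normalized rates for four sites from two predictors (made-up values).
rates_a = [-1.0, -0.5, 0.5, 1.0]
rates_b = [-2.0, 0.3, 0.1, 2.0]
qcr = quadrant_count_ratio(rates_a, rates_b)
```

The ratio ranges from -1 (the two predictors always disagree on slower vs. faster than average) to 1 (they always agree), so weaker positive values indicate more sites assigned to opposite sides of the average.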
comparing the evolutionary sequence rates for the sites in the different categories, based on psipred predictions only, reveals that sites in the c(hs) category are evolving at a slower rate than all other categories. nc(hs) is only just significantly different (p = . e- ) from c(hs), and is not significantly different from nc(hls) and c(l) (p = . e- and p = . e- , respectively). however, nc(hls) and c(l) are significantly different from each other, and both are significantly different from c(hs) (p = . e- and p = . e- , respectively) (fig. ).

for regions with five or more consecutive sites that were % conserved in sequence across ) all cov or ) across the mers and sars clades, the information of structural disorder prediction from iupred and disopred was used to identify all ungapped sites that were consistently predicted to have % conserved order. next, the information of secondary structure prediction from psipred and jpred was used to narrow down this list further by only including sites that are not changing their predicted secondary structure state for both predictors. applying the aforementioned filters to the initial , sites resulted in one region of five residues or more conserved across all cov within the n-terminal domain of nsp : dnqdl (table ). interestingly, this region is in the vicinity of sites found important for nucleotidylating activity across the order nidovirales (lehmann et al. ). considering only the sequences in the sars and mers clades, sequence regions of five residues or more were found in seven protein families (table ). for nsp , nsp , and nsp , experimentally determined structures show that most regions are surface accessible (fig. ). some of the identified target sites are known for their functional importance. for instance, c in the middle of gscgs in nsp is part of the catalytic dyad in the nsp protease (yang et al. ). for nsp and nsp , which have the majority of all sites, no structures are available. 
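The target-site filter described above (runs of five or more consecutive, fully conserved, ungapped sites that also pass the order and secondary-structure filters) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the `keep` mask stands in for the combined disorder and secondary-structure filters, and treating a masked-out column as breaking a run is an assumption. The toy alignment is made up and only borrows the dnqdl motif from the text.

```python
def is_conserved(column):
    """True if the column has no gaps and a single residue in all sequences."""
    return "-" not in column and len(set(column)) == 1


def target_regions(alignment, keep=None, min_len=5):
    """Return (start, motif) for runs of >= min_len conserved, kept columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    if keep is None:
        keep = [True] * length
    regions, start = [], None
    # Iterate one past the end so a run reaching the last column is closed.
    for i in range(length + 1):
        ok = (
            i < length
            and keep[i]
            and is_conserved([seq[i] for seq in alignment])
        )
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                regions.append((start, alignment[0][start:i]))
            start = None
    return regions


# Toy alignment of three made-up sequences.
toy = ["MKDNQDLAG", "MRDNQDLSG", "M-DNQDLAG"]
found = target_regions(toy)

# Masking out one column inside the motif (e.g. predicted disordered there)
# splits the run into pieces shorter than five residues.
mask = [True] * 9
mask[4] = False
found_masked = target_regions(toy, keep=mask)
```

Positions are 0-based alignment columns; mapping them back to residue numbers in a reference sequence would require skipping that sequence's gaps.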
the sites adjacent to dnqdl are also conserved in the sars and mers clades, and five additional target sites, conserved for the sars and mers clades, are found in the c-terminal direction relative to the dnqdl motif (table ). continuing into the rna-dependent rna polymerase domain (rdrp) in nsp , four additional regions of target sites are found, and the last three regions are found in the c-terminal part. importantly, in rdrp and in the c-terminal part are sites that are also conserved across all covs in this study. nsp has four regions of target sites distributed across the protein.

fig. . the evolutionary context of intrinsic disorder in nsp . the phylogenetic tree was built using the multiple sequence alignments for nsp . (a) the multiple sequence alignment is colored by amino acid according to scale, arranged based on top-idp disorder promoting propensity of the amino acids (campen et al. ), and gray denotes gaps. (b) iupred disorder propensity per site in the multiple sequence alignment. blue-to-white-to-red shows disorder propensity according to the scale for iupred . . (c) iupred disorder propensity per site in the multiple sequence alignment. blue-to-white-to-red shows disorder propensity according to the scale for iupred . . (d) disopred disorder propensity per site in the multiple sequence alignment. blue-to-white-to-red shows disorder propensity according to the scale. above the multiple sequence alignment, the normalized evolutionary rates per site for amino acid substitution (seq) and the dot for the binary transformations of b-d are shown. heat maps visualized with the python packages ete (huerta-cepas et al. ) and matplotlib (hunter ). see supplementary figures s and s , supplementary material online for additional graphics for every protein family.

we have analyzed the protein evolution of the genetic components that make up the mers-cov proteome. 
as previously established, mers-cov has the same genomic makeup as hku -cov and hku -cov in the mers clade (woo et al. ). some protein products are only found in the mers clade, and these were excluded from this study due to insufficient data. furthermore, for other protein products, some clades may not be represented in our protein families if their proteins were too divergent. this was an important factor in determining the applied blast hit cutoffs, as relaxing cutoffs produced alignments with more gaps and increasing stringency reduced the representative pool. because both rate site and phylogenetic reconstruction are sensitive to alignment quality, the chosen cutoffs are suitable. we note some clade-specific differences in recoverable homologs between different cov, but many components are shared among them (fig. ). viral proteins often possess multifunctionality, mediated by a conformational change in response to environment-specific factors (xue et al. ). although conformational flexibility is important for function, it also offers flexibility in what sequence motifs are on display. if these sequences are rapidly diverging, different sequence motifs will be displayed, reinforcing the notion that flexible regions are potentially important in rewiring protein-protein interactions between virus and host (gitlin et al. ). although most cov proteins have almost no intrinsic disorder, several cov protein families have homologous sites that display loop in some sequences, helix in others, and strand in still others (table , supplementary fig. s , supplementary material online). these sites are not necessarily disordered but they may be conformationally flexible in real time (with secondary structure transitions in the same sequence, making them difficult to predict) or on evolutionary time-scales (so that different secondary structure elements actually are present in different sequences). 
the c(hs) and c(l) sites make up approximately - % of most multiple sequence alignments. given the common expectation that protein structure is more conserved than sequence, these numbers are surprisingly low. neither psipred nor jpred consistently predicts the same state for - % of all sites in these multiple sequence alignments. the accuracy of the psipred and jpred secondary structure predictions is about % (bryson et al. ; drozdetskiy et al. ). psipred has been found to rarely predict an alpha helix instead of a beta strand and vice versa, and most of the psipred errors are due to secondary structure not being predicted (li et al. ). when secondary structure is not conserved for the same site in a multiple sequence alignment, it suggests that the secondary structure prediction may be (1) inaccurate or (2) not made with high confidence, or that (3) the regions are indeed metamorphic; they can transition from one element to another. although (1) is difficult to address without experimentally determined structures for all sequences, (2) and (3) are not necessarily incompatible interpretations because low confidence secondary structure prediction could indicate metamorphic secondary structure regions. metamorphic secondary structure regions have interesting consequences for conformational and functional flexibility. it should be noted that, despite the low amount of disordered sites in most cov proteins, several regions are not conserved in disorder propensity across all sequences, but sometimes the different predictors disagree, as in the case of nsp . clade-specific disordered regions resulting from indel events suggest that they are not essential to the critical functions of the protein, but could cause gain-and-loss of interactions with its hosts. however, when disorder propensity is only mildly fading for a region that is present across the protein family, it may be important for the fundamental function of the protein. 
the virus structural proteins that interact to form the virion commonly include an envelope protein, a membrane protein, and a capsid protein that together form the machinery that encases, transports, and releases the virus. the interactions between the structural proteins are often regulated by conformational changes, such as those of vp in ebola (bornholdt et al. ) and of the envelope protein from dengue virus (zheng et al. ). conformational changes to another. it should also be noted that two mers clade specific inserts around position and toward the c-terminus are consistently predicted to be highly disordered. with inserts and changing structural dynamics between clades or viruses, the questions become ) which sequence motifs are displayed and ) to what extent are these sequence motifs displayed? furthermore, based on the inconsistent prediction of secondary structure elements, the possibility that covs are more conformationally flexible than their intrinsic disorder content implies is noteworthy. altogether, this suggests that various mechanisms for rewiring conformational and functional space are operating in the coronaviruses studied here. if regions symptomatic of conformational and functional flexibility can be avoided in order to identify broad-specificity antiviral targets with potential to be effective against coronaviruses of today and in the future, coronaviruses as a group may become more attractive drug targets for the pharmaceutical industry in the event an additional coronavirus changes host to include humans or increases its virulence. supplementary tables s and s and figures s -s are available at genome biology and evolution online (http://www.gbe.oxfordjournals.org/). 
update on sars research and other possibly zoonotic coronaviruses
the protein data bank
severe respiratory illness caused by a novel coronavirus
structural rearrangement of ebola virus vp begets multiple functions in the virus life cycle
flavivirus ns and ns proteins interaction network: a high-throughput yeast two-hybrid screen
protein structure prediction servers at university college london
top-idp-scale: a new amino acid scale measuring propensity for intrinsic disorder
gloome: gain loss mapping engine
inference and characterization of horizontally transferred gene families using stochastic mapping
middle east respiratory syndrome coronavirus (mers-cov): announcement of the coronavirus study group
iupred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content
the pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins
jpred : a protein secondary structure prediction server
coronaviruses: an overview of their replication and pathogenesis
the complexity of a dengue vaccine: a review of the human antibody response
a computationally optimized broadly reactive antigen (cobra) based h n vlp vaccine elicits broadly reactive antibodies in mice and ferrets
computationally optimized antigens to overcome influenza viral diversity
rapid evolution of virus sequences in intrinsically disordered protein regions
molecular pathology of emerging coronavirus infections
mrbayes: bayesian inference of phylogenetic trees
ete : reconstruction, analysis, and visualization of phylogenomic data
virus taxonomy: classification and nomenclature of viruses: ninth report of the international committee on taxonomy of viruses
the rapid generation of mutation data matrices from protein sequences
mafft: a novel method for rapid multiple sequence alignment based on fast fourier transform
minimization of genetic distances by the consensus, ancestral, and center-of-tree (cot) sequences for hiv- variants within an infected individual and the design of reagents to test immune reactivity
discovery of an essential nucleotidylating activity associated with a newly delineated conserved domain in the rna polymerase-containing protein of all nidoviruses
bayesian model of protein primary sequence for secondary structure prediction
bat-to-human: spike features determining 'host jump' of coronaviruses sars-cov, mers-cov, and beyond
structural basis and functional analysis of the sars coronavirus nsp -nsp complex
comparison of site-specific rate-inference methods for protein sequences: empirical bayesian methods are superior
an evaluation of phylogenetic methods for reconstructing transmitted hiv variants using longitudinal clonal hiv sequence data
the psipred protein structure prediction server
metagenomics and future perspectives in virus discovery
rapid evolutionary dynamics of structural disorder as a potential driving force for biological divergence in flaviviruses
marked variability in the extent of protein disorder within and between viral families
cross host transmission in the emergence of mers coronavirus
mrbayes : bayesian phylogenetic inference under mixed models
exploring the viral world through metagenomics
the evolution of protein structures and structural ensembles under functional constraint
the challenge of protein structure determination-lessons from structural genomics
cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human
epidemiology, genetic recombination, and pathogenesis of coronaviruses
genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans
the disopred server for the prediction of protein disorder
genetic relatedness of the novel human group c betacoronavirus to tylonycteris bat coronavirus hku and pipistrellus bat coronavirus hku
world health organization. . who j middle east respiratory syndrome coronavirus
structural disorder in viral proteins
the crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
structure-based inhibitor design for the intrinsically disordered protein c-myc
evolutionary dynamics of mers-cov: potential recombination, positive selection and transmission
a toggle switch controls the low ph-triggered rearrangement and maturation of the dengue virus envelope proteins

target sites shown in d context. (a) nsp dimer, based on pdb id uk (yang et al. ). (b) nsp , based on pdb id f (unpublished). (c) nsp , based on pdb id c t

we thank joseph ahrens, janelle nunez-castilla, and helena gomes dos santos for assistance in the lab and for helpful discussions. the authors would also like to acknowledge the instructional & research computing center (ircc) at florida international university for providing hpc computing resources that have contributed to the research results reported within this article, web: http://ircc.fiu.edu.

key: cord- - s ok uw authors: nan title: abstracts of the th annual symposium of the protein society date: - - journal: protein science doi: . /pro.
sha: doc_id: cord_uid: s ok uw nan c-terminus glutamine-rich sequence deleted) to elucidate the role of metalloprotein hpn-like by fluorescence resonance energy transfer (fret) (figure ) [ ] . we found the selective coordination of ni(ii) and zn(ii) to the purified sensors and in e. coli cells. surprisingly, specific interaction between the fret sensors and bi(iii) was observed. our fret analysis confirmed the role of hpnl for ni(ii) storage and revealed the potential association of hpnl with bi-based antiulcer drugs in cells. pb- rna fate is controlled by highly-regulated rna binding proteins molecular mechanism that links the mrna-degradation pathway with extracellular signaling networks through the reversible unfolding of a rna binding domain (rbd). rna binding is also controlled by ph conditions. this finding becomes relevant for rbps such as t-cell intracellular antigen (tia- ), which shuttles between two cellular compartments (nucleus and cytoplasm) with slightly different ph values. in fact, rna binding by tia- is modulated by slight environmental ph changes due to the protonation/deprotonation of tia- histidine residues [ , ] . the ph dependence of the tia- /rna interaction provides a new insight into the function of tia- in recognizing new rna targets [ ] , like the ' terminal oligopyrimidine tracts ( tops) of translationally-repressed mrnas. along with tia- , the rbp hu antigen r (hur) is involved in the assembly/disassembly of cytoplasmic stress granules (sg), which arise as a protective mechanism by preventing mrna decay under stress situations. despite wide acceptance that rbps harboring aggregation-promoting prion related domains (prds), such as tia- , stimulate rapid self-association and formation of sgs, we propose that scaffolding sgs may be driven by rbds, since prd-lacking rbps, like hur, often form oligomers [ , , ] and are included in sgs. 
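the ph sensitivity that the abstract above attributes to protonation/deprotonation of histidine residues can be illustrated with the henderson-hasselbalch relation. the sketch below is only an illustration, not the authors' analysis: the side-chain pka of 6.0 is a textbook value for free histidine assumed here (pka values inside a folded protein can differ considerably), and the two ph values are hypothetical stand-ins for two compartments with slightly different ph.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of a titratable group in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is an assumed textbook value for a free
    histidine side chain, not a value taken from the abstract.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical pH values for two compartments (assumptions for illustration).
for label, ph in [("compartment A", 7.2), ("compartment B", 7.6)]:
    print(f"{label}: pH {ph:.1f} -> {fraction_protonated(ph):.3f} protonated")
```

because the relation is steep near the pka, even a few tenths of a ph unit change the protonated fraction noticeably, which is the qualitative point made in the abstract.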
under continuous stress, the transition from the physiological to pathological aggregation of rbps in sgs may depend on post-translational modifications of rbds. rna-binding proteinopathies, characterized by the nucleation of irreversible sgs, are often found in neurodegenerative diseases. altogether, resulting insights into rna biology suggest that highly-regulated rbps determine mrna fate from synthesis to decay. a threat to million people in underdeveloped nations around the world, african trypanosomiasis (sleeping sickness) is a neglected tropical disease (ntd) caused by the protozoan parasite trypanosoma brucei (t. brucei). t. brucei is transmitted to humans via the tsetse fly, and replicates in the blood before crossing into the brain, causing death for the infected individual. current treatments that are available for african sleeping sickness are highly toxic and usually difficult to administer past the blood-brain barrier. it is our belief that coupling less toxic compounds with efficient drug delivery systems will contribute to the development of the most effective drug against african sleeping sickness. our goal was to determine a novel and effective chemical inhibitor with the potential to prevent the replication of t. brucei in the human body. the enzyme target for inhibition studied in this research was -phosphogluconate dehydrogenase ( pgdh), a cytosolic enzyme in the pentose phosphate pathway (ppp) of t. brucei. pgdh is essential in the ppp due to its ability to oxidize -phosphogluconate into ribulose- -phosphate, which is essential for the formation of nucleotides. primer overlap extension polymerase chain reaction (pcr) was used to synthesize the coding dna sequence of the pgdh gene, which was then cloned into a pnic-bsa inducible expression plasmid with an n-terminal histidine tag, by way of ligation independent cloning. the protein was then expressed in bl (de ) escherichia coli (e. 
coli) cells and purified via nickel column affinity and size exclusion fast protein liquid chromatography (fplc) to perform inhibition assays. through virtual screening, various ligands (obtained from the chembridge library and nih clinical collection) were docked into the active site of the crystal structure of tb pgdh (pubchem identification pgj) using gold molecular docking software. the top scoring compounds were selected by utilizing parameters such as hydrophobic interactions, hydrogen bonds, and van der waals forces. the compounds with the best scores that also satisfied lipinski's rule of criteria for druggability were then tested in spectrophotometric enzyme inhibition assays monitoring the absorbance of nadph at nm. compounds that show inhibitory activity in the assays will be taken to higher levels of testing to determine their effect on t. brucei in other organisms.

nmr studies of the structural influence of phosphopantetheinylation in nonribosomal peptide synthetase carrier proteins and impact on binding affinities andrew goodrich , dominique frueh
nonribosomal peptide synthetases (nrpss) are modular enzymatic systems responsible for the production of complex secondary metabolites in bacteria and fungi. each module comprises (at least) three core domains whose combined action leads to the selection, activation, and incorporation of a single small molecule into a growing peptide. central to each module is the carrier protein (cp), which is first primed via attachment of a '-phosphopantetheine moiety (ppant arm) to a conserved serine to generate the active holo form. an adenylation (a) domain then covalently attaches an amino or aryl acid onto the ppant arm via formation of a thioester. the cp then shuttles activated monomers and growing peptides between the active sites of catalytic domains in both the same and adjacent modules.
during cp priming and peptide elongation, a cp thus exists in multiple different post-translational states and interacts with numerous catalytic domains. understanding how nrpss are able to efficiently orchestrate this series of sequential protein-protein interactions between a cp and its partner catalytic domains is key to unraveling the molecular mechanism of nrp synthesis. using a combination of isothermal titration calorimetry and nuclear magnetic resonance (nmr) titrations, we found that converting a cp from the apo to holo form alters its affinity for its partner a domain. this change in binding suggests a means by which directionality in protein-protein interactions is achieved in nrpss. however, we also found that a domain binding affects the same subset of residues in both the apo and holo forms. in order to identify the molecular features underpinning this difference in affinity, we solved the nmr solution structures of the apo and holo forms of the cp. here, we present the solution structures of an apo and holo cp and discuss them in light of their differential binding to an a domain.

functional analysis of conditional analog-sensitive alleles of essential protein kinases in the fission yeast schizosaccharomyces pombe. juraj gregan , mfpl/imp
the genome of the fission yeast schizosaccharomyces pombe encodes protein kinases that are essential for viability. studies of the essential kinases often require the use of mutant strains carrying conditional alleles. to inactivate these kinases conditionally, we applied a recently developed chemical genetic strategy. the mutation of a single residue in the atp-binding pocket confers sensitivity to small-molecule inhibitors, allowing for specific inactivation of the modified kinase. using this approach, we constructed conditional analog-sensitive alleles of essential protein kinases in the fission yeast s. pombe. i will present the functional analysis of these mutants during meiosis.
peptide conjugates: from self-assembly towards applications in biomedicine ian hamley university of reading, dept of chemistry self-assembling peptides and their conjugates offer exceptional potential in nanomedicine. i will present some of our recent work on nanoscale assembled peptides and their conjugates, focussing on lipopeptides [ , ] and peg-peptide conjugates [ ] . pegylation is an important technique in the development of conjugates for applications in therapeutics. it is found to greatly influence self-assembly of peptides and proteins -one example from our own work is a peptide which itself forms twisted fibrils but when peg is attached, self-assembly of the conjugate leads to spherical micelles [ ] . the conjugate can be enzymatically degraded using alpha-chymotrypsin, releasing the peptide. this nanocontainer delivery and release system could be useful in therapeutic applications. thermoresponsive telechelic peg/peptides with hydrophobic dipeptide end groups (di-tyrosine or di-phenylalanine) were developed, one of which shows a de-gelation transition near body temperature and which may be useful in bioresponsive delivery systems [ ] . examples from our recent work on self-assembling lipopeptides will also be outlined. our focus is to investigate potential relationships between self-assembly and bioactivity, in particular in the fields of regenerative medicine [ ] [ ] [ ] [ ] [ ] , antimicrobial systems [ , ] and immune therapies [ ] . been shown to become derivatized with argpyrimidine, a prominent nem that occurs on arginine residues [ ] , in certain human cancer tissues and cell lines [ , ] . this nem was linked to the elevated antiapoptotic activity of the protein [ , ] , whereby modification of arg- appeared to be of particular significance [ ] . in this work, hsp homogeneously modified with argpyrimidine at position is generated for the first time. 
using expressed protein ligation [ ] , the first semisynthesis of the unmodified protein is achieved as well. our approach, which combines organic chemistry, peptide synthesis and protein synthesis, enables complete control over protein composition and thus can provide previously unattainable insight into the properties of this vital chaperone following nonenzymatic modification. the synthesis of argpyrimidine-modified hsp and the progress towards structural and functional characterization of the protein will be presented herein.

kunitz-type protease inhibitors belong to a widespread protein family present in many plant species and play an important role in plant defense against insect pests and pathogens. members of this family are typically inhibitors of proteases of the serine class. interestingly, a few members were identified as inhibitors of proteases of the cysteine class; however, they have not been functionally and structurally characterized. our study is focused on kunitz-type inhibitors of cysteine proteases (pcpis) from potato (solanum tuberosum). a series of kda pcpis was purified using a multi-step chromatographic protocol, and the two most abundant and effective isoinhibitors, named pci - and pci , were characterized in detail. they were screened against a broad panel of model cysteine proteases and digestive cysteine proteases from herbivorous insects. pci - and pci exhibit different inhibitory specificity patterns and potency up to the nanomolar range. both isoinhibitors were crystallized and their spatial structures were solved and refined at . Å (pci - ) and . Å (pci ) resolutions. the position of the reactive sites against cysteine proteases on the conserved β-trefoil fold scaffold was proposed. the work provides the first analysis of pcpis with respect to the structure-function relationships and evolution within the kunitz-type inhibitor family.
role of the abcc transporter in the mode of action of the bacillus thuringiensis cry ac toxin in the diamondback moth plutella xylostella

protonation patterns actively influence the properties of molecules and play an essential role in biochemical mechanisms. for an accurate determination of the protonation equilibria, the absolute proton solvation free energy needs to be known. the determination of this energy represents one of the most challenging problems in physical chemistry. this is particularly difficult for protons solvated in water, where the solvation is dynamically performed by different water clusters and the proton is not attached to a single solvent molecule. proton solvation is notably important for quantifying mechanisms of proton transfer, and such processes have been investigated for a long time with different approaches, often leading to contradictory conclusions. a rigorous and accurate protocol for computing proton solvation in solvents of different nature is of prime importance for applied (pharmaceutical and material science) and fundamental sciences. in this study, proton affinities, electrostatic energies of solvation and pka values of a reference set of organic molecules are computed in protic and aprotic solvents. because the pka value is proportional to the free energy of proton dissociation, its calculation depends strongly on the free energy of proton solvation. this energy is then determined in acetonitrile (acn), methanol (met), water and dimethyl sulfoxide (dmso) in order to obtain the best possible match between measured and computed pka values. the computation of these values is based on a combination of quantum chemical (qc) and electrostatic approaches, using a thermodynamic cycle connecting gas-phase and solvent-phase proton dissociation. the computed proton solvation energies in acn, met, water and dmso of the present study are very precise (rmsd much lower than ph value).
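the dependence of a computed pka on the proton solvation free energy, as described above, can be made concrete with a minimal sketch of a generic thermodynamic cycle. this is not the authors' protocol: the function name, the choice of kcal/mol, the omission of standard-state corrections and all numerical energies are assumptions made purely for illustration.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def pka_from_cycle(dg_gas, dg_solv_acid, dg_solv_base, dg_solv_proton,
                   temperature=298.15):
    """pKa from a gas-phase/solvent-phase cycle; all energies in kcal/mol.

    Cycle: HA(solv) -> HA(gas) -> A-(gas) + H+(gas) -> A-(solv) + H+(solv).
    Standard-state corrections are deliberately left out of this sketch.
    """
    dg_dissociation = dg_gas + dg_solv_base + dg_solv_proton - dg_solv_acid
    return dg_dissociation / (R_KCAL_PER_MOL_K * temperature * math.log(10))


# Hypothetical energies; only the proton solvation term differs (by 1 kcal/mol).
reference = pka_from_cycle(340.0, -7.0, -75.0, -265.0)
shifted = pka_from_cycle(340.0, -7.0, -75.0, -264.0)
print(f"pKa shift per 1 kcal/mol in proton solvation: {shifted - reference:.2f}")
```

at room temperature one pka unit corresponds to roughly 1.36 kcal/mol, so an error of 1 kcal/mol in the proton solvation energy shifts every computed pka by about 0.73 units, which is why the abstract stresses that this single quantity must be known accurately.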
they will be a basis for better understanding of proton solvation and help to predict pka values of organic compounds in different solvents more precisely.

biochemical characterization of two evolutionarily distant ten-eleven translocation enzymes and their utility in -methylcytosine sequencing in the genomes at single-base resolution

subtypes leading to an inability to perceive pain and painful neuropathies, respectively. however, as nav ion channels are intimately involved in almost all aspects of physiology, only the most selective inhibitors would be suitable as drug leads. disulfide-rich venom-derived mini-proteins from cone snails and spiders are being actively pursued as novel therapeutics for pain, because of their high selectivity and potency at human ion channels, including sodium channels (nav). two main strategies of inhibition have been identified: blocking the pore and interacting with the voltage-sensor domains (vsd) surrounding the pore. the ion-conducting pore is highly conserved between all sodium channel subtypes, whereas the voltage-sensor domain binding sites are less conserved. therefore, inhibition of a specific nav isoform is more achievable using inhibitors that modulate vsds than with pore blockers. gating modifier toxins from spider and cone snail venom inhibit nav . and nav . by interacting with the vsd. they appear to reach their target by partitioning into the lipid membrane surrounding the ion channel, thus enabling access to the vsd. toxin pharmacology may therefore be driven not only by the peptide-ion channel interactions but also by the lipids surrounding the channel protein, a feature that is very much underexplored. it is therefore apparent that peptide-lipid interactions in combination with peptide-channel interactions need to be considered when designing potent inhibitors.
using a range of biophysical techniques, including surface plasmon resonance and nuclear magnetic resonance, we are studying the interactions underpinning the mechanism of action between toxins and membranes and toxins and ion channels. initial results show that the lipid composition surrounding ion channels plays a major role in terms of toxin:lipid interaction and that these interactions can be used in combination with traditional structure-activity relationship studies to design selective and potent nav inhibitors, which will be discussed. we believe that our studies will ultimately delineate what drives toxin pharmacology and nav subtype selectivity and will lead to improved rational engineering of novel therapeutics for the treatment of pain.

micelles promote aβ assembly into pore-forming oligomers montserrat serra-batiste , mariam bayoumi , margarida gairí , martí ninot-pedrosa , giovanni maglia , natàlia carulla institute for research in biomedicine (irb barcelona), biochemistry, molecular and structural biology section, university of leuven
the formation of amyloid-β peptide (aβ) oligomers at the cellular membrane is considered to be a crucial process underlying neurotoxicity in alzheimer's disease (ad). therefore, it is important to understand how oligomers form within a membrane environment. using solution nuclear magnetic resonance (nmr) spectroscopy, combined with size exclusion chromatography (sec), we have studied the two major aβ variants, aβ and aβ , the latter having a more prominent role in ad than the former, under carefully selected micelle conditions intended to mimic a membrane environment. our results indicate that after an incubation period, aβ , but not aβ , assembles into oligomers with specific structural properties, which we have named stabilized micelle oligomers (smos). smo complexes incorporate into lipid bilayers as well-defined pores, a feature linked to neurotoxicity.
these results have important implications in the ad field as they provide a new perspective on how aβ oligomers cause neurotoxicity. indeed, our findings constitute a first step towards the establishment of a new therapeutic target for ad.

dimer formation. it should be noted that this nβ peptide contains the autophosphorylatable ser- associated with phk activation, and phosphorylated nβ peptide was considerably less effective in promoting β-dimer formation than non-phosphorylated peptide. these results suggest a role for ser- autophosphorylation in mediating homodimeric β subunit interactions within the phk complex, and augment previous studies on the activation of phk by phosphorylation in which changes at the n-terminus of β are critical in the activation of the catalytic γ subunit. summing these results leads to a new model of activation. in this model, in the inactive state, the nonphosphorylated n-terminus of β interacts directly or indirectly with the regulatory c-terminal domain of the γ subunit, inhibiting catalytic activity. upon phosphorylation of the n-terminus of β, three important events occur: 1) the interaction between β and γ is disrupted, 2) the β subunits of the holoenzyme self-associate, and 3) the catalytic domain is activated. thus, we envision that the n-terminus of β acts as an allosteric switch, with activation triggered by phosphorylation of this region, causing disruption of its previously inhibiting interactions with γ and promotion of β-β dimerization to stabilize the activated conformation of γ . the research was supported financially by the university of kansas medical center biomedical research training program and nih grant dk .

pb- hssb is involved in the cellular response to oxidative dna damage christine touma , nicolas paquet , derek j.
richard , roland gamsjaeger , , liza cubeddu , school of science and health, university of western sydney, queensland university of technology, school of molecular bioscience, university of sydney
cellular dna is subject to oxidative damage in the presence of reactive oxygen species. the , -dihydro- -oxoguanine ( -oxog) adduct is the most common form of oxidative damage and results in g:c to t:a transversions; these lesions are normally processed by the base excision repair (ber) pathway. single-stranded binding (ssb) proteins of the oligonucleotide binding domain family are heavily involved in dna repair processes, which involve the detection of dna damage and recruitment of repair proteins to the site of damage. using immunofluorescence we demonstrate that hssb (a novel human ssb) levels increase in response to oxidative damage (h ). cells depleted of hssb are hypersensitive to oxidative damage and are also unable to efficiently remove -oxog adducts. we show that hssb forms dimers and tetramers under oxidative conditions and that this oligomerisation is likely mediated by inter-domain disulfide bond formation. furthermore, using surface plasmon resonance, we also show that oxidised hssb binds to -oxo-g damaged ssdna with higher affinity than non-damaged ssdna, indicating a direct role for oxidised hssb in the recognition of -oxo-g lesions. as oxidative stress is associated with aging, cancer and alzheimer's disease, understanding the molecular mechanisms of how cells repair oxidative dna damage will be crucial in the development of potential therapeutic treatments.

epidemic typhus, which is caused by the bacterial pathogen rickettsia prowazekii, is a menacing disease worldwide that the nih lists as one of america's greatest biological weapons threats. this research seeks to find novel inhibitors of β-ketoacyl-acp-reductase (fabg), an enzyme that catalyzes one of the reactions in the fatty acid synthesis type ii system in bacteria.
this pathway is essential for survival in bacteria. the fabg enzyme uses nadph as a substrate, which facilitates the binding of the second substrate, acetoacetyl-acp, into the active site. the acetoacetyl-acp is subsequently reduced into β-hydroxyacyl-acp. the coding dna sequence for the rpfabg protein was cloned into a pnic vector and transformed into e. coli bl (de ), then the protein was expressed and purified using metal affinity and size exclusion chromatography methods. high-throughput molecular docking software (gold) was used to screen a commercial library of ligands against the acetoacetyl-acp region of the active site. the ligands with the best gold scores were selected to be tested in vitro. spectrophotometric enzyme inhibition assays were performed to determine whether the drugs could inhibit rpfabg activity. chlorogenic acid, a previously known inhibitor of homologous fabgs, was tested along with the other potential drugs, and was determined to have moderate inhibitory effects on rpfabg. loop modeling using icm software was performed in order to create a prediction of the complete rpfabg structure, including the disordered loops that are not a part of the f i pdb structure. co-crystallization of rpfabg with both substrates was carried out in order to obtain a structure, but only nondiffracting crystals resulted. further inhibition assays and crystallography trials are being performed in order to continue the search for a novel inhibitor of rpfabg and ultimately a treatment for epidemic typhus.

the university of hong kong
bioconjugation of proteins has emerged as a useful tool in the study of biological systems. there is an increasing need to develop new synthetic technologies for the bioconjugation reaction of proteins, and metal-catalyzed site-selective modification of proteins has attracted considerable interest in recent years.
we have developed a ruthenium glycosylated porphyrin-catalyzed carbenoid transfer reaction for the site-selective modification of proteins. we first applied the catalysis to the selective modification of the n-terminus of peptides. by using ruthenium glycosylated porphyrin as catalyst, the n-terminus of a number of peptides can be modified through carbenoid n-h bond insertion in aqueous media with moderate to excellent conversion. the reaction is highly selective: for example, the reaction with ytsssknvvr, which contains various types of oxygen-hydrogen and nitrogen-hydrogen bonds possibly available for carbenoid insertion, catalyzed by the ruthenium glycosylated porphyrin gave the n-terminal-modified product with > % conversion and without the formation of other modified peptides, including doubly modified and oxygen-hydrogen bond insertion products. we next extended the n-terminal modification method to proteins, and eventually succeeded in modifying rnase a and insulin. the reaction of rnase a with a diazoacetate mediated by ruthenium glycosylated porphyrin gave the corresponding n-terminal-modified protein with % conversion. we also achieved a bioconjugation to ubiquitin via ruthenium glycosylated porphyrin-catalyzed alkene cyclopropanation in aqueous solution in two steps: ( ) incorporation of an alkenic group by the reaction of n-hydroxysuccinimide ester with ubiquitin and ( ) cyclopropanation of the alkene-tethered lys ubiquitin with the fluorescently labeled diazoacetate in the presence of a catalytic amount of ruthenium glycosylated porphyrin. the corresponding cyclopropanation product was obtained with % conversion based on maldi-tof mass spectrometry. in conclusion, we developed a ruthenium porphyrin-catalyzed site-selective modification of peptides and proteins in aqueous media. the method provides an entry to new bioconjugation reactions for protein modifications using metalloporphyrins as catalysts.
uridine monophosphate synthase: architecture versatility in the service of late blight control francisco tenjo castaño , , manuel garavito , , leonor garcía , , silvia restrepo , barbara zimmermann biochemistry and molecular biology research group, universidad de los andes, mycology and plant pathology laboratory, universidad de los andes
uridine monophosphate synthase (umpase), a bifunctional enzyme in the de novo pyrimidine biosynthetic pathway, is a protein comprised of orotate phosphoribosyl transferase (oprtase) and orotidine monophosphate decarboxylase (odcase). different fusion orders of the two domains have been documented to exist in nature. in some organisms oprtase and odcase are monofunctional proteins, and act as a complex. here, umpase from solanum tuberosum (potato) and from phytophthora infestans (an oomycete) were examined. p. infestans causes late blight disease in s. tuberosum, destroying crops and increasing production costs. since pyrimidines are fundamental cellular components, we have proposed that umpase could serve as a target to control p. infestans infection. the enzymes from p. infestans and s. tuberosum differ in their fusion order of oprt and odc. the study of these two umpases could facilitate the design of species-specific inhibitors, and might shed light on the effect of fusing umpase domains in one order or the other. to this end we carried out bioinformatic and biochemical characterization of the enzymes. sequence analyses showed residue differences among the p. infestans umpase sequences from three strains: , and t - . strain t - was found to have a duplicated umpase, but neither sequence corresponded to the ones predicted previously from the genome. a recombinant umpase from strain was expressed in bacteria and purified, but it showed low solubility and was inactive in vitro. the recombinant umpase from the strain complemented both oprtase- and odcase-deficient e. coli strains.
a soluble, active, recombinant protein was expressed and purified in the presence of high salt and the product ump (specific activity . μmol min-1 mg-1). the sequence skq was found at the c-terminus of the p. infestans umpase sequences and resembles a peroxisome signal peptide (skl). the predicted hydrophobicity of this umpase and its architecture (oprt at the c-terminus and odc at the n-terminus) resemble those of the umpase from leishmania donovani, which has been localized to the peroxisome. we suggest that p. infestans umps could also be located in this organelle. in contrast to the oomycete enzyme, s. tuberosum umpase is highly soluble, and has a higher specific activity (vmax . μmol min-1 mg-1). we measured the kinetic parameters km(orotate) . μm, km(prpp) . μm, and found that it exhibited product inhibition by pyrophosphate. in conclusion, the different architectures of the two umps might be related to distinct biochemical characteristics, further supporting this protein as a good candidate for p. infestans control.

we present molecular dynamics (md) simulation studies of three different antimicrobial peptides, carried out in collaboration with experimentalists. the first is daptomycin, a potent lipopeptide currently licensed to treat infections caused by multi-drug-resistant bacteria. the mechanism of action of daptomycin is currently not completely understood. we have solved the nmr structure of this molecule, and attempted to determine the size of its oligomer by small angle neutron scattering (sans) supported by computer simulation. feglymycin is a -amino-acid peptide with a high percentage of unusual amino acids such as -hydroxyphenylglycine and , -dihydroxyphenylglycine. feglymycin inhibits mura and murc enzymes, which are involved in bacterial peptidoglycan synthesis, while also displaying anti-hiv activity by interaction with the viral envelope protein gp . a previous x-ray structure shows the molecule forming a dimer.
here, the molecule was studied by nmr in water and dmso. in water, the molecule is clearly at least a dimer, while in dmso it is a monomer. we have performed noe refinement simulations in order to elucidate a structure; however, due to a lack of long-range noe contacts, a unique structure cannot be determined. labyrinthopeptin a is a lantibiotic that contains labionin, a unique carbacyclic posttranslationally modified amino acid that links the protein backbone in three different locations. labyrinthopeptin a has shown promising activity as a pain killer. starting from the x-ray structure, we present results from the first md simulation studies of this unique peptide. because of the extensive cross-linking, this peptide is observed to be highly rigid in its native form. simulation results of mutants are also presented.

antibiotics with new mechanisms of action are urgently required to combat the growing health threat posed by resistant pathogenic microorganisms. here we report the discovery of a new peptidomimetic antibiotic (l - ), which is active with a minimum inhibitory concentration (mic) in the low nanomolar range, only against pseudomonas sp., and with a non-membrane-lytic mechanism of action. a drug target identified both in a forward genetic screen for resistance determinants and by photoaffinity labeling is the β-barrel protein lptd, which plays an important role in lps transport and outer membrane biogenesis. the x-ray structure of lptd in complex with lpte from shigella flexneri shows a stranded β-barrel linked to a periplasmic n-terminal jelly-roll domain. interestingly, the homology model structure for lptd from pseudomonas shows a significant difference: an insertion of around amino acids in the n-terminal domain. the results of our attempts to purify and characterize this large outer membrane protein and to determine the binding site of the peptidomimetic antibiotic will be shown.

how life on earth began remains unclear.
nevertheless, according to some theories, proteins did not initially emerge as the complex globular forms known today. at a time when rna molecules alone both stored genetic information and catalyzed the chemical reactions in primitive cells, peptides acted as proteins do nowadays [ , ] . the literature postulates that the possible role of primordial short peptides was to catalyze reactions in the rna world, as they possess an excellent ability to self-assemble into well-ordered nanostructures [ , ] . elementary functional loops (efls) can be considered as small structures (blocks) having specific signatures and providing functional residues important for binding/activation as well as the principal chemical transformation steps of the enzymatic reaction [ ] . the p-loop efl is a widespread structure across the vast majority of protein families such as motor domains, aaa , reca, pepck and many others. sequence alignment of these protein families reveals the existence of a conserved p-loop motif that is able to bind an atp molecule. we investigated the structure and atpase activity of peptides whose sequences contained the strongly conserved gxgk[t/s] motif from the p-loop. the goal of our work was to check whether peptides corresponding to the most conserved p-loop motif fragment are able to bind and hydrolyze atp. all peptides under study were chemically synthesized and their structures were investigated by nmr spectroscopy. the ability to bind atp molecules was analyzed by hplc. the results of our study show that peptides with the conserved p-loop motif have suitable structures to promote binding of molecules with a phosphate group, but cannot accelerate the pyrophosphate hydrolysis process. conference participation for w. _ z. supported by the fp project mobi health (grant agreement no ). computational resources were provided by the informatics center of the metropolitan academic network (ic man task) in gdansk, poland.
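the search for p-loop motifs in protein sequences mentioned above can be illustrated with a short python sketch. this is a minimal example under stated assumptions, not the procedure used in the study: it assumes the standard walker a consensus g-x(4)-g-k-[t/s] (the text writes the motif as gxgk[t/s], so the four-residue spacer is an assumption), and the function name and the toy sequence are hypothetical.

```python
import re

# Walker A (P-loop) consensus, assumed here as G-x(4)-G-K-[T/S].
# Pass a different regular expression if another spacing is intended.
WALKER_A = r"G.{4}GK[TS]"


def find_p_loop_motifs(sequence, pattern=WALKER_A):
    """Return (start, matched_text) for every motif hit, including overlapping ones."""
    seq = sequence.upper()
    # A lookahead wrapping a capture group lets finditer report overlapping hits.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(seq)]


# Hypothetical toy sequence for illustration only (not taken from the study).
example = "MAVLGAPGSGKTTLAE"
for start, hit in find_p_loop_motifs(example):
    print(f"motif {hit} at 0-based position {start}")
```

the pattern is passed as a parameter so that a stricter or looser consensus can be tried without changing the function; start positions are 0-based.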
ck is a ubiquitous serine/threonine protein kinase, being one of the most pleiotropic of all protein kinases. ck plays a key role in cell growth, differentiation, cell death and survival, and has become a therapeutic target in cancer treatment, since its level is significantly increased in cancer cells. halogenated ligands have been widely developed as potent inhibitors of protein kinases. among them , , , -tetrabromobenzotriazole (tbbt) is one of the first potent and selective inhibitors of ck a, directed towards the conserved atp binding site. to assess the contribution of electrostatic interactions to the specificity and strength of binding of multi-halogenated inhibitors by a protein kinase, we have studied the interaction between ck a and nine benzotriazole derivatives, representing all possible patterns of halogenation on the benzene ring. herein, we present results that support the existence of two alternative regions that are involved in ligand binding. aspartic acid is known for its function in coordination of a mg ion, which is required for atp binding. asp has been identified in the crystal structure of the ck :tbbt complex (pdb j , fig. ) as the charged residue closest to tbbt. there is also lys proximal to tbbt, interaction with which may favor the anionic form of ligands (pk for tbbt < ); however, it is involved in an intramolecular salt bridge, and thus its mutation may significantly change the stability of the protein.

crystal structure of tbbt complexed with ck (pdb: j ). residues with a distance to tbbt (magenta) shorter than Å are shown. the red residue is negatively charged; the blue ones are protonated.

comparison of kdiss values determined for ligands at ph and at ph shows that the strength of the complex varies significantly upon deprotonation of the triazole ring. this confirms the former hypothesis that negatively charged ligands cluster at the atp binding site region proximal to lys , which is beneficial both to the specificity and to the strength of binding.
for the tested ligands we have also observed differences in binding between the wild-type protein and its d n mutant (with less negative charge distributed over the atp binding site). all ligands displaying a higher pka for dissociation of the triazole proton bind to the mutant visibly more weakly than to the wild-type protein. altogether, this reveals the predominance of electrostatic intermolecular interactions. although negatively charged ligands most probably cluster at the atp-binding site proximal to lys , which is beneficial for the strength of binding, the less dissociated forms are favored due to unfavorable interactions of the anionic form of the ligands with asp .

there are many virulence factors produced by these strains, many of which are encoded on mobile genetic elements. psms are of specific interest because these virulence factors are encoded on the core genome of the bacteria, and therefore all strains of staphylococci produce some variation of psms with a variety of biological functions. the specific mechanism by which psms act as virulence factors has been poorly understood until recently. biological functions of psms include cell lysis, biofilm formation and the ability to kill neutrophils after phagocytosis. these toxins are of special interest to our research group due to their genetic similarities to certain bacteriocins, namely leaderless bacteriocins. both groups of peptides are ribosomally synthesized with an n-terminal formyl methionine and secreted from the bacteria by atp-binding cassette (abc) transporters without any leader sequence or signal peptide. abc transporters may also play a role in immunity towards psms and leaderless bacteriocins. these similarities led our group to investigate the solution structure of these peptides by nuclear magnetic resonance (nmr). isolating psms from the producer organism, s. aureus, typically involves lengthy extractions and low yields.
for these reasons, we opted to chemically synthesize the desired peptides using solid phase peptide synthesis (spps). utilizing a variety of spps techniques, psm a and psm a were successfully synthesized; however, due to the hydrophobic nature of psm b , an alternate genetic approach was devised to isolate psm b . formation of a fusion protein between psm b and the small ubiquitin-like modifier (sumo) protein allowed for heterologous expression. upon cleavage of the fusion protein with sumo protease, and subsequent purification and isolation of the cut peptide, psm b was obtained. as previously reported, the psms were found to be α-helical in structure-inducing solvents. a series of dimensional ( d) nmr experiments were run to determine chemical shift assignments and to obtain noe data. importing the chemical shift assignments and noe data into the structure calculation software cyana, we were able to elucidate the solution structures of psm a and psm a , and we are currently working towards the elucidation of psm b . the synthesis, isolation, characterization and solution structures of the aforementioned psms will be discussed here.

transition metals are critical for enzyme function and protein folding, but their excess can mediate neurotoxic oxidative processes [ ] . as energy production involves oxidative phosphorylation, a process requiring a continuous flow of electrons, mitochondria are particularly vulnerable to oxidative damage [ ] . as such, mitochondria are the major sites of reactive oxygen species (ros) generation, which are produced as byproducts of the electron transport chain. since free iron and certain ros can engage in potentially deleterious processes such as the fenton reaction, mitochondrial iron homeostasis must be tightly controlled, and dysregulation of iron metabolism in this organelle has been associated with various diseases, including friedreich's ataxia (fa), alzheimer's, and other neurodegenerative disorders [ ] .
engineering an efficient mitochondria-targeting, cell-permeable vector is a challenge because the mitochondrion is impermeable to a wide range of molecules. the development of delivery vectors has been made possible by a greater understanding of mitochondrial structure and of the chemical features of molecules that selectively localize within this organelle. from these findings, two generalized requirements for mitochondrial localization are delocalized positive charge and lipophilicity [ , ] . targeting iron in this organelle is proposed as a means to ameliorate fa symptoms. desferrioxamine (dfo) is a bacterial siderophore with high affinity for iron, but low cell penetration. we prepared conjugates of dfo with mitochondria-penetrating peptides and studied their iron-binding characteristics in vitro. the lipophilic and charged peptides tat - (h-arg-lys-lys-arg-arg-gln-arg-arg-arg-oh) [ ] , a (h-cha-arg-cha-lys-cha-arg-cha-lys-nh ) [ ] , ss- (h-dmt-arg-phe-lys-nh ) [ ] and ss- (h-phe-arg-phe-lys-nh ) [ ] are known to permeate cytosolic and mitochondrial membranes. they were prepared and conjugated to dfo on solid phase [ ] , an alternative synthetic route. once detached from the resin, fully deprotected, purified and characterized by means of lc/ms and amino acid analysis, the dfo-conjugated peptides displayed iron-binding abilities identical to those of the free chelator dfo. the dfo-conjugated peptides were also able to quench the iron-catalysed oxidation of ascorbate (a model of oxidative stress in the plasma of iron-overloaded patients), as probed by a high-throughput fluorimetric method [ , ] . these results indicate that our synthesis and conjugation strategy was successful in preserving the iron-binding moiety and the antioxidant ability of the free chelator dfo.
the proteolytic activity and oligomerization status of the human htra protease functioning as a tumor suppressor

htra consists of an n-terminal domain not required for proteolytic activity, a central serine protease domain and a c-terminal pdz domain. the latter serves as a substrate or regulator binding domain and may participate in oligomerization. htra s, its short natural isoform, lacks the pdz domain, which is substituted by a stretch of c-terminal amino acid residues unique to this isoform. down-regulation of htra in tumors, shown by other groups and us, suggests htra s involvement in oncogenesis [ ] . htra acts as a proapoptotic protein and is suggested to function as a tumor suppressor. it promotes the cytotoxicity of etoposide and cisplatin in lung cancer cell lines [ , ] . to date, htra has been poorly characterized from the biochemical point of view, mainly because recombinant htra is difficult to purify. we were able to express htra in a bacterial system and purify it in quantities sufficient for structural studies. the aim of this study was to characterize and compare the proteolytic properties and quaternary structure of the htra isoforms. both studied isoforms lacked the n-terminal domain. htra with the pdz domain removed (htra -Δpdz) and htra s (htra s) were fully active over a wide range of temperatures and their substrate affinity was not impaired. this indicates that the pdz domain is dispensable for htra activity. as determined by size exclusion chromatography, htra formed stable trimers while both htra -Δpdz and htra s were monomeric. this suggests that the presence of the pdz domain, unlike in other human htras (htra and htra ), influences htra trimer formation. the unique c-terminal sequence of Δn-htra s appeared to have little effect on activity and oligomerization [ ] .

cyclodextrins (cds) are cyclic oligosaccharides that have been recognized as useful pharmaceutical excipients.
in aqueous solution cds can form complexes with various ligands, hosting inside their cavity either a whole molecule or part of a ligand. inclusion complexes with cds offer a variety of physicochemical advantages over the free biologically active ligands, including improved aqueous solubility, solution stability and bioavailability. ck is a ubiquitous, highly pleiotropic and constitutively active ser/thr protein kinase. halogenated benzotriazoles have been developed as potent and selective inhibitors of this enzyme. the interaction of the catalytic domain of human protein kinase ck with a series of brominated ligands, which represent all possible patterns of halogen substitution on the benzene ring of benzotriazole, was previously studied by microscale thermophoresis (mst) [ ] . this method allowed determination of binding affinities for seven ligands, all of which were found to be consistent with the values determined independently by isothermal titration calorimetry (itc). however, the very limited aqueous solubility of some brominated benzotriazoles may decrease their bioavailability, thus affecting their apparent activity [ ] . to overcome this limitation, the aqueous solubility of halogenated benzotriazoles in the presence of cyclodextrins has been tested. the formation of inclusion complexes with β-cyclodextrin (β-cd), hydroxypropyl-β-cyclodextrin (hp-β-cd) and γ-cyclodextrin (γ-cd) in aqueous solutions, followed by uv-vis spectroscopy, substantially improved the solubility of tbbt and its derivatives. the interaction of protein kinase ck with cyclodextrins, and with their inclusion complexes with halogenated benzotriazoles, was followed by microscale thermophoresis. the results obtained clearly demonstrate that the binding of halogenated benzotriazoles by ck is only moderately affected by cyclodextrins.
oligonucleotide-based molecular circuits offer the exciting possibility to introduce autonomous signal processing in biomedicine, synthetic biology, and molecular diagnostics. here we introduce bivalent peptide-dna conjugates as generic, noncovalent, and easily applicable molecular locks that allow the control of antibody activity using toehold-mediated strand displacement reactions. employing yeast as a cellular model system, reversible control of antibody targeting is demonstrated with low nm concentrations of peptide-dna locks and oligonucleotide displacer strands. introduction of two different toehold strands on the peptide-dna lock allowed signal integration of two different inputs, yielding logic or- and and-gates. the range of molecular inputs could be further extended to protein-based triggers by using protein-binding aptamers.

insights of a novel kind of cell wall binding domain that cleaves the peptidoglycan muropeptide: the cw_ motif
noemí bustamante, manuel iglesias, noella silva-martín, isabel uson, pedro garcía, juan hermoso, marta bruix, margarita menéndez
institute of physical-chemistry 'rocasolano', csic, institute of physical-chemistry 'rocasolano', csic, ciber of respiratory diseases (ciberes), center of biological research (cib), csic

enzybiotics constitute a hopeful alternative to current treatments to fight bacterial infections. phage endolysins are considered enzybiotics due to their capacity to cleave the peptidoglycan (pg) of gram-positive bacteria in a generally species-specific manner and to kill bacteria when exogenously added ( , ) . the cpl- endolysin, a lysozyme encoded by the pneumococcal cp- bacteriophage, is a remarkable exception among all the pg hydrolases produced by streptococcus pneumoniae and its bacteriophages due to its capacity to degrade pneumococcal cell walls containing either choline or ethanolamine ( , ) .
this confers on cpl- the advantage of a broader microbicidal spectrum compared with choline-binding proteins ( ) . this behavior results from the acquisition of a cell wall binding module (cwbm) made of three identical repeats of amino acids each (cw_ motifs), with unknown specificity and totally unrelated to the choline-binding motifs present in pneumococcal hydrolases. interestingly, cw_ repeats have been identified in many putative proteins potentially involved in cell wall metabolism (pfam entry: pf ) from different species of gram-positive and gram-negative bacteria, and in some bacteriophages ( ) . preliminary studies of thermal stability in the presence of a small cell wall structural analogue (glcnac-murnac-l-ala-d-isogln) point to the muropeptide as the cell wall target recognized by cw_ motifs ( ) . in this communication we have characterized the cw_ repeats in depth. we present the first crystal structure of the cw_ motif, which reveals a three-helical bundle fold. using std-nmr spectroscopy, the binding epitope of the disaccharide dipeptide for these repeats has been identified. interestingly, the β anomer of the murnac moiety, the form present in the peptidoglycan, seems to be preferentially recognized with respect to the α anomer. finally, a docking model of the cw_ /gmdp complex compatible with the std results was built, allowing us to identify the major contacts between the protein and the muropeptide and to propose a relevant role for a conserved arginine residue in this interaction.

energy-dependent aaa proteases carry out regulated proteolysis to ensure protein quality control and post-translational regulation of many cellular processes. control of proteolysis occurs primarily at the level of substrate recognition, which can be modulated by adaptor proteins. the clps adaptor protein enhances and inhibits degradation of different classes of substrates, and thus triggers a specificity switch in clpa.
whereas the mechanism for substrate delivery by clps has been described in detail, the inhibition mechanism is poorly understood. we show that clps inhibits ssra substrate recognition and processing, instead of simply preventing substrate binding. we demonstrate that clpa engagement of the clps n-terminal extension (nte) is necessary, and may even be sufficient, for inhibition. in addition, we find that inhibition of substrate processing requires a longer nte, as compared to inhibition of substrate recognition. interestingly, the nte length required for inhibiting substrate processing is also necessary for suppression of the clpa atpase rate. furthermore, preliminary data suggest that clps slows down substrate translocation. these results support a model where there is an ssra•clpa•clps inhibitory complex in which the clpa pore engages the clps nte. this engagement of the nte causes suppression of atpase activity, and therefore slower substrate translocation and processing. this model illustrates how an adaptor protein can inhibit recognition of one type of substrate while efficiently promoting degradation of a different substrate.

single-molecule assay development for studying human rna polymerase ii promoter-proximal pausing

rna polymerase ii (polii) pausing has been shown to play a significant role in transcription regulation of elongating polii complexes in a large number of metazoan and mammalian genes ( ) . the traditional understanding of transcription regulation in mammals involved controlling polii recruitment to promoters and controlling initial steps at the promoter, including pre-initiation complex formation and promoter escape. most studies investigating promoter-proximal polii pausing have employed chromatin immunoprecipitation followed by sequencing to determine polii localization, or in vitro transcriptional assays using nuclear extracts analyzed with radioactive gel electrophoresis.
in order to gain greater mechanistic insight into the regulation of promoter-proximal polii pausing, we have been developing a diffusion-based single-molecule method using alternating laser excitation on the microsecond timescale (µsalex). the method detects rna transcripts generated by a reconstituted human polii system in vitro using complementary doubly dye-labeled single-stranded dna (ssdna) probes. the human gene hspa b for heat shock protein (hsp ) is used as a model system due to its extensive characterization in drosophila. the method would provide a rapid, sensitive and robust avenue to screen for protein factors regulating promoter-proximal polii pausing. controlling the pic composition using the reconstituted system allows for dissection of the functional roles of different pic components in facilitating regulation of polii pausing. we have demonstrated the hybridization of the doubly dye-labeled ssdna probe to complementary ssdna mimicking rna transcripts and to transcripts generated with bacterial rna polymerase. also, a functional reconstituted human polii system has been verified using radioactive polyacrylamide gel electrophoresis of transcripts from in vitro transcription assays.

malaria is a major global health problem. in , there were an estimated million cases of malaria and deaths, most of them children under years old [ ] . among the malaria species that affect humans, plasmodium falciparum is the most deadly. since no efficient vaccine is available yet, the fight against malaria includes vector control, protection from mosquito bites and artemisinin combination therapy. however, resistance to all known treatments has been observed. therefore, new antimalarial strategies involving novel targets and new mechanisms of action are needed. during the erythrocytic stage of its life cycle, which causes all the malaria symptoms, plasmodium falciparum relies on phospholipids to build the membranes necessary for daughter cell development.
approximately % of parasite phospholipids consist of phosphatidylcholine (pc) and phosphatidylethanolamine (pe) synthesized by the parasite through the de novo kennedy pathways. in the pathway of phosphatidylcholine biosynthesis, the second step, catalyzed by ctp:phosphocholine cytidylyltransferase [ec . . . ], is rate limiting and appears essential for parasite survival at its blood stage [ ] [ ] . we are focused on the structural characterization of this enzyme, the identification of effectors by a fragment-based drug design (fbdd) approach and then their optimization to eventually design a lead. the first reported crystal structure of the catalytic domain of the enzyme target (pfcct) has been solved at . Å resolution, together with enzyme-substrate complexes (cmp-, phosphocholine- and choline-bound forms) at resolutions of . - Å and an enzyme-product (cdp-choline) complex structure at . Å resolution. these structures give detailed images of the binding pocket, demonstrate conformational changes between the apo- and holo-protein forms and provide information about the mechanism of the catalytic reaction at the atomic level. the fbdd method uses a library of small molecules (fragments) with a molecular weight that does not exceed da to explore target binding sites. although fragments often have affinities too low to evoke a biological response, their probability of binding is high because they are small enough to avoid unfavorable interactions with target protein-binding sites. moreover, they represent more attractive and synthetically tractable starting points for medicinal chemistry compared to more complex compounds. as the affinity is low, fragment screening usually depends on detecting binding rather than inhibition. screening of a fragment library ( molecules) has been performed by fluorescence-based thermal shift assay and nuclear magnetic resonance saturation transfer difference (nmr std) [ ] .
this combination of techniques has so far identified fragment hits that are currently being evaluated for their binding modes and affinities. co-crystallization of the protein-fragment complexes is being carried out to provide accurate information on the molecular interactions. the topology of the interactions will be used to rationally guide every iterative round of the optimization process, allowing subsequent rational design.

[ ] world health organization, world malaria report (who press, geneva, switzerland), http://www.who.int/malaria/publications/world_malaria_report_ /wmr- -no-profiles.pdf?ua

protein scaffolds play a crucial role in signaling pathways by generating signal specificity and increasing signal efficiency and amplitude. engineered protein scaffolds can be used as key regulators for signal transduction in artificial signal transduction cascades, where they can regulate the input and output of the network. in this research a - - protein scaffold is developed which induces dimerization of proteins mediated by the small molecule stabilizer fusicoccin. as proof of principle, caspase is used to demonstrate proximity-induced dimerization. dimerization of caspase leads to its activation and subsequently initiates the caspase cascade involved in the programmed cell death pathway. caspase does not naturally bind to - - proteins; therefore the caspase monomer is conjugated to a - - binding motif which is known to bind into the binding grooves of a - - dimer. this interaction can be stabilized by the small molecule fusicoccin. we showed that upon addition of fusicoccin, caspase dimerization is induced, resulting in caspase activity, which is measured using a synthetic caspase substrate. moreover, the biphasic effect of the - - scaffold could be demonstrated. additionally, the activated caspase is also able to cleave its natural substrate caspase , downstream in the caspase cascade.
these results indicate that the - - platform is a versatile small-molecule-induced dimerization platform which can be used as a tool for engineering synthetic signaling networks.

the g e variant of the apoptosis inducing factor, responsible for a rare encephalopathy, is hampered in nad /h binding
luca sorrentino, laura rigamonti, mirvan krasniqi, alessandra calogero, vittorio pandini, maria antonietta vanoni, alessandro aliverti

the apoptosis inducing factor (aif) is a highly conserved mitochondrial flavoprotein known to play two opposite roles in eukaryotic cells: in mitochondria it is required for efficient oxidative phosphorylation (oxphos), while, when released into the cytoplasm, it triggers caspase-independent apoptosis ( ) . the mechanism of aif-induced apoptosis has been extensively investigated, whereas its mitochondrial role is poorly understood. there is much evidence of the importance of aif for correct mitochondrial morphology and function, and its direct interaction with chchd , a key regulator of the import and folding of respiratory complex subunits in mitochondria, was recently reported ( ) . a unique feature of aif, probably pivotal for its vital function, is the ability to form a tight, air-stable charge-transfer (ct) complex with nad and undergo dimerization. although some aspects of the interaction of aif with nad /h have been analyzed, its precise mechanism is not fully understood. we investigated the effect of the pathogenic g e replacement, associated with oxphos defect and neurodegeneration ( ) , to understand how it could alter aif properties at the molecular level. to do so, we analysed how the wild type and the g e forms of murine aif interact with nad /h and nicotinamide mononucleotide (nmn /h), finding that the pathogenic replacement resulted in a dramatic and specific decrease of the rate of ct complex formation and consequent protein dimerization only in the case of the physiological ligand.
our results demonstrate that the adenylate moiety of nad /h is crucial for the ligand binding process and that the g e replacement causes an alteration of the adenylate-binding site of aif that drastically decreases the affinity for and the association rate of the ligand. in addition, we shed new light on the mechanism of the dimerization process, demonstrating that fad reduction rather than nad /h binding initiates the conformational rearrangement of aif that leads to quaternary structure transitions. taken together, our results help to define how aif works at the molecular level in binding nad /h and undergoing dimerization, and also point out that the g e replacement, responsible for a rare neurodegenerative disease, has the selective effect of slowing down the formation of the aif dimeric ct complex.

dipartimento di bioscienze, università degli studi di milano, dipartimento di scienze veterinarie e sanità pubblica, università degli studi di

mical, from the molecule interacting with casl, indicates a family of conserved cytoplasmic multidomain proteins that catalyze a nadph-dependent f-actin depolymerization activity through their essential n-terminal fad-containing monooxygenase-like domain (mo) in response to semaphorin signaling [ ] . this domain is followed by calponin homology (ch) and lim domains, proline- and glutamate-rich regions and a c-terminal coiled-coil motif that mediate the interaction with various proteins (e.g., crmp, casl, plexin, g proteins, ndr) [ ] . to help establish the catalytic properties of mical mo and their modulation by the additional domains and by the interacting proteins, we have produced and are characterizing the human mical (mical-fl) and forms containing the mo [ ] , mo-ch and mo-ch-lim domains. all mical forms contain stoichiometric amounts of fad in the mo domain and zn ions in the lim domain. mical-mo catalyzes a nadph oxidase (h o -producing) activity.
the ch, lim and c-terminal domains lower its catalytic efficiency (kcat/km, nadph), mainly due to an increase of the km for nadph. the kcat is similar for all forms except for mical-fl, where a -fold drop is observed, in agreement with the proposed autoinhibitory function of the c-terminal domain [ ] . the ph dependence of the kinetic parameters of mo, moch and mochlim is complex, suggesting that it does not reflect the ionization state of individual groups, but rather the overall protein charge. mical-mo, -moch and -mochlim catalyze a nadph-dependent f-actin depolymerization with a similar apparent km for actin. f-actin (but not g-actin) stimulates the rate of nadph oxidation by increasing kcat and lowering knadph. the extent of nadph oxidation exceeds total f-actin, which is in contrast with the proposal of specific modification of actin met or met reported for drosophila and mouse moch [ ] [ ] , and instead suggests either that f-actin stimulates the nadph oxidase activity or a case of substrate recycling. accordingly, with hmical mo and moch several actin residues are oxidized besides met and met . thus, the ch and lim domains do not seem to be important for the mical-actin interaction, and actin modification may be mediated by in situ h o production. in hek t and cos- cells, mouse collapsin response mediator protein- (mcrmp ) interacts with mical, inhibiting h o production [ ] , suggesting that crmp could be a hydroxylatable substrate of mical-mo. we have produced the same mcrmp form ( - aa) and we have shown that, under conditions that limit nonspecific interactions, a mild stimulation (up to %) of nadph oxidation is observed. f-actin reversed the effect of mcrmp , suggesting their competition for mical. these results suggest that crmp , a major microtubule regulator, is not the substrate of the mo domain, but that the actin and microtubule cytoskeleton components may be linked through the formation of a crmp-mical complex in response to semaphorin-plexin signaling.
experiments are in progress to complete the characterization of mochlim and full length mical forms. green fluorescent protein (gfp), owing to its genetically encoded strong fluorescence, has become one of the most important tools in modern biology [ ] . enhanced gfp (egfp, f l/s t-gfp), a frequently used variant of this protein, is thermodynamically more stable and -times brighter than gfp [ ] . due to its improved fluorescent properties, egfp is commonly used as a fluorescent intracellular marker in bio-imaging in vitro and in vivo. despite sustained interest of the scientific community and numerous practical applications, the actual biological role of gfp remains elusive. recent reports put forward a hypothesis of antioxidant and photo-protective functions of gfp [ ] . in this study, we focused on the photo-protective role of egfp against reactive oxygen species (ros) photo-generated by visible light in aqueous suspensions of nano-particulate nitrogen-doped titanium oxide (n-doped nano-tio ), that is, in the system 'n-doped nano-tio /visible light'. n-doped nano-tio (sumitomo tp-s ) was chosen as a photo-catalyst, since it is widely accepted that nitrogen doping enhances the visible light photoactivity of tio . -hydroxy- , , , -tetramethylpiperidine-n-oxyl (tempol), a paramagnetic water-soluble compound belonging to the nitroxide class of superoxide dismutase (sod) mimetics, was used as a target for photo-generated ros. a solar simulator, with a flux output intensity of kw/m , was used as a visible light source. electron spin resonance (esr) was employed to monitor the changes in the paramagnetic signal of tempol exposed to the action of ros in the absence and presence of egfp. in the absence of egfp and after min of illumination, due to the combined action of superoxide (o •-) and hydroxyl (oh•) radicals generated by the system 'n-doped nano-tio /visible light', the esr signal of um tempol decayed by %. 
moreover, the growth of a new signal, interpreted as -oxo- , , , -tetramethyl- -piperidinyloxy (tempone), resulting from the attack of oh• radicals on tempol, was also observed. in contrast, in the presence of egfp ( . um), the ros-induced decay of the esr signal of tempol was markedly smaller, not exceeding %. concomitantly, the growth of the esr signal of tempone was also partially inhibited ( % smaller amplitude), as compared to the process performed in the absence of egfp. in summary, our results point to a significant inhibition of the photodecomposition of tempol in the presence of egfp and support the hypothesis of a protective role of this fluorescent protein against ros generated by the system 'n-doped nano-tio /visible light'. school of chemistry, national university of ireland galway, school of biochemistry and immunology, trinity college dublin by studying a variety of anionic ligands and their interactions with cationic cytochrome c, we are building knowledge of protein recognition geared towards regulating activity. in previous work it was shown that p-sulfonatocalix[ ]arene selectively binds to, and encapsulates, three lysine side chains on cytochrome c. here, the binding of two small molecule ligands to cytochrome c was investigated. nmr spectroscopy was used and, in one case, a crystal structure of the complex was obtained (fig. ). the structure of the calixarene bound to cytochrome c reveals a crystal packing assembly suggesting that the ligand is a key mediator of crystal formation. nmr data analysis indicates the calixarene's binding site on cytochrome c. the pillararene, a relatively new class of compound, is a symmetrical arrangement with a π-rich cavity, related structurally to calixarenes. this suggests good host-guest complexation properties. previously, the carboxylatopillararene showed selective binding to arginine, lysine and histidine. with this ligand, an interaction with cytochrome c was observed and a complex formed. 
additionally, biphasic binding behaviour was observed through analysis of the chemical shift perturbations. this may indicate more than one binding event taking place. the data from these studies indicate that recognition is occurring and again that lysine side chains play an essential role. the enzyme dihydrofolate reductase (dhfr) is necessary for the growth and development of all organisms. the structure and function of escherichia coli dhfr have been characterized in buffer. however, dhfr exists in living cells, where the protein concentration can exceed g/l. we know that weak, non-specific chemical interactions with cytosolic proteins alter protein conformation and dynamics, , both of which are expected to influence dhfr catalysis. investigators have examined steady-state enzyme kinetics under crowded conditions, but conclusions can be conflicting. , here, the effects of crowding on e. coli dhfr catalysis are assessed through specific activity measurements in solutions of synthetic polymers. these kinetics studies are complemented by in-cell and in vitro f nmr data from fluorinated tryptophan residues. preliminary results suggest that the effects of polymeric crowders on dhfr activity are non-monotonic, which may arise from the polymer's transition from the dilute to semi-dilute regime. the data suggest that synthetic polymers are not a valid representation of the cellular interior. biotechnology department, university of verona calcium (ca ) is one of the most important second messengers in eukaryotes. ca binding proteins can be subdivided into two categories: "ca buffers" that modulate ca ion concentrations in cells, and "ca sensors" that decode ca signals in a wide array of physiological processes in response to external stimuli. calmodulin (cam) is the prototypical example of ca sensor proteins in both animals and plants. in addition to conserved cam, plants possess a unique family of cam-like proteins (cmls). 
many of these cmls still remain uncharacterized and the investigation of their biochemical and biophysical properties will provide insight into ca signalling in plants. herein, a detailed characterization of arabidopsis thaliana cml is reported. cml is a protein of amino acids with a theoretical molecular weight of , da and % amino acid sequence identity with atcam . cml is predicted to have one functional ca binding site despite the presence of three ef-hand motifs (prosite). we overexpressed cml in e. coli and analyzed its biochemical and biophysical characteristics, i.e. calcium affinity and stoichiometry and eventual changes in conformation, thermal stability and proteolytic susceptibility upon ca binding. isothermal titration calorimetry (itc) and nuclear magnetic resonance (nmr) spectroscopy identified one ca binding site in cml and showed that ca and mg compete for the same binding site. the kd values determined by itc established that cml has higher affinity for ca than for mg . our data were consistent with the sequence based prediction of one functional calcium binding site. differential scanning calorimetry (dsc) showed that ca and mg have the same stabilizing effects on protein folding. apo-cml undergoes two thermal unfolding transitions, but in the presence of ca or mg only one unfolding event at an intermediate temperature occurs. limited proteolysis experiments showed that ca binding affords protection against cml digestion by trypsin. surprisingly, cml exhibits very few conformational changes upon calcium binding, which were evaluated by ans fluorescence and stokes radius measurements in the apo-and ca bound-forms. these results suggest that cml does not show the characteristics of a classical ca sensor protein. to better understand the physiological role of cml in plants, in vivo analysis will be performed. 
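the competition between ca and mg for a single binding site reported above can be sketched with the standard equilibrium expression for two ligands sharing one site. this is a minimal illustration, assuming free ligand concentrations and hypothetical dissociation constants; the function name and all numbers are placeholders, not values from the itc experiments.

```python
def site_occupancy(ca_free, mg_free, kd_ca, kd_mg):
    """Equilibrium fractions of a single site that are Ca-bound, Mg-bound
    and empty when both ions compete for the same site.

    theta_Ca = (Ca/Kd_Ca) / (1 + Ca/Kd_Ca + Mg/Kd_Mg), and likewise for Mg.
    Concentrations and Kd values must share the same units.
    """
    a = ca_free / kd_ca
    b = mg_free / kd_mg
    denom = 1.0 + a + b
    return a / denom, b / denom, 1.0 / denom


# Hypothetical example: Kd for Ca ten-fold lower (tighter) than Kd for Mg,
# with Mg present at a ten-fold higher free concentration than Ca.
theta_ca, theta_mg, theta_empty = site_occupancy(
    ca_free=1.0, mg_free=10.0, kd_ca=1.0, kd_mg=10.0
)
```

with these placeholder numbers each ion is present at a concentration equal to its own kd, so the site is shared equally among the ca-bound, mg-bound and empty states; a higher affinity for ca shows up as a lower concentration of ca needed to reach the same occupancy.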
pb- fbp controls the hepatocyte morphology through rho signaling jun zhang , mingming ling , qianying zhang , yunhong wang , deqiang wang the department of cell biology and genetics, the formin-binding protein (fbp ), widely expressed in eukaryotic cells, was previously identified to play a role in morphological maintenance in hepatocytes, but the molecular mechanism remains unclear. in the present investigation, it was found that signaling by the rho family proteins cdc /rac was involved in the morphological regulation controlled by fbp . knockdown of endogenous fbp expression by rnai or expression of a dominant negative mutant of fbp could trigger cell morphological remodeling from epithelioid to fibroid, following significant down-regulation of cdc /rac activities and dephosphorylation of paxillin. a rho protein-specific activator could restore the cdc /rac activities and in turn abrogated the silencing effect. overexpression of wild type fbp did not result in any morphological transition. furthermore, withdrawal of the silencing could induce morphological recovery once fbp expression, cdc /rac activities and paxillin phosphorylation were restored to normal levels. the experimental evidence strongly indicates that fbp is implicated in morphological control in hepatocytes, probably via the rho signaling pathway. key words: fbp ; rho signaling; paxillin; morphological control; hepatocyte this work was supported by a grant from national natural science foundation of china (nsfc, no. cytochrome c oxidase (cco) is the final enzyme in the respiratory chain of mitochondria but also an integral part of the metabolism of many types of bacteria. in a complex, stepwise redox reaction, cco catalyzes the reduction of molecular oxygen to water and utilizes the resulting free energy to pump protons across the membrane, thereby creating an electrochemical gradient [ , ] . 
to investigate proton pumping spectroscopically it is possible to label the entrance of the proton channel with fluorescein, a ph-sensitive dye, which allows time-resolved local changes in proton concentration at the cytoplasmic cco surface and related properties to be determined. it has already been shown that the redox state of copper and heme centers affects such properties at the cytoplasmic surface [ ] . this study is a theoretical approach to investigate changes of pka values of the fluorescein label at the entrance of the k-channel for different protonation patterns in both oxidized and reduced cco by performing molecular dynamics (md) simulations. further work is based on calculations of pka values of the fluorescein using the software karlsberg [ , ] . methods for genetically and synthetically manipulating protein structure enable greater flexibility in the study of protein function. we have shown that using inteins as traceless, cleavable purification tags enables the separation of full length unnatural amino acid (uaa) containing proteins from their corresponding truncation products. this method has been used to incorporate uaas in previously unattainable positions in a variety of proteins using a myriad of uaas, inteins, and purification tags. in other applications, we have used e. coli aminoacyl transferase (aat) to selectively modify the n-termini of proteins with uaas in denaturing conditions and conditions that maintain folding. applications of particular interest include overcoming the need for an n-terminal cys residue in expressed protein ligation, transfer of reactive handles for "click" chemistry labeling of proteins, and transfer of fluorogenic molecules for photophysical experiments. we have found that aat can transfer protected cysteine, homocysteine, and selenocysteine to expressed proteins. after ligation, these residues can be converted to met or ala, making the ligation traceless. 
we continue to develop variants of aat to broaden the substrate scope of both its transferred substrate and n-terminal recognition element. in addition, expressed protein ligation is being used to incorporate backbone modifications, such as the thioamide, into various positions in the protein calmodulin to determine how these modifications can impact the structure and function of an ordered protein. in general, by working at the interface of several protein modification technologies, we have made beneficial discoveries that might be missed by more focused approaches. function and modularity of cw_ motifs in the c-terminal region of the endolysin cpl- encoded by the cp pneumococcal bacteriophage manuel iglesias-bexiga , , noelia bernardo-garcía , rubén martínez-buey , noemí bustamante , , guadalupe garcía , , marta bruix , juan hermoso , margarita menéndez , dept. of biological physical-chemistry, iqfr-csic, ciber of respiratory diseases (ciberes), department of crystallography and structural biology, iqfr-csic, bacteriophage lytic murein-hydrolases have been proposed as enzybiotics, an efficient way to fight bacterial infections. however, the use of these enzymes is normally restricted to gram-positive bacteria, since the outer membrane of gram-negative bacteria hampers the access of the hydrolases to the peptidoglycan substrates. all the murein hydrolases reported in the pneumococcal system, of both host and phage origin, depend on the aminoalcohol choline to be fully active. there is only one exception to this rule, the cpl- lysozyme. this hydrolase is encoded by the lytic pneumococcal phage cp- and, instead of the common cell wall binding module (cwbm) that recognizes choline, cpl- harbors a completely different cell wall binding structure. recent studies have revealed that reducing the net charge of the cwbm, from . to . , leads to an improvement in the antibacterial activity of cpl- ( ) . 
the cwbm of cpl- is composed of three identical repeats of amino acids, the cw_ motifs, and it folds both in the presence and in the absence of the n-terminal catalytic module ( ) . this module is able to recognize the glcnac-murnac-l-ala-d-isogln muropeptide (gmdp), structurally related to the peptidoglycan basic unit ( ) . here, we report the high resolution structure of the cell wall binding module of the cpl- endolysin. each cw_ repeat is composed of a bundle of three α-helices with a highly negative electrostatic charge at the surface. the strong inter-repeat interactions and the high ionic strength used in the crystallization conditions allow the repeats to overcome the electrostatic repulsions, inducing a close-packed structure with three-fold symmetry. the module dimensions ( x x Å) and the repeat arrangement in the crystal structure are inconsistent with the gmdp binding characterization, the activity displayed by cpl- truncated variants with one or two cw_ repeats, and the experimentally determined hydrodynamic properties. using the small angle x-ray scattering (saxs) technique and the atsas computational platform ( ), a different arrangement of the cw_ repeats is envisaged in solution (fig. ), whose rather open structure ( x x Å) is consistent with the experimental data. additionally, employing the saxs-based structure and the honeycomb structure proposed for the peptidoglycan, a model in which each cw_ repeat of the cell wall binding module fits into adjacent glycan chains has been derived. in , the protein structure initiative (psi) was started to determine three-dimensional structures of proteins within every family. once solved, structures are deposited into the protein data bank (pdb) and termed structural genomics (sg) proteins. as of june , there are over , sg proteins deposited in the pdb and most of them are of unknown or uncertain biochemical function. 
in addition, many of these sg proteins have a putative functional assignment based on their sequence and structural similarities with proteins of known function; such comparisons can be made against large databases using programs such as blast or dali. however, these putative functional assignments are often incorrect. this project analyzes members of the crotonase superfamily (cs). the cs consists of five diverse functional subgroups that are well characterized structurally and functionally, representing different types of reactivity, including hydrolase, isomerase, hydratase, and dehalogenase activities. this superfamily also contains at least sg proteins, so it is ideal to test predictions of protein function. our approach is based on local structure matching at the computationally predicted active site. first, partial order optimum likelihood (pool) is used to predict the functionally important residues of each sg protein and of the proteins of known function in the superfamily. next, structurally aligned local sites of activity (salsa) is used to align the predicted catalytic residues of the well-characterized members in the superfamily. from this analysis we generate chemical signatures for each functional subgroup and compare them to the sets of catalytic residues predicted for the sg proteins. we demonstrate based on these computational methods that the majority of the putative annotations in the cs superfamily are likely incorrect. currently, biochemical assays are being used to test these predictions. preliminary biochemical results show that one sg protein, thermus thermophilus q sls _thet , classified as a probable enoyl-coa hydratase, possesses hydrolase activity as predicted by our methods. 
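the comparison of predicted catalytic residues against subgroup signatures described above can be pictured with a toy sketch. this is not the pool or salsa software and does not reproduce their scoring; the function names, the subgroup signatures and the residue letters are all invented for illustration, and the only assumption carried over from the text is that residues are compared at structurally aligned positions.

```python
from typing import Dict, List, Tuple


def signature_score(signature: List[str], predicted: List[str]) -> float:
    """Fraction of informative aligned positions where the residue types agree.

    Both lists are indexed by aligned position; '-' in the signature marks a
    position that carries no information for that subgroup.
    """
    if len(signature) != len(predicted):
        raise ValueError("signature and prediction must cover the same aligned positions")
    informative = [(s, p) for s, p in zip(signature, predicted) if s != "-"]
    if not informative:
        return 0.0
    return sum(1 for s, p in informative if s == p) / len(informative)


def best_subgroup(signatures: Dict[str, List[str]],
                  predicted: List[str]) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching subgroup name and all subgroup scores."""
    scores = {name: signature_score(sig, predicted) for name, sig in signatures.items()}
    return max(scores, key=scores.get), scores


# Invented signatures over four aligned positions (hypothetical residues).
toy_signatures = {
    "hydratase": ["E", "E", "G", "A"],
    "hydrolase": ["E", "D", "G", "H"],
}
# Invented set of predicted residues for a query protein at the same positions.
toy_query = ["E", "D", "G", "H"]

assigned, all_scores = best_subgroup(toy_signatures, toy_query)
```

in this toy case the query matches every position of the invented hydrolase signature and only half of the invented hydratase signature, so an annotation as hydratase would be flagged as doubtful, which mirrors the kind of reassignment reported for the thermus thermophilus protein.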
the outcomes of this project will be to successfully classify the biochemical functions of sg proteins based on their local structure at the predicted active sites and to provide a conceptual framework for the functional classification of the remaining sg proteins within the pdb. this work is supported by nsf-che- . directly observing the synergistic dynamics in f-actin and microtubule assembly jun zhang , deqiang wang the department of cell biology and genetics, key laboratory of molecular biology on infectious disease although important in cellular activities, little attention was paid to the synergistic effects of actin and microtubule cytoskeleton assembly. with the time-lapse atomic force microscope (tl-afm), we directly observed the large-scale dynamic structure of actin filaments formed in the presence or absence of microtubulin in solution. in absence of microtubulin, the g-actin could be polymerized into ordered filamentous structures with different diameter from the slimmest filament of single f-actin to giant filament in tree-like branched aggregates. the polymerized actin filaments, to which our most intense attention was attracted, was discretely arranged and showed obvious polymorphism in structures completely distinct from those in the presence of microtubulin. the supra-molecular complex structures of the latter were mainly composed of single f-actin and/or multifilaments clearly consisting of several single f-actin and regularly cross-linked with the assembled microtubular bundles. the experimental results demonstrated that the f-actin dynamics could be coordinated by microtubule assembly. further analyses implied that the interactions between f-actin and microtubule could prevent the emergence of structural polymorphism of f-actin alone, and give rise to organization of specific complex structures instead. it was suggested that dynamic synergy between the f-actin and microtubule would be implicated in living cells. 
the adaptor protein - - is found in a diverse range of pathologically relevant protein-protein interactions (ppis). as - - is a hub protein with very diverse interactions, it is able to influence the intracellular localization of its binding partners and is a key regulator of signal transduction processes as well as of cell cycle functions. nevertheless, there are only a few examples of - - acting extracellularly. one of the extracellular targets for - - is aminopeptidase n (apn). apn is an extracellular trans-membrane enzyme that acts as a receptor for - - . by binding to apn, - - excreted by keratinocytes can upregulate the excretion of matrix metalloproteinase- (mmp ) in fibroblasts. mmp , by breaking down collagens, is key in the remodeling of the extracellular matrix. modulation of the - - /apn interaction may thereby play a crucial role in the fundamental understanding and ultimately treatment of wound healing, respiratory diseases and tumor growth. in the eukaryotic cell, the - - dimer operates as an adapter platform for binding partners. a wide range of classes of (small) molecules, natural products and peptides has been used to modulate the ppis, providing either stabilization or inhibition of the interactions of - - with its binding partner. binding partner fragments or peptides are known to bind to the - - binding groove via a recognition motif containing a phosphorylated serine or threonine. making use of the dimeric structure of - - , novel small-molecule inhibitors may be tethered to exploit the bivalent effect. from a large virtual screening and experimental validation, a scaffold containing a phenyl phosphonic moiety was identified, showing inhibitory properties for - - ppis. potent derivatives of this scaffold were bridged by polyethylene glycol (peg) linkers of varying lengths, thereby enabling the compound to reach both binding sites of the - - dimer and concurrently increasing the compound's solubility in aqueous solution. 
similar bivalent inhibitors have been shown to increase efficacy synergistically. biophysical evaluation by means of fluorescence polarization (fp) competition assays revealed an improvement of the half maximal inhibitory concentration (ic ) from approximately µm for the monomeric phenyl phosphonate to approximately . µm for the bivalent inhibitor with a Å linker. this demonstrates a -fold increase of inhibitory effect towards - - and its binding partner peptide mimic. extensive thermodynamic, kinetic and structural analysis of the interaction is in progress. phosphonic moieties have been shown to pass the cell membrane poorly, due to their highly charged character. acting only on the extracellular interaction between - - and apn, these inhibitors are thus prevented from interfering with the extensive intracellular - - interactome. hence, these bivalent phenyl phosphonate inhibitors provide a promising strategy towards extracellular application. the mre complex is an oligomeric assembly comprising dimers of mre and rad proteins in archaea and, additionally, the nbs subunit in eukaryotes. it is the central player in the dna damage response, a functional network comprising dna damage sensing, signal transduction, cell cycle regulation and repair of dna double strand breaks (dsbs) [ ] . recent structural studies revealed that the rad hinge domain is rather a short kink in the coiled-coil region and adopts an unusual dimerization mode by intermolecular coordination of zn(ii) and formation of the so-called zinc hook domain [ ] . to date, very limited structural data on the zinc hook domain have been reported; the only known structure was resolved for the rad homologue from the hyperthermophilic archaeon p. furiosus. the unusual zn(ii) coordination mode in the zinc hook domain raises the question of how the zinc hook domain assembles to form an interprotein zinc binding site with sufficient stability to function at low intracellular free zn(ii) concentrations [ ] . 
our study on a minimal zinc hook domain fragment ( aa) indicated low femtomolar affinity towards zn(ii) [ ] . an extended zinc hook domain fragment ( aa) reveals even zeptomolar affinity. therefore, our main goal was to probe the thermodynamic and structural effects that are hidden in the small interprotein interface and are responsible for the dimerization of this large and critical protein machinery. these effects were probed by detailed biophysical characterization (including potentiometry, nmr, hdx ms and cd spectroscopy) of protein fragments of zinc hook domains with a number of point mutations. we showed that the extremely high stability of the zinc hook domain from p. furiosus is achieved by the formation of a hydrogen bond network in β-hairpins and an interprotein hydrophobic core. eindhoven university of technology dna-based molecular circuits have become a very attractive tool in molecular imaging, synthetic biology, molecular diagnostics and biomolecular computing. the highly modular and predictable nature of watson-crick base pairing allows the construction of complex circuits using a limited set of logic gates and building blocks. however, the lack of generic approaches to interface dna-based molecular circuits with protein activity limits their application in biomedicine and molecular diagnostics. here we present a new, highly modular approach to control the activity of a reporter enzyme based on the dna-directed assembly and disassembly of a complex between tem -β-lactamase and its inhibitor protein blip. both proteins are conjugated to a unique oligonucleotide, allowing the assembly of the enzyme-inhibitor pair and inhibition of enzyme activity by the addition of a complementary template strand. addition of an oligonucleotide that is complementary to a loop sequence in the template results in the formation of a rigid dsdna spacer that disrupts the enzyme-inhibitor complex, restoring enzyme activity. 
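the switching logic described above (template assembles the enzyme-inhibitor pair, a target complementary to the template loop opens it again) can be written as a small boolean sketch. the sequences and function names are hypothetical, the model ignores thermodynamics and partial hybridization, and it assumes that without template the enzyme and inhibitor are not brought together.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def enzyme_active(template_present: bool, loop_seq: str, target_seq: str = "") -> bool:
    """Toy model of the DNA-directed switch.

    - no template: enzyme and inhibitor stay apart (assumed), enzyme active
    - template only: enzyme-inhibitor complex assembles, enzyme inhibited
    - template plus a target containing the reverse complement of the loop:
      a rigid duplex forms, the complex is disrupted, activity is restored
    """
    if not template_present:
        return True
    return reverse_complement(loop_seq) in target_seq.upper()


# Hypothetical loop and target sequences, for illustration only.
loop = "ATGCCGTA"
matching_target = "GG" + reverse_complement(loop) + "CC"
unrelated_target = "TTTTTTTTTTTT"

state_template_only = enzyme_active(True, loop)
state_with_target = enzyme_active(True, loop, matching_target)
state_with_unrelated = enzyme_active(True, loop, unrelated_target)
```

because only the loop sequence enters the match, retargeting the switch in this picture means changing the template alone, which is the modularity the abstract emphasizes for the noncovalent design.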
this noncovalent approach allowed easy tuning of the template and target sequences with only a single set of oligonucleotide-functionalized enzyme and inhibitor. to show the modularity of the system, a panel of different template sequences was selected. restoration of enzyme activity was observed only in the presence of their complementary viral dna sequences. in addition to this excellent specificity, the system proved to be highly sensitive towards its target, since the presence of as little as fmol of target resulted in an observable increase in enzyme activity. the use of a stable and well-characterized enzyme-inhibitor pair, complemented by the modular design of our reversible dna-directed protein switch, makes it an attractive system to implement in dna-based molecular circuits. several studies have demonstrated important roles of human carbonic anhydrases (hcas) in a variety of physiological and pathological processes. consequently, in recent years the catalytically active hca isoforms have become an interesting target for the design of inhibitors with biomedical applications [ ] . derivatized sulfonamides of type r-so nh represent the most widely used and best characterized class of ca inhibitors (cais). the large number of crystallographic studies so far available on these molecules has clarified the main factors responsible for the binding of the sulfonamide moiety to the ca active site. in particular, it has been highlighted that even though these molecules generally behave as very potent cais, they do not show selectivity for the different isoforms. indeed, the sulfonamide moiety plays a predominant role in the interaction with the enzyme, while any change in the nature of the r substituent generally has a rather marginal effect on the enzyme-inhibitor affinity. these characteristics make it difficult to design sulfonamide derivatives selective for the different ca isoforms. 
consequently, much effort has been dedicated in recent years to the development of new inhibitors that, although presenting lower affinity for the ca active site, would be more selective toward the different isoforms. carboxylic acids have recently been investigated as cais, showing that these molecules can adopt different binding modes to the enzyme active site. in particular, they can coordinate directly to the zinc ion or be anchored to the zinc-bound water molecule. however, the structural reasons for this peculiar behavior have not been clarified yet. in a general research project aimed at providing insights into the binding mode of these molecules to cas, we have undertaken the characterization of two carboxylic acids, namely an ortho-substituted benzoic acid [ ] and a saccharine derivative, by means of kinetic, crystallographic and theoretical studies. exploring the mechanism of fibril formation using fluorescently labelled human lysozyme variants ana bernardo gancedo human lysozyme is a widely characterised protein whose mutational variants misfold into fibrils that are associated with systemic amyloidosis ( ) . although the process of aggregation for human lysozyme has been well studied, the details of early events within this process are not fully characterised. single molecule fluorescence microscopy has been used to determine the oligomeric distributions present in the aggregation process of a number of disease-related intrinsically disordered proteins (idps) ( ) . recent advances in site-specific labelling of human lysozyme ( ) have made this protein amenable to these single molecule fluorescence studies. we have introduced alexa-fluorophores into the i t variant of human lysozyme and have demonstrated that the process of in vitro fibril formation is not significantly altered. 
using these fluorophore-labelled proteins we can apply single molecule fluorescence to study the early aggregation events within this system, allowing us to compare protein aggregation in a globular protein with the aggregation process of idps. abstract protein structure, folding and function, while specific interactions with lipid molecules can also contribute towards the biological activity of some membrane proteins. improved understanding of these interactions has resulted in the development of artificial lipid systems that allow the bilayer properties to be rationally manipulated in vitro to control protein behaviour. the bacterial transporter lacy is a well known integral membrane protein from the major facilitator superfamily, responsible for the proton-driven uptake of d-lactose in e. coli. with a high resolution structure available and considerable understanding of mechanistic detail, and with observed changes to both structure and function in different bilayer environments, lacy is a good model system for examining the behaviour of a major class of membrane proteins in these lipid systems. purified lacy has been reconstituted into liposomes and droplet interface bilayer systems of varying lipid composition and the effect on protein function and bilayer properties examined. targeting abeta oligomers by trehalose-conjugated peptides: a molecular dynamics study alzheimer's disease (ad) is currently one of the most common and devastating forms of dementia correlated with beta-amyloid peptide (abeta) accumulation in human brain tissue [ , ] . inhibiting abeta self-oligomerization in brain tissue remains one of the main strategies to prevent or treat this disorder. as a consequence, in recent years much effort has been spent on understanding the amyloid fibril growth process and its modulation by putative drug molecules. an interesting class of compounds able to prevent abeta fibrillogenesis is represented by beta-sheet-breaker (bsb) peptides [ ] . 
although these molecules are thought to recognize the abeta hydrophobic core region in a self-complementary manner, their precise mechanism of interaction is still unclear. in this context, we have studied the structural basis underlying the inhibitory effect on abeta( - ) fibrillogenesis exerted by two promising trehalose-conjugated bsb peptides (ac-lpffd-th (thct) and th-succinyl-lpffd-nh (thnt)) [ ] using an all-atom molecular dynamics (md) approach [ , ] . the pentameric nmr structure [ ] of abeta has been used to model the amyloid protofibril, and the two protofibril ends have been investigated as putative binding sites. our simulations suggest that the interaction with the two protofibril ends occurs through different binding modes. in particular, binding on the odd edge (chain a) is guided by a well defined hydrophobic cleft, which is common to both ligands (thct and thnt). moreover, targeting chain a entails a significant structure destabilization leading to a partial loss of β structure and is an energetically favoured process, as assessed by mm/pbsa calculations. a significant contribution of the trehalose moiety to the stability of the complexes emerged from our results. the basic structural unit of chromatin is the nucleosome, which is composed of histone proteins forming a scaffold with about base pairs of dna wrapped around. chromatin compacts eukaryotic genomes and regulates gene activity, which is mediated in part by posttranslational modifications (ptms) on the n-terminal tails of the histones. uncovering the detailed relationship between histone tail modifications and gene activity is a major topic of biomedical sciences, and general techniques for generating nucleosomes with defined modification patterns in large numbers would greatly facilitate such investigations. to this end we are establishing a chemical toolbox for designer chromatin with defined histone ptm patterns. 
a protein semisynthesis approach is used that is based on "ligation-ready nucleosomes" with truncated histone h that can be ligated with the corresponding synthetic histone tail. we resorted to sortase-mediated ligation as the chemoselective ligation method. here we report our recent developments in establishing the envisioned chemical toolbox for designer chromatin.
evaluating cation-pi and pi-pi interaction in proteins using various biophysical methods
in proteins the aromatic residues phenylalanine (phe), tyrosine (tyr), and tryptophan (trp) can be involved in aromatic interactions known as cation-pi and pi-pi interactions (dougherty ). compared to other non-covalent interactions in proteins, like h-bonds, dipole-dipole, or van der waals interactions, relatively little is known about the pi-pi and the cation-pi interactions. the strength of both aromatic interactions depends on the pi-electron density in the aromatic residues. the electron density can be lowered by introducing strong electron-withdrawing substituents like fluorine atoms into the aromatic ring (dougherty ). in this way a nearly isosteric change in the aromatic system results in a marked change in electron density. substitution with methyl groups is known to slightly increase the electron density.
the response to low cellular oxygen levels in humans and other animals is induced by the hypoxia inducible transcription factors (hifs). these transcription factors are regulated by hypoxia inducible factor prolyl hydroxylases (phds), which act as 'oxygen sensors' by hydroxylating hifs, thus leading to the proteasomal degradation of the transcription factors. over the last years, there have been multiple reports that describe additional phd substrates other than hifs. among them are the large subunit of rna pol ii, several transcription factors, and components of signalling pathways.
validating these reports is of major medicinal relevance given that phd inhibitors are now in late-stage clinical trials. in order to investigate the selectivity of phds, the reported proteins have been tested as substrates for hydroxylation by mass spectrometry, and as binders or competitors of the phds. initial work on peptides that contain the putative hydroxylation sites has indicated that the phds are much more selective for their well-established substrate hif. in ongoing work, these initial results are being validated at the protein level by co-expressing phds with the reported substrates. additionally, peptides of reported substrates were screened for their ability to alter the kinetics of hif hydroxylation by phd . an inhibitory effect of at least two different peptides on phd was observed, suggesting that there is an interaction between the prolyl hydroxylase and these peptides. in order to investigate the mode of binding and inhibition, nmr studies have been carried out and binding of the two inhibitory peptides to phd has been shown. altogether, these results indicate that, although phds might be more selective for hif as a substrate than was initially thought, the enzymatic activity of the prolyl hydroxylases is possibly influenced by a number of other proteins that can directly bind to phds.
non-natural aminoacids via the mio-enzyme toolkit
alina filip, judith h bartha-vári, gergely bánóczi, lászló poppe, csaba paizs, florin-dan irimie
biocatalysis and biotransformation research group, department of chemistry, ubb, department of organic chemistry and technology
an attractive enzymatic route to enantiomerically pure, highly valuable α- or β-aromatic amino acids involves the use of aromatic ammonia lyases (als) and aminomutases (ams).
all these enzymes have in common an autocatalytically formed -methylene- , -dihydroimidazole- -one (mio) electrophilic prosthetic group, and show high structural and sequence similarities. recent advances in improving the functional properties of these enzymes have increased both their biocatalytic and therapeutic applications. we aimed to create a library of recombinant mio-enzymes consisting of pals and pams with large substrate promiscuity in order to provide access to various non-natural aminoacids through enzymatic ammonia addition and/or ammonia elimination reactions of the substrate library already available in our research group. the developed complementary substrate and enzyme libraries would provide a mio-enzyme toolkit useful for the synthesis of non-natural aminoacids. the synthetic genes of the enzymes (pcpal, rtpal, avpal, papam) were cloned into the pet b_j expression vector using xhoi and bpu i cloning sites. the plasmid dna was transformed into several e. coli host strains (rosetta, bl , origami ) in order to optimize the expression yields. the enzymes, containing an n-terminal his -tag, were purified by affinity chromatography followed by ion-exchange and/or size-exclusion chromatography, yielding pure and homogeneous proteins in their tetrameric, presumably native fold. the enzyme activity and the kinetic parameters of the purified enzymes were determined towards the natural substrate l-phenylalanine, as well as towards novel bulkier aromatic substrates (heteroaryl alanines, styryl alanines, biphenylalanines). furthermore, to enhance their biocatalytic applicability, we covalently immobilized the enzymes on carboxylated single-walled carbon nanotubes (swcnt cooh) using linkers of different lengths, and tested the activity and recycling of the immobilized enzyme.
antibodies that bind protein antigens are indispensable tools in biochemical research and modern medicine.
utilizing a phage display selection strategy, we have obtained synthetic antigen binders (sabs), based on a fab fragment of igg, to a wide array of proteins as distinct as membrane proteins, structural proteins, scaffold proteins and nuclear targets. here we demonstrate the applicability of the sabs towards the native, full-length proteins in cells. we show that the generated sabs are able to pull down endogenous proteins from mammalian cell extracts along with their natural binding partners. we developed a method of utilizing our high-affinity, high-specificity binders as fluorescently labeled tools to visualize target proteins in their native environment in cells without the need for secondary antibodies or blocking reagents. our system also includes a method for efficient delivery of the generated antibodies to living cells, where they can perform their function. the sabs have been successfully used for altering biological processes in a controllable manner.
in vitro evolution from pluripotent peptide libraries with natural neurotoxin scaffolds to target receptors, proteases and trophic factors
tai kubo, mohammed naimuddin, seigo ono
national institute of advanced industrial science and technology (aist)
small molecule natural products are precious resources for drug discovery. during millions of years of evolution, natural products must have been exposed to various selection pressures and have been refined in structure and function to obtain their present features. in some peptide neurotoxins, however, the basic molecular scaffold, mainly configured by disulfide (s-s) bridges and/or alpha/beta structures, is strictly conserved within each family even under evolutionary pressure. on the other hand, the loop regions, which are not heavily involved in scaffold formation, are highly diverged.
this mode of molecular evolution, named 'accelerated evolution', is a reasonable way to adapt quickly to drastic changes in the environment. the evolutionarily selected scaffold is compact, combining rigidity and flexibility, and it may support a topology appropriate for target recognition and selective interaction. inspired by this system, we designed random peptide libraries from peptide neurotoxins that have undergone accelerated evolution. three-finger ( f) shaped snake neurotoxins constitute a huge family that evolved by accelerated gene evolution. we prepared a f-peptide library by introducing random sequences in each fingertip. another random peptide library with an ick (inhibitor cystine knot) motif was prepared based on the spider neurotoxin gtx - , originally identified as a t-type ca channel modulator. each library was subjected to in vitro evolution directed to specific target molecules. for the f-peptide library, the cdna display method was applied to select binders. when interleukin- (il- ) receptors were targeted, the selected f peptides showed binding affinities (kd nm) comparable to the native ligand il- . when trypsin was targeted, peptides with serine protease inhibitor activities similar to sti and bpti (ki nm) were isolated. specific binders to the trophic factor vegf were also generated from the f library. to target membrane proteins, we developed a unique in vitro evolution system and named it the periss (intra periplasm secretion and selection) method. in this system, target membrane proteins are expressed in the inner membrane of e. coli and peptides are secreted into the periplasmic space between the inner and outer membranes, which serves as the compartment for interaction and selection. the periss method enabled us to identify a peptide specific to the muscarinic receptor m subtype from the ick peptide library.
in conclusion, we showed that a library designed from the scaffold of a peptide toxin that evolved by accelerated gene evolution has pluripotency in target recognition, interaction and even bioactivity.
phenylalanine ammonia lyase from petroselinum crispum (pcpal) belongs to the class of enzymes containing -methylideneimidazole- -one (mio) as a prosthetic group and is responsible for the conversion of l-phenylalanine into trans-cinnamic acid. this reaction is reversible at high ammonia concentration. we analyzed several factors that can influence the enantioselective synthesis of nitrophenylalanine mediated by whole cells as well as by purified mio-containing and mio-less pcpals. first we investigated the behaviour of the enzymes as a function of the ammonia concentration. we also inspected the influence of ph on the pcpal-catalyzed biotransformations. based on our results, we concluded that variation of the ammonia concentration and the ph leads to a decrease in enantioselectivity, suggesting that pcpal is able to catalyze the formation of both l- and d-enantiomers of electron-deficient structures.
all microbial cellulases appear to have a conserved 'sg' amino acid sequence at an identical position in the n-terminal domain. the properties of the n-terminal amino acid sequence were also predicted computationally. this analysis showed that the n-terminal sequence of the enzyme is unstable. the n-terminal sequence also showed potential cleavage sites for different proteases, which may contribute to its instability. the secondary structure analysis showed that the n-terminal sequence has % of the a.a. sequence in extended strand and % in random coil conformation. the n-terminal sequence was also analyzed for potential phosphorylation sites. while no potential serine and threonine sites were predicted, two tyrosine phosphorylation sites were predicted in the n-terminal sequence.
the n-terminal sequence was also examined for the presence of kinase-specific phosphorylation sites. the results showed the presence of one potential site which may be phosphorylated by pkc at position of the n-terminal sequence. prediction of oglcnac sites revealed that two such sites may be present in the sequence. we have also predicted the ligand binding site in the n-terminal sequence of the protein.
protein arginine methylation, catalyzed by protein arginine methyltransferases (prmts), is a pivotal protein post-translational modification involved in a growing number of physiological and pathological processes including signal transduction, proliferation, differentiation and malignancy. prmt accounts for the majority of protein arginine methyltransferase activity in mammalian cells and, consistent with this, a large number of cellular substrates have been identified. several studies have reported that the activity of prmt changes upon stimulation in various cellular processes. in mammalian cells, prmt exists in a high molecular weight complex. the interacting partners of prmt , such as the antiproliferative proteins btg and btg , protein phosphatase a, the orphan receptor tr , and ccr -associated factor (hcaf ), have been shown to play a role in modulating the methyltransferase activity and the substrate selectivity of prmt . due to the pivotal roles of prmt in physiological and pathological conditions, intensive efforts have been devoted to the search for small synthetic molecules that can efficiently modulate the activity of prmt for the potential development of therapeutics. in contrast, the intracellular small molecules that either transmit extracellular stimulation or act as cofactors to dictate the activity of prmts in cells are still poorly understood.
our study focused on examining how cellular ions might affect the activity of prmt and found that divalent and monovalent ions differentially modulated the catalytic activity of prmt toward different substrates.
oligomerisation properties of light-dependent protochlorophyllide oxidoreductase
prothoracicotropic hormone (ptth) is one of the most important neuropeptide regulators of insect molting and metamorphosis. however, preparation of the recombinant protein has rarely been successful, because it is a homodimeric protein with a very complicated disulfide-bond structure. for example, silkworm ptth has three intramolecular disulfide bonds in its -residue polypeptide chain, and the two chains are further linked by an additional intermolecular disulfide bond to form the homodimer. although recombinant silkworm ptth was previously expressed in escherichia coli, the product was obtained only in the precipitated fractions, and refolding of the precipitated protein provided the active dimeric ptth in very poor yield. under reducing conditions such as those in the cytosol of e. coli cells, formation of the correct disulfide-bond arrangement must be difficult. as an alternative host for the heterologous expression of silkworm ptth, we employed brevibacillus choshinensis (formerly referred to as bacillus brevis), which has achieved good results in the expression of various disulfide-bond-containing proteins. in this study, silkworm ptth was expressed in the brevibacillus cells with an additional his -tag sequence at the c-terminus, for easier detection and purification. first, since the brevibacillus bacteria are equipped with a secretory system for the expressed proteins, the secretory signal sequence to be attached before the silkworm ptth was carefully selected. among four candidates in a commercially available kit, a signal sequence derived from an intrinsic cell-wall protein, mwp, gave the best expression levels of the protein.
second, the incubation time of the cells was optimized, because the oligomerization state of the secreted ptth in the cell culture medium changed over time. in the medium, various ptth oligomers including a monomer and a dimer were initially observed, but higher oligomers became a major portion of the secreted product after incubation for longer than h. incubation for - h may be suitable for obtaining the native dimer form of the silkworm ptth. to remove the undesired monomer and higher oligomers, which mostly retained free sulfhydryl groups, the secreted proteins were treated with maleimide-peg -biotin. in the purification using a ni -nta column, the dimer of the his -tagged silkworm ptth was eluted with an imidazole gradient, separately ahead of other biotinylated proteins, probably due to interaction of the peg spacer with the ni -nta groups of the resin. after reversed-phase hplc purification, the final product showed a single band on non-reducing sds-page, and it had adequate ecdysone-releasing activity on isolated silkworm prothoracic glands. the brevibacillus bacteria are promising host cells for the heterologous production of insect ptth.
role of the disulfide bridges in the transmembrane region of the insect prothoracicotropic hormone receptor, torso
torso is an insect cellular-membrane protein, which was recently identified as a receptor for prothoracicotropic hormone (ptth). although ptth is one of the important regulatory molecules in insect molting and metamorphosis, the mechanism by which the ligand activates torso has not yet been elucidated. in this study, the oligomerization mode of the silkworm torso was examined using heterologous expression in drosophila s cultured cells, because torso is a single-polypeptide receptor tyrosine kinase (rtk), and activation of such rtks is often triggered by ligand-induced receptor dimerization on the cellular membrane.
when activated with silkworm ptth, dimerization of the silkworm torso in the s cells was observed using the cross-linking reagent bs , and the subsequent receptor autophosphorylation and downstream erk phosphorylation were also detected. surprisingly, however, torso dimerization was found to occur even without ligand stimulation, whereas the autophosphorylation and the erk phosphorylation occurred only in response to the stimulation. when fractionated by non-reducing sds-page, the silkworm torso showed an obvious dimer band, in addition to a faint monomer band, both with and without ptth stimulation, even though the receptor was not treated with the cross-linking reagent. this indicates that the torso protein is originally expressed as a disulfide-bond-linked dimer. in addition, by examining the oligomerization states of several truncation and substitution mutants, cysteine residues in the transmembrane region were found to participate in the intermolecular disulfide bridges linking the two receptor molecules in the dimer. when all three cysteines in the transmembrane region were replaced by phenylalanines, the disulfide-bond-linked torso dimerization was not observed, but spontaneous, ligand-independent association of the torso molecules was detected using the crosslinker bs . this spontaneous dimerization caused apparent torso autophosphorylation, but it could not induce the downstream erk phosphorylation. consequently, without the intermolecular disulfide bridges, torso loses its responsiveness to ptth stimulation. in conclusion, the disulfide bridges in the transmembrane region may serve to preserve a suitable relative position between the two torso molecules, which could allow ligand-dependent autophosphorylation leading to activation of the downstream signaling pathways in the cells.
the yeast enzyme neutral trehalase (nth , ec . . .
) from saccharomyces cerevisiae hydrolyses the non-reducing disaccharide trehalose, which serves as an energy source and a universal stress protectant in many different organisms. the enzymatic activity of nth is enhanced by binding of the yeast - - proteins (bmh and bmh ) in a phosphorylation-dependent manner. nth activity is also regulated by ca binding to the domain of nth that contains the ef-hand-like motif [ ]. native tbe page and analytical ultracentrifugation show that nth forms very stable complexes with bmh and bmh [ ]. to study the structure of nth alone and in complex with the - - protein we used circular dichroism, h/d exchange coupled to mass spectrometry, chemical cross-linking [ ] and small angle x-ray scattering (saxs) [ ]. in parallel, protein crystallography of nth alone and in complex with bmh is being performed. the low-resolution structure of the pnth :bmh protein complex revealed that binding of bmh induces a rearrangement of the whole nth molecule and that the region containing the ef-hand motif forms a separate domain which interacts with both bmh and the catalytic domain of nth . we showed that the integrity of the ef-hand motif is crucial for the bmh -mediated activation of nth and for ca binding. our data suggest that the ef-hand-like motif functions as the intermediary through which bmh modulates the function of the catalytic domain of nth . these structural changes probably enable substrate entry into the enzyme active site [ ]. our study of the - - protein complex with the fully active enzyme nth offers a unique structural view of nth activation, enabling us to better understand the role of the - - proteins in the regulation of other enzymes.
the assembly of self-regulating synthetic biochemical pathways in vitro has great potential as an alternative catalytic route for the high-yield production of low-value/high-volume commodity chemicals from biomass.
achieving the high yields of low-value/high-volume compounds that are required for economic viability is particularly difficult via traditional in vivo metabolic engineering of microbes, owing to competing biochemical pathways and toxicity. we have developed an alternative approach, called synthetic biochemistry, in which the glycolysis pathway of central metabolism is reconstituted in vitro together with an anabolic pathway that can produce useful compounds at high yield. in the specific synthetic biochemistry system described, reducing equivalents, atp, and carbon from glycolysis are funneled through the anabolic mevalonate pathway to produce the monoterpene limonene from glucose. the successful implementation of the in vitro pathway required development of a molecular purge valve consisting of an nad and nadp specific reductase (i.e. wild-type and mutant pyruvate dehydrogenase) and an nadh oxidase, noxe, to maintain proper nadp /nadph cofactor balance while allowing continuous carbon flux. we find that the purge-valve concept is readily transportable to other nad(p)h-generating steps in central metabolism and can be used to convert glucose to limonene at high yield.
chitinases (ec . . . ) are enzymes that randomly hydrolyze β- , glycosidic bonds of chitin and produce n-acetylchitooligosaccharides ((glcnac)n), which have various physiological functions such as immunostimulatory activity. most fish feed on crustaceans such as shrimp and crab. fish therefore have chitinase in the stomach to chemically disrupt the chitinous envelope of crustaceans. four chitinase isozymes ( - kda), pachia [ ] and pachib [ ], and ptchia and ptchib [ ], were purified from the stomachs of silver croaker pennahia argentatus and threeline grunt parapristipoma trilineatum, respectively, by ammonium sulfate fractionation and column chromatography. all the chitinases were stable and showed activity in the acidic ph range (ph - ).
pachia and ptchia preferentially degraded the second glycosidic bond from the non-reducing end of (glcnac)n, whereas pachib and ptchib had a preference for the third glycosidic bond. all the chitinases showed different substrate specificities toward insoluble long substrates. moreover, chitinase cdnas (pachi- and pachi- ) encoding pachia and pachib, and cdnas (ptchi- and ptchi- ) encoding ptchia and ptchib, were obtained by cdna cloning using the rt-pcr and race methods. the deduced amino acid sequences of all the chitinase cdnas contained an n-terminal signal peptide, a gh family catalytic domain, a linker region, and a chitin-binding domain. phylogenetic tree analysis of vertebrate chitinases revealed that fish stomach chitinases form unique chitinase isozyme groups, acidic fish chitinase- (afcase- ) including pachia and ptchia, and acidic fish chitinase- (afcase- ) including pachib and ptchib, which differ from the acidic mammalian chitinase (amcase) group [ , ]. the previously reported purified fish stomach chitinases [ ] can also be classified into the two chitinase isozyme groups, afcase- and afcase- , by their n-terminal amino acid sequences. this study suggests that fish have an excellent chitin-degrading enzymatic system in which two different chitinase isozyme groups, afcase- and afcase- , with different degradation patterns are expressed in the stomach.
recently, the enzymes produced by psychrophilic organisms have attracted great interest, especially in studies of the temperature adaptation of proteins. previously, a cold-adapted yeast, glaciozyma antarctica pi , was isolated from a marine environment in antarctica and was known to produce lipolytic and proteolytic enzymes. a gene encoding a unique recombinant bifunctional enzyme (lippi ) with cold-active lipase and protease activities was successfully expressed, and the enzyme was purified and characterized.
the temperature profile of the bifunctional lippi enzyme showed that the lipase functions optimally at c whereas the protease was more active at c. the ph profile showed that both lippi lipase and protease were active near neutral ph. lippi lipase and protease were both activated in the presence of cacl , but the protease was more active in the presence of zncl . among surfactants, lippi lipase was activated by tween and sls; in contrast, lippi protease was almost completely deactivated by all surfactants tested. the presence of organic solvents did not abolish either the lipase or the protease activity. the lipase was more stable in solvents with higher log p values whereas the protease was slightly activated at low log p values, particularly with dimethylsulfonyl. inhibitor studies revealed that lippi lipase was partially inhibited by edta and pmsf, whereas lippi protease was inhibited by pepstatin, edta and pmsf. the lippi enzyme was successfully crystallized by the vapour diffusion method, and diffraction data were collected from a lippi crystal using synchrotron radiation. the three-dimensional structure of cold-adapted pi provided insight into cold adaptation and a better understanding of the structural properties of the lippi enzyme. its bifunctional properties make the enzyme a potential candidate for low-temperature industrial applications.
conformation-specific antibodies as enhancers and inhibitors of phosphatase activity of dep
malgorzata nocula-lugowska, mateusz lugowski, anthony a. kossiakoff
the university of chicago
dep- (cd /ptp-h) is a transmembrane receptor-like protein tyrosine phosphatase (ptp) that has been implicated in the density-dependent regulation of cell growth, differentiation and transformation. it counteracts protein kinases by dephosphorylating a number of their substrates as well as the kinases themselves, thus potentially controlling the specificity of signals.
for example, egfr, vegfr , met and pdgf β receptor have been shown to be dephosphorylated by this phosphatase. dep- has been shown to act as a tumor suppressor and has been proposed as a molecular target in antiangiogenesis therapy. as a result, both enhancers and inhibitors of dep- activity have the potential to elucidate pathways responsible for abnormal cell behavior. we generated synthetic antibodies against the intracellular catalytic domain of dep- that act as modulators of the enzyme's phosphatase activity. by applying a combination of selection pressures, an array of antibodies capable of either enhancing or inhibiting dep- activity has been raised from phage display libraries of fab fragments. in phosphatase assays with the catalytic domain of dep- , the antibodies demonstrate non-competitive or mixed kinetics. the crystal structure of the dep- -inhibitor complex shows that this antibody binds to a part of the protein that is distant from the active site and acts by locking the enzyme in a non-natural, catalytically inactive state, hindering the closure of the wpd loop, which is crucial for the reaction to occur. by contrast, as judged from the crystal structure of a complex of dep- with the antibody that enhances its phosphatase activity, this antibody seems to act by stabilizing the naturally occurring active state of dep- with the wpd loop in the closed conformation. the antibodies are also able to recognize dep- in cells, as they stain dep- in immunofluorescence experiments. to test the applicability of the raised antibodies in cells, the activator was additionally used to pull down full-length endogenous dep- after being delivered to live cells. inhibition and enhancement of dep- activity by locking the enzyme in conformations that are either natural or imposed by allosteric binding of antibodies appears to be a mechanism that can be utilized to modulate the activity of other tyrosine phosphatases.
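The "non-competitive or mixed kinetics" reported above can be illustrated with the textbook steady-state rate law for mixed inhibition, of which pure non-competitive inhibition is the special case where both inhibition constants are equal. The sketch below is a generic model only; the function names and all parameter values are hypothetical and are not taken from the abstract, which gives no kinetic constants.

```python
def mixed_inhibition_rate(s, i, vmax, km, ki, ki_prime):
    """Textbook steady-state rate under mixed inhibition.

    s, i     : substrate and inhibitor concentrations (same units as km, ki)
    ki       : inhibitor dissociation constant for the free enzyme
    ki_prime : inhibitor dissociation constant for the enzyme-substrate complex
    All names and values are illustrative assumptions, not data from the abstract.
    """
    alpha = 1.0 + i / ki              # scales the apparent Km term
    alpha_prime = 1.0 + i / ki_prime  # scales the substrate term
    return vmax * s / (alpha * km + alpha_prime * s)


def noncompetitive_rate(s, i, vmax, km, ki):
    """Pure non-competitive inhibition: ki == ki_prime, so Km is unchanged
    and the apparent Vmax is reduced by the factor (1 + i / ki)."""
    return mixed_inhibition_rate(s, i, vmax, km, ki, ki)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the model.
    for inhibitor in (0.0, 0.5, 1.0):
        rate = noncompetitive_rate(s=2.0, i=inhibitor, vmax=1.0, km=2.0, ki=0.5)
        print(f"[I] = {inhibitor}: v = {rate:.3f}")
```

In the non-competitive case the inhibitor lowers the apparent maximal rate without changing the apparent Km, which is the behaviour expected of a binder that acts away from the active site.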
investigating acinetobacter baumannii pathogenesis: crystal structure of wbjb epimerase from a polysaccharide biosynthesis cluster
oxygen homeostasis is regulated by hypoxia inducible factor, a transcription factor. when the oxygen level becomes too low (hypoxia), hypoxia-inducible-factor (hif- a) activates the expression of over a hundred genes associated with angiogenesis, erythropoiesis, vegf (vascular endothelial growth factor), cell migration, energy metabolism, etc. the cellular level of hif- a is highly dependent on oxygen concentration and is regulated by an oxygen sensor enzyme, hif prolyl hydroxylase (phd
plant sulphite reductase (sir) forms an electron transfer complex with ferredoxin (fd) for the reductive conversion of sulphite to sulphide. although previous studies have highlighted electrostatic interactions between oppositely charged residues of the two proteins, the detailed energetics of the intermolecular interaction underlying complex formation remain unknown. we herein carried out isothermal calorimetry of fd:sir complex formation at various nacl concentrations. a driving force plot constructed from the calorimetry data showed that the complex was thermodynamically stabilized by both enthalpy and entropy through favourable electrostatic and non-electrostatic interactions. increasing nacl concentrations weakened the interprotein affinity and decreased the contribution of the negative enthalpy change, while no such significant decrease was found in the contribution of the positive entropy change. furthermore, a negative heat capacity change, obtained from the enthalpy changes at different temperatures, indicated a contribution of hydrophobic interactions. these findings suggested that both electrostatic and non-electrostatic interprotein interactions were energetically important for the complex formation.
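For reference, the quantities discussed in this calorimetric analysis are connected by standard thermodynamic relations. The notation below is the conventional one and is an assumption here, since the abstract does not define symbols: $K_a$ is the association constant of the complex and $\Delta C_p$ the heat capacity change of binding.

```latex
\Delta G^{\circ} = \Delta H - T\,\Delta S = -RT \ln K_a ,
\qquad
\Delta C_p = \left( \frac{\partial \Delta H}{\partial T} \right)_{p}
```

A complex stabilized by both enthalpy and entropy corresponds to $\Delta H < 0$ and $-T\Delta S < 0$, and a negative $\Delta C_p$ means that $\Delta H$ becomes more negative as the temperature rises.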
the fd-dependent sir activity assay revealed a bell-shaped activity curve with a maximum at a certain nacl concentration, while the methyl viologen-dependent assay of sir exhibited a saturating curve, suggesting that an optimized interprotein interaction is a crucial factor in the control of fd-dependent sir activity. residue-based nmr measurements of n-labeled fd upon complex formation with sir revealed that charged and non-charged residues contributed differentially to complex formation depending on the nacl concentration. we propose that non-electrostatic forces are also critical for forming the fd:sir complex, and that an optimized complex conformation for maximum enzymatic activity is achieved by a delicate balance among non-covalent intermolecular forces. these results may be extended to the understanding of complex formation between redox proteins containing biased charge clusters.
ornithine transcarbamylase has a spatially extended active site as computationally predicted
lisa ngu, kevin ramos, nicholas delateur, penny beuning, mary jo ondrechen
understanding how an enzyme catalyzes a reaction is a fundamental problem in protein science. biochemical experimentation has revealed the catalytic mechanisms of many enzymes; however, these studies have focused almost exclusively on amino acid residues in direct contact with the reacting substrate molecule(s). here we report on the computational prediction and experimental verification of the importance of distal residues in enzyme catalysis, using e. coli ornithine transcarbamylase as an example. partial order optimum likelihood (pool), developed at northeastern university, is a machine learning technique that requires only the tertiary structure of a protein to predict important catalytic residues, based on computed, residue-specific electrostatic and chemical properties. pool has been shown to predict catalytic residues accurately and to discern between compact and spatially extended active sites.
dynamic conformational changes during catalysis and strong electrostatic interactions give rise to significant coupling between remote residues and the canonical active site residues of an enzyme. this suggests that at least some enzyme active sites are spatially extended, with second- and third-shell residues playing significant roles in catalysis. in this project, we focus on ornithine transcarbamylase (otc), for which dynamic processes are believed to play a role in its catalytic mechanism. otc is reported to undergo induced-fit conformational changes upon binding carbamoyl phosphate, which affects the subsequent binding of ornithine. residues predicted by pool to be catalytically important include five in direct contact with the substrate, r , h , d , c and r . pool also predicted remote residues to form a spatially extended, triple-layer active site. guided by computational predictions and using site-directed mutagenesis and kinetics assays of asp , his , glu and arg variants, we show that these pool-predicted remote residues, located in the second and third layers, are important for catalysis.
alternative energy is a major focus of current research efforts. biodiesel, a mixture of fatty acid alkyl esters, is one of the most versatile alternative fuels currently in use. this is due to the fact that it is similar to gasoline and compatible with diesel engines found throughout the existing global infrastructure. biodiesel precursor lipids are abundant in cultivated feedstock organisms such as algae and bacteria. however, the standard process for converting oil to biodiesel is heat-intensive and requires complete removal of water, reducing the overall net energy gained in its production. our work constitutes an attempt to explore enzymatic synthesis of biodiesel from lipids such as those derived from emerging fuel crops. previous literature describes fatty acid alkyl ester formation in human patients with mrsa staphylococcus aureus wound lesions.
these esters are formed by partially characterized esterase activity from an unidentified source. we have identified two mrsa enzymes responsible for this activity by using a combination of size exclusion chromatography, gas chromatography-mass spectrometry, and mass spectrometric protein sequencing. these two highly similar enzymes in the glycerol ester hydrolase (geh) family of proteins catalyze the synthesis of fatty acid alkyl esters in aqueous conditions at or near room temperature. we have demonstrated that other non-staphylococcal lipases do not exhibit this behavior. we have expressed these staphylococcal esterases in e. coli, and shown via gas chromatography that the expressed proteins catalyze the formation of fatty acid alkyl esters. based on sequence similarity to homologous proteins that have already been crystallized, we have predicted a structure for these enzymes and have engineered mutant fusions with higher rates of catalysis. our design hypothesis is that increased avidity for substrate molecules will yield a higher substrate concentration in the vicinity of the enzyme. to increase substrate concentration, we have designed and expressed one of the enzymes as a chimeric fusion with the drosophila melanogaster alcohol-binding protein lush. gc-ms determination of the biodiesel production rate indicates that the chimeric fusion has a lower-order rate constant with respect to ethanol. in other words, the fusion enzyme is less dependent on substrate concentration and is a superior catalyst at low ethanol concentrations. this result indicates that the rationally designed modification of binding avidity constitutes a potential avenue for improving the ability of enzymes to catalyze reactions with low-concentration or low-solubility substrates.
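one way to read the statement about a lower-order dependence on ethanol is in terms of an apparent reaction order n in a power-law rate expression, rate = k * c**n, where n is the slope of log(rate) against log(concentration). the sketch below illustrates that estimate under this assumed power-law model; the function name `apparent_order` and the rate values are hypothetical and do not come from the abstract.

```python
import math


def apparent_order(concentrations, rates):
    """Slope of log(rate) versus log(concentration).

    Under the assumed power law rate = k * c**n, the slope is n.
    math.log raises ValueError for non-positive inputs, so all
    concentrations and rates must be strictly positive.
    """
    if len(concentrations) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two matched (c, rate) points")
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical, illustrative data in arbitrary units (not from the study).
ethanol = [1.0, 2.0, 4.0, 8.0]
parent_rates = [0.5 * c for c in ethanol]             # first order by construction
fusion_rates = [2.0 * math.sqrt(c) for c in ethanol]  # order 0.5 by construction

order_parent = apparent_order(ethanol, parent_rates)
order_fusion = apparent_order(ethanol, fusion_rates)
print(f"parent enzyme order: {order_parent:.2f}, fusion order: {order_fusion:.2f}")
```

in this toy data set the lower-order curve gives the higher rate at the lowest concentration, which matches the qualitative behaviour described for the fusion enzyme at low ethanol concentrations.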
functional elements of a human antizyme essential for binding and inhibiting human ornithine decarboxylase
proteases are ubiquitous enzymes that catalyze the hydrolysis of peptide bonds within protein substrates; they have served as key model enzymes for studying the molecular basis for catalytic power and specificity. protease substrate specificity is most often defined in terms of linear sequence motifs that flank the cleavage site; however, the natural substrates of proteases are proteins with -dimensional shapes and complex conformational dynamics that are not well represented by -dimensional sequence alone. these structural and dynamical properties can impact recognition and binding of substrates by proteases, as well as the efficiency of catalysis itself. in this study, we explore the importance of substrate structure and dynamics for proteolysis using as our model the cleavage of the kunitz-bpti family of canonical serine protease inhibitors by mesotrypsin. bovine pancreatic trypsin inhibitor (bpti), an archetypal serine protease inhibitor of the kunitz family, has a high-affinity interaction with trypsin, yet its peptide bond hydrolysis is many orders of magnitude slower than that of other peptide substrates. mesotrypsin, a trypsin variant, has been shown to hydrolyze kunitz family inhibitors at accelerated rates; this is especially true of human kunitz domain inhibitors. amyloid precursor protein inhibitor (appi) and amyloid precursor like protein- (aplp ), two human kunitz domain family members, are hydrolyzed by mesotrypsin several hundred times faster than bpti. here, we present a new, unpublished crystal structure of a cleavage intermediate aplp bound to mesotrypsin, refined to . Å resolution, revealing a dramatic substrate conformational change that we hypothesize to be required during cleavage of a kunitz domain.
using this structure along with published structures of appi and bpti complexes, we have modeled acyl-enzyme intermediates of mesotrypsin, and we have carried out molecular dynamics simulations that explore the transition of the initially formed native-like acyl-enzyme through the conformational transformation that allows the hydrolysis reaction to progress. we further identify a specific hydrogen bond, present in bpti but not appi, which forms a stabilizing feature of the bpti scaffold. using site-directed mutagenesis, we probe the contribution of this bond to the proteolytic stability of bpti. collectively, our data for these highly structured substrates show that proteolysis rates are limited by a necessary conformational change in the substrate as the reaction progresses. rigid substrates possessing stabilizing features that render them highly resistant to this conformational change are proteolyzed more slowly than more flexible substrates of similar structure.
lpmos are copper metalloenzymes that carry out the oxidative cleavage of the β- , -glycosidic bond, generating new chain ends that can subsequently be processed by cellulases, boosting cellulose degradation. lpmos have a β-sandwich conformation with a flat binding surface, allowing the enzyme to bind to crystalline cellulose. the cu ion, required for activity, is located in a so-called "histidine brace", in which the n-terminal histidine is highly conserved.
regioselectivity: according to the carbon atom being oxidized, lpmo types are identified: type and type oxidizing at the c and the c respectively, and type lpmos oxidizing both the c and the c adjacent to the glycosidic linkage. we were able to express a type- lpmo (phanerochaete chrysosporium gh d) and a type- lpmo (trichoderma reesei cel a) in p. pastoris. this has proven to be very challenging, as lpmo activity requires a perfect cleavage of the signal sequence.
after activity assays on pasc, characteristic hpaec-pad traces were obtained, which will serve as a reference for engineering experiments.
enzyme engineering: using the dm database, a structure-based multiple sequence alignment tool, it is possible to identify residues specifically conserved in subsets of protein sequences. by defining a subset for each lpmo type, we were able to identify residues contributing to regioselectivity. these positions are now being rationally engineered in subsequent rounds of mutagenesis, using trcel a as a template. the effect of the mutations will be determined by analyzing the hpaec-pad trace released from pasc. the main goal is to investigate the possibility of deleting the c specificity in a type lpmo.
folding topology determines substrate binding order in the ribokinase superfamily
alejandra herrera-morandé, victor castro-fernández, madrid, españa
the ribokinase superfamily comprises three enzyme families: the adp-dependent sugar kinases family, the atp-dependent coenzyme kinases family and the atp-dependent sugar kinases family. in all these families there is a large domain composed of a rossmann motif, but only the atp-dependent enzymes have a β-meander motif at the c-terminal end. interestingly, these enzymes display an ordered kinetic mechanism in which the substrate that will be phosphorylated binds first to the enzyme. the adp-dependent enzymes present a topological re-ordering of the secondary structural elements which produces an equivalent tertiary structure and can be thought of as a non-circular permutation (ncp) of the β-meander region. these enzymes also display an ordered kinetic mechanism, but with an inverted order, the nucleotide being the first substrate to bind to the enzyme.
as this β-meander region of the proteins constitutes almost entirely the nucleotide binding site, and given that the permutation is the major structural difference between adp- and atp-dependent kinases, it could be responsible for the nucleotide specificity. to test this hypothesis we introduced, by permutation, an atp-dependent topology in the homologous adp-dependent glucokinase from t. litoralis (pergk). size exclusion chromatography and circular dichroism spectra show that both the wild type and the permutated enzyme elute as monomers with similar hydrodynamic behavior and have the same secondary structure content. kinetic assays employing atp or adp as substrate demonstrate that even in the presence of mm atp, the pergk enzyme is not able to carry out the phosphoryl transfer. to test whether the ncp has an impact on the kinetic constants and substrate binding order, we determined the kinetic mechanism through classical protocols involving initial velocity studies, product inhibition and dead-end inhibitors. the results demonstrate that the pergk enzyme presents an altered substrate binding order compared to the wild type enzyme, with glucose being the first substrate to bind to the enzyme and glucose- -p the last product to be released. ligand-induced conformational changes were also determined from the crystal structures. the apo, the enzyme-glucose and the enzyme-glucose-adpβs structures were determined at . Å, . Å and . Å resolutions, respectively. structure analysis reveals that glucose binding provokes major conformational changes in the pergk enzyme, whereas adp binding does not cause further changes in the conformation of the protein. the results show that although the permutation has no effect on the nucleotide preference, it provokes a change in the substrate binding order that correlates well with the conformational changes observed in the crystal structures.
they also demonstrate that, during the evolutionary history of the ribokinase superfamily, folding topology dictates the substrate binding order (fondecyt ).
background: human ceruloplasmin (cp) is a circulating copper-containing glycoprotein produced in the liver and first described as a component of the alpha -globulin fraction of human plasma. cp belongs to the multicopper oxidase family and is nowadays regarded as a "moonlighting" protein, because it changes its function according to substrate, localization and expression. cp plays a key role in copper transport and iron metabolism and is also a potent inhibitor of leukocyte myeloperoxidase (mpo) (kd nm), a major source of oxidants in vivo. the protein is extremely susceptible to proteolysis. in fact, cp is a structural homolog of coagulation factors v and viii, which are physiological substrates of thrombin (fiia). interestingly, thrombin participates in both haemostatic and inflammatory responses: at some foci of inflammation, such as in rheumatoid arthritis (ra), high fiia activity has been documented. it was demonstrated that fiia can promote the chemotaxis of neutrophils and monocytes and their adhesion to endothelial cells, and increase vascular permeability. all these effects are mediated by interaction with par- , which is abundantly expressed in inflamed rheumatoid synovial tissues.
aims: in this study the interaction of cp with thrombin was investigated to confirm the participation of fiia in the "spontaneous" proteolytic degradation of cp. in fact, in vivo the integrity of cp is essential for its role in the transport and metabolism of copper.
results: our results indicated that thrombin cleaves cp in vitro at arg-ser and lys-val bonds, generating a nicked species that retains the native-like fold and the ferroxidase activity of the intact protein, whereas the mpo inhibitory function of cp is abrogated.
analysis of the synovial fluid of ra patients reveals that cp is proteolytically degraded to a variable extent, with a fragmentation pattern similar to that observed with fiia in vitro, and that proteolysis is blocked by hirudin, a highly potent and specific thrombin inhibitor. we demonstrate that fiia has intrinsic affinity for cp (kd - nm), independently of proteolysis, and inhibits cp ferroxidase activity (ki nm). mapping of thrombin binding sites with specific exosite-directed ligands (i.e. hirugen, fibrinogen gamma-peptide) and thrombin analogues having the exosites variably compromised (i.e. prothrombin, prethrombin- , alpha-thrombin) reveals that the positively charged exosite-ii of thrombin binds to the negative upper region of cp, while the protease active site and exosite-i remain accessible. these results suggest that thrombin can exacerbate inflammation in ra by impairing, via proteolysis, the mpo inhibitory function of cp and by competitively inhibiting cp ferroxidase activity.
an artificial pathway for isobutene production by direct fermentation: combining metabolic engineering and protein engineering
benoit villiers, françois stricher
the purpose of global bioenergies is to develop innovative metabolic pathways for the production of light olefins from renewable resources, by direct fermentation. light olefins (ethylene, propylene, linear butylene, isobutene and butadiene) are the core of the petrochemical industry. however, microorganisms do not naturally produce light olefins and no bioprocess to convert renewable resources to these molecules has been industrialized so far. global bioenergies has developed an artificial metabolic pathway including all the necessary enzymatic reactions from feedstock to isobutene. the metabolic route leading to isobutene can be divided into three parts, the first one being the use of natural reactions occurring in the host microorganism.
second, heterologous natural reactions were introduced into the same host microorganism. finally, in contrast with most former approaches, non-naturally occurring reactions were used as key enzymatic steps, for example the decarboxylation of hydroxyisovaleric acid into isobutene. such non-natural critical steps were made possible by taking advantage of the natural catalytic and substrate promiscuity of exogenous enzymes. candidate enzymes are then evolved using systematic, random and semi-rational approaches in successive rounds in order to reach the desired catalytic efficiency. since all these reactions are enzymatic, isobutene can be obtained by direct fermentation, i.e. a process wherein all the chemical transformations are carried out by the host microorganism. the scale-up of this process began in november in a pilot plant installed in pomacle-bazancourt, france, with an annual capacity of tons of oxidation-grade isobutene. importantly, production of a volatile compound such as isobutene (and other light olefins) by direct fermentation presents two major advantages: first, the product is spontaneously removed from the culture broth, which alleviates the limitations linked with titer issues. second, the purification process is considerably easier and cheaper, since no energy-consuming methods such as distillation or phase separation are necessary to purify the end product. for the first time, batches of industrially produced isobutene from renewable resources were obtained in the first half of . this isobutene has in turn been converted into isooctane, an additive currently used to improve gasoline quality, which could also be used as a standalone fuel. a demonstration plant is planned in leuna, germany, with an annual capacity of tons of polymer-grade isobutene, and ibn-one, a joint venture with cristal union ( th european beet processor), has been formed to build and operate the first plant in france converting renewable resources into isobutene.
finally, while the isobutene process is progressing towards industrial scale, global bioenergies is also developing new artificial metabolic pathways enabling direct bio-production of butadiene and propylene.
the development of a coupled enzyme assay to detect isochorismate pyruvate lyase activity
protein folding is typically defined in terms of the spatial arrangement of structural elements, i.e. helices, sheets and loops. we have, however, been developing an alternative and complementary paradigm based on conserved hydropathic interaction networks within proteins. these networks can be viewed as environments comprising a mixture of polar and hydrophobic interaction fields, and may be the most important factor driving protein folding. this concept applies even to the lowest structural level within a protein: the sidechain conformations (or rotamers). exhaustive statistical analysis of existing crystallographic structures of proteins showed rotameric preferences and led to the creation of rotamer libraries frequently used in multiple aspects of structural biology, e.g., crystallography of relatively low-resolution structures, homology modeling and biomolecular nmr. however, little is actually known about the forces and factors driving the preference or suitability of one rotamer over another. in our study, tyrosine was analyzed since its sidechain has a comprehensive set of hydropathic properties that make it ideal as a proof-of-concept residue. construction of d hydropathic interaction maps of the tyrosine residues in our dataset reveals the environment around each, in terms of hydrophobic (π-π stacking, etc.) and polar (hydrogen bonding, etc.) interactions. after partitioning the tyrosines into backbone-dependent bins, a map similarity metric based on the correlation coefficient was applied to each map-map pair to build matrices suitable for clustering.
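the pairwise map comparison described above can be sketched as follows, assuming each hydropathic map has been flattened to a list of grid values on a common grid and that the pearson correlation coefficient is the similarity metric (the abstract says only "correlation coefficient"). the function names and the toy maps are hypothetical; the conversion of similarity to a distance as 1 - r is one common choice for clustering input, not necessarily the authors' choice.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally sized flattened maps."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("maps must have the same length (at least 2)")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


def similarity_matrix(maps):
    """Symmetric matrix of pairwise correlations, with 1.0 on the diagonal."""
    n = len(maps)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(maps[i], maps[j])
            sim[i][j] = r
            sim[j][i] = r
    return sim


def distance_matrix(sim):
    """Convert similarity to distance as 1 - r (an assumed convention)."""
    return [[1.0 - r for r in row] for row in sim]


# Toy flattened "maps" for illustration only (not real hydropathic maps).
toy_maps = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],  # same pattern as map 0, scaled
    [3.0, 2.0, 1.0, 0.0],  # reversed pattern
]

sim = similarity_matrix(toy_maps)
dist = distance_matrix(sim)
for row in sim:
    print(["%.2f" % r for r in row])
```

the resulting distance matrix can be passed to any standard clustering routine that accepts precomputed distances.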
notably, the first bin, representing tyrosines, reduced to unique hydropathic environments, with most diversity arising from favorable hydrophobic interactions with many different residue partner types. polar interactions for tyrosine include ubiquitous hydrogen bonding with the phenolic oh and, somewhat surprisingly, a handful of unique environments for the tyrosine backbone. all but one of the environments are dominated by a single rotamer, the exception being an environment defined by a paucity of interactions with the tyrosine ring; as a consequence, its rotamer is indeterminate. this is consistent with it being composed mostly of surface residues. each tyrosine residue attempts to fulfill its hydropathic valences, and thus structural water molecules are seen in a variety of roles throughout these environments. alanine was analyzed using the same protocol as well. having the smallest sidechain (and small hydropathic interaction maps), alanine allowed us to investigate a significantly larger database, permitting us to examine the correlation between hydropathic maps and various structural features. in conclusion, the analysis of hydropathic environments strongly suggests that the orientation of a residue in a three-dimensional structure is a direct consequence of its hydropathic environment, which leads us to propose a new paradigm, interaction homology, as a key factor in protein structure. it is not the surrounding residues that direct sidechain conformations, but rather the hydropathic "field" of the surrounding atoms.
folding studies of independent domains of lysine, arginine, ornithine binding protein (lao)
the protein folding problem has been studied over the past years; however, we still cannot explain how proteins acquire their native structure from their amino acid sequence.
different approaches have been taken to study protein folding, for example, the comparative study of folding mechanisms between homologous proteins with high sequence and structural identity, and the study of independent regions within a single protein. previously in our laboratory, the thermodynamic and kinetic folding properties of lysine, ornithine, arginine binding protein (lao), a amino acid periplasmic binding protein (pbp) composed of two rossmann fold domains (one continuous and the other discontinuous) attached by a hinge region, have been studied. although there is functional research on the binding characteristics of the domains of histidine binding protein (his j) when expressed independently (chu, b. ), there are no folding studies under these conditions for this or other pbps. it should be noted that his j shares % sequence identity and its tertiary structure (rmsd Å) with lao. in order to understand the effect on folding of encoding different domains in the same polypeptide chain, as well as its influence on function, we are studying the thermodynamic and kinetic folding characteristics of independently expressed lobes of lao and comparing them with those of the native protein. so far, we have expressed and purified the discontinuous domain. circular dichroism (cd) and fluorescence intensity spectra show that this independent domain has primary and tertiary structure. thermal denaturation shows a single cooperative transition, which indicates that this domain is folded. thermodynamic analysis of temperature- and urea-induced unfolding experiments suggests that lao's folding characteristics are not just the sum of those of the independent domains. furthermore, folding and refolding kinetics suggest the presence of a burst phase intermediate.
a hypothesis to reconcile the physical and chemical unfolding of proteins
a comprehensive view of protein folding is crucial for understanding how misfolding can cause neurodegenerative diseases and cancer.
when using physical or chemical perturbations, nmr spectroscopy is a powerful tool to reveal a shift in the native conformation toward local intermediates that act as seeds for misfolding. high pressure (hp) or urea is commonly used to disturb folding species. pressure favors the reversible unfolding of proteins by causing changes in the volumetric properties of the protein-solvent system. however, no mechanistic model has fully elucidated the effects of urea on structure unfolding, even though protein-urea interactions are considered to be crucial. here, we provide nmr spectroscopy and d reconstructions from x-ray scattering to develop the "push-and-pull" hypothesis, which helps to explain the initial mechanism of chemical unfolding in light of the physical events triggered by hp. in studying mpnep from moniliophthora perniciosa, we tracked two cooperative units using hp-nmr as mpnep moved uphill in the energy landscape; this process contrasts with the overall structural unfolding that occurs upon reaching a threshold concentration of urea. at subdenaturing concentrations of urea, we were able to trap a state in which urea is preferentially bound to the protein (as determined by nmr intensities and chemical shifts); this state is still folded and not additionally exposed to solvent [fluorescence and small-angle x-ray scattering (saxs)]. this state has a higher susceptibility to pressure denaturation (lower p / and larger Δvu); thus, urea and hp share concomitant effects of urea binding and pulling and water-inducing pushing, respectively. these observations explain the differences between the molecular mechanisms that control the physical and chemical unfolding of proteins, thus opening up new possibilities for the study of protein folding and providing an interpretation of the nature of cooperativity in the folding and unfolding processes.
zinc: a promoter or inhibitor for iapp aggregation?
feng ding, praveen nedumpully-govindan
zinc ions have been found to play an important and yet complex role in human islet amyloid polypeptide (hiapp) aggregation, which is associated with β-cell death in type-ii diabetes (t d). both concentration-dependent promotion and inhibition of iapp aggregation by zinc ions have been observed in vitro. similarly, at the population level, both positive and negative correlations were reported between the activity of a β-cell specific zinc transporter and t d risk. zinc ions are able to bind a single histidine in hiapp and coordinate the formation of zinc-bound hiapp oligomers. we hypothesize that the relative zinc/hiapp concentration determines the population of zinc-bound hiapp oligomers with different molecular weights. we have applied molecular dynamics (md) simulations to systematically study the structure and dynamics of a range of zinc-coordinated hiapp oligomers, including monomers, dimers, trimers, tetramers, and hexamers. our computational results suggest that different zinc-bound oligomers have distinct aggregation propensities. high-molecular weight oligomers ( peptides) have higher aggregation propensity than zinc-free and zinc-bound hiapp monomers at mm concentration in silico. therefore, our results provide a molecular insight into the complex role of direct zinc binding in hiapp aggregation. at low zinc/hiapp stoichiometry, zinc binding promotes aggregation. as the stoichiometry increases and zinc ions bind to single hiapp peptides, the aggregation of hiapp is inhibited due to electrostatic repulsion between the charged zinc ions. our computational study sheds light on the complex role of zinc in hiapp aggregation and t d development.
biomolecules function in the densely crowded and highly heterogeneous cell, which is filled up to a volume of % with macromolecules [ ] .
often, artificial macromolecular crowding agents are used to mimic these conditions in vitro and the excluded volume theory is applied to explain the observed effects [ ] . however, recent studies emphasize the role of further contributions aside from a pure volume effect, including enthalpic and solvent effects [ , ] . we study cosolute effects at high molecular and macromolecular concentrations via a thermodynamic analysis of the thermal unfolding of ubiquitin in the presence of different concentrations of cosolutes (glucose, dextran, polyethylene glycol, potassium chloride) [ ] . in contrast to the excluded volume theory, we observed enthalpic stabilization and entropic destabilization forces for all tested cosolutes. the enthalpic stabilization mechanism of ubiquitin in macromolecular polysaccharide solutions of dextran was thereby similar to the effects observed in monomeric glucose. further, it remains unclear how such cosolutes reflect the physicochemical properties of the complex cell environment, as a characterization of the in-cell crowding effect is lacking. thus, we developed a fret-based macromolecular crowding sensor to study the crowding effect in living cells [ ] . the averaged conformation of the sensor is similar to that in dilute aqueous buffer and cell lysate. we find that the in-cell crowding effect is distributed heterogeneously and can change significantly upon osmotic stress. the presented method allows in-cell crowding effects to be studied systematically and understood as a modulator of biomolecular function.
the stability of biomolecules under co-solvent conditions is dependent on the nature of the co-solvent [ ] . this can alter a protein's properties and structural features through biomolecular interactions between its functional groups and the co-solvent molecules. ionic liquids (ils) represent a rather diverse class of co-solvents.
the design flexibility of these molten salts is an attractive feature, allowing the properties of the il to be tuned to meet the requirements of different applications [ ] . in particular, ils can modulate reaction pathways between folding states, offering possibilities to control irreversibility in non-native protein aggregation [ ] . this has led us to investigate the impact of ils as co-solvents with the well-known protein denaturant urea. urea is considered to be a non-ionic chaotrope that considerably disturbs the network of hydrogen bonds with the protein backbone. urea interacts preferentially with the protein surface, mainly with apolar residues, and dispersion, rather than electrostatic interactions, is the main energetic contribution explaining the stabilization of the unfolded state of the protein and the irreversibility of the unfolding process in the presence of urea [ ] .
a large body of multidomain protein folding work has been devoted to the study of monomeric proteins. how multidomain multimeric proteins fold while avoiding the accumulation of stable intermediates is yet to be studied in detail. our present study is focussed on understanding the folding and assembly of the domains of a homodimeric l-asparaginase from the hyperthermophile pyrococcus furiosus (pfa). each monomer of pfa consists of distinct n- and c-terminal domains (npfa and cpfa, respectively), connected by a linker. the folding mechanism of each domain with respect to the full-length protein was studied by mutating one out of two tryptophans, one in each domain. the domains were purified and studied individually to obtain a parallel account of the folding of each domain in isolation. subunit assembly was studied by analytical size exclusion chromatography (sec), multiangle light scattering and functional activity. through far-uv cd, intrinsic trp fluorescence and sec, we demonstrated that domain folding and subunit association were intimately linked in full-length pfa.
interestingly, en route to its folding there was a complete absence of hydrophobic intermediates, as probed by ans fluorescence. folding of npfa was highly cooperative, and it provides interacting surfaces for cpfa to fold and also facilitates subunit assembly. the folding cooperativity of the isolated domains was much lower than that of their full-length counterparts, as indicated by equilibrium m values. to our surprise, during ph-induced denaturation, at ph and , the dimer dissociated into highly hydrophobic folded monomers which readily underwent amyloidogenesis. we showed that under such extreme conditions, cooperativity in the folding process of a multidomain multimeric protein is not solely governed by the folding of individual domains, but rather by concomitant folding and association of domains directly into a quaternary structure. in the other case, where subunit folding occurred prior to association, the protein readily underwent extensive aggregation.
groel assisted folding of multiple recombinant proteins simultaneously over-expressed in e.coli
megha goyal, tapan kumar chaudhuri
aggregation-prone recombinant proteins very often form inclusion bodies and also exhibit a poor yield of functional protein during in vitro refolding from the chemically denatured form. the bacterial chaperonin groel provides folding assistance to several proteins when over-expressed with one of the recombinant proteins. there are instances in which groel, in the presence of a few other co-expressed chaperones such as dnaj and dnak, provides a better yield of folded protein during homologous and heterologous expression. considering the ongoing events in the cell, it is known that the molecular chaperone groel assists in the folding of various proteins in the cytoplasm. hence, attempting to fold multiple recombinant proteins over-expressed simultaneously with co-expressed chaperones may be worth trying.
this approach may cut down various complexities in the preparation of functional recombinant protein, including time and cost. keeping this in mind, the folding of two simultaneously expressed aggregation prone proteins, kda e.coli maltodextrin glucosidase (malz) and kda yeast mitochondrial aconitase, has been investigated with the co-expression of groel and groes in the e.coli cytosol. it has been previously reported that both of the chosen proteins undergo co-expressed groel-groes assisted folding in the e.coli cytosol when they are over-expressed alone. in this study we have optimized the simultaneous overexpression of malz and aconitase in e.coli. further optimisation was carried out to coexpress groel along with malz and aconitase. on the premise that soluble protein mainly contains the folded fraction, groel/es assisted folding of the simultaneously overexpressed proteins, malz and aconitase, was monitored through the attainment of soluble protein under various sets of conditions, such as temperature. the major outcome of the present study is that, with groel-groes assistance, the soluble proteins (malz and aconitase) together constitute a higher percentage of folded protein than the percent yield obtained when a single protein was overexpressed. the significance of this type of study lies in the fact that cells can produce a higher amount of recombinant protein when multiple over-expression takes place. besides pushing up the cell's capacity for over-expression, co-expression of groel and groes efficiently assists in the folding of multiple proteins simultaneously over-expressed in e.coli.

amyloid fibrils associated with serious diseases including alzheimer's, parkinson's, and prion diseases promoted the challenge of studying protein misfolding, leading to the development of amyloid structural biology. amyloid fibrils form in supersaturated solutions via a nucleation and growth mechanism.
although the structural features of amyloid fibrils have become increasingly clear, knowledge of the thermodynamics of fibrillation is limited. furthermore, protein aggregation has not been a target of calorimetry, one of the most powerful approaches used to study proteins. here, with b -microglobulin, a protein responsible for dialysis-related amyloidosis, we show direct heat measurements of the formation of amyloid fibrils using isothermal titration calorimetry (itc). the spontaneous fibrillation after a lag phase was accompanied by exothermic heat. the thermodynamic parameters of fibrillation obtained under various protein concentrations and temperatures were consistent with the main-chain dominated structural model of fibrils, in which overall packing was less than that of the native structures. we also characterized the thermodynamics of amorphous aggregation, enabling the comparison of protein folding, amyloid fibrillation, and amorphous aggregation. in order to obtain general thermodynamic properties of protein aggregation, we further investigated the aggregation of glucagon and insulin, two of the best-known amyloidogenic peptide hormones, using itc. we also observed the characteristic heat of spontaneous amyloid fibrillation of both proteins after a lag time. taken together, we showed that thermodynamic studies on amyloid fibrillation and amorphous aggregation were indeed possible by means of itc-based qualitative and quantitative calorimetric analyses. itc will become a promising approach for clarifying the thermodynamic properties of protein aggregates.
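enthalpies measured at several temperatures, as in the itc abstract above, are commonly combined through the heat capacity change, ΔCp = d(ΔH)/dT, taken as the slope of a straight-line fit of enthalpy against temperature. the sketch below illustrates that single step under the assumption that ΔCp is constant over the fitted range; the function name and the numbers in the usage example are hypothetical placeholders, not values from the study.

```python
def heat_capacity_change(temperatures_k, enthalpies_kj_mol):
    """Least-squares slope of enthalpy versus temperature.

    Assumes a temperature-independent heat capacity change (Kirchhoff
    relation), so the slope is returned in kJ mol^-1 K^-1.
    """
    n = len(temperatures_k)
    if n < 2 or n != len(enthalpies_kj_mol):
        raise ValueError("need at least two matching (T, dH) pairs")

    t_mean = sum(temperatures_k) / n
    h_mean = sum(enthalpies_kj_mol) / n

    # Sum of squared deviations in T and the T-dH cross term.
    sxx = sum((t - t_mean) ** 2 for t in temperatures_k)
    if sxx == 0:
        raise ValueError("temperatures must not all be identical")
    sxy = sum(
        (t - t_mean) * (h - h_mean)
        for t, h in zip(temperatures_k, enthalpies_kj_mol)
    )
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical enthalpies of fibrillation (kJ/mol) at three temperatures (K).
    temps = [300.0, 310.0, 320.0]
    enthalpies = [-10.0, -30.0, -50.0]
    print(heat_capacity_change(temps, enthalpies))
```

a straight-line fit is the simplest choice; if the measured enthalpies curve with temperature, the constant-ΔCp assumption no longer holds and a higher-order fit would be needed.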
more case studies are required to establish the thermodynamics of protein misfolding and aggregation.

when hydrophobic proteins are, for any reason, exposed to the cytosol, they are rapidly captured by protective complexes which shield them from the aqueous surroundings and decide their fate (by either targeting them to their correct membrane homes or marking them for degradation by the ubiquitin/proteasome system). the bag holdase is a heterotrimeric protein complex, comprising bag , ubl a and trc , which works closely with the cochaperone sgta to triage hydrophobic proteins and pass them along the appropriate pathway. sgta also interacts with viral proteins and hormone receptors and is upregulated in numerous cancer types. these functions require further investigation to determine the scope of sgta as a therapeutic target. our lab has solved the solution structure of the n-terminal dimerization domain of sgta and characterised its interaction with two different ubiquitin-like (ubl) domains in the bag holdase (one from ubl a and the other from bag itself) using nmr chemical shift perturbation data and other biophysical techniques, including isothermal titration calorimetry and microscale thermophoresis. at this meeting i will report on the progress we have made in structurally characterising further key players that participate in this quality control, with the aim of clarifying the intricate network of molecular interactions that governs these processes in health and disease.

[figure: ensemble, ribbon and electrostatics spacefill views of the sgta dimerization domain structure. the final panel shows the structure overlaid with its yeast homologue.]

alpha synuclein is a small protein ( kda) expressed at high levels in dopaminergic neurons. fibrillar aggregates of a-synuclein inside the dopaminergic neuron are the major components of lewy bodies and lewy neurite inclusions, which are considered a potential hallmark of parkinson's disease (pd).
both in vitro and in vivo studies suggest that the soluble, oligomeric forms of a-syn are the more potent neurotoxic species, responsible for neuronal injury and death in pd. therefore, molecules that inhibit the toxicity of oligomers, either by reducing their formation or by converting the more toxic oligomeric state to a less toxic fibrillar state, would be effective agents for drug development against pd. curcumin is an asian food ingredient that has shown a potential role as a therapeutic agent against many neurological disorders, including pd. however, its instability and low solubility make it less attractive for use as a potential therapeutic agent. the present work focuses on screening compounds that are similar to curcumin but have better effects on the morphology and toxicity of oligomeric and fibrillar assemblies of a-syn, and which could be used as therapeutic agents in preference to the naturally occurring curcumin. we synthesized nine compounds that are structurally similar to curcumin and analyzed their effects on different stages of a-syn amyloid aggregation. here, we showed that curcumin and its analogs accelerate a-syn aggregation to produce morphologically different amyloid fibrils in vitro. however, curcumin and its analogs have no significant effect on the secondary structure of preformed a-syn fibrils. furthermore, these curcumin analogs showed differential binding affinities for the preformed a-syn aggregates, possibly due to differences in their chemical structures. the present data suggest a promising role for curcumin analogs in the treatment of a-synucleinopathy disorders.

in vitro folding mechanisms determine the forces applied during co-translational folding
there is currently much debate as to whether experiments conducted in vitro describe the folding of proteins in vivo.
in particular, it is often suggested that the co-translational folding of nascent protein chains is dominated by the presence of the ribosome and associated chaperones, and that folding mechanisms will be affected by the vectorial nature of translation. here we use an arrest peptide assay to investigate the co-translational folding of a number of all-a spectrin domains that exhibit a range of thermodynamic stabilities and in vitro folding rates. our unexpected finding is that the force exerted on the ribosome by these domains is not related to either the thermodynamic stability of the domain or the folding (loading) rate, but rather to the in vitro folding mechanism. we infer that the in vitro folding mechanisms of these domains are unaffected by the presence of the ribosome, even when part of the nascent chain is retained within the ribosome exit tunnel. there has been much work to date investigating the intermediates present in stalled translation complexes, but now, for the first time, we can begin to directly explore the rate limiting transition state in the co-translational folding of homologous proteins.

can the structure of a protein (h . ) depend on the treatment of a solvent medium (explicit vs effective) in a coarse-grained computer simulation?
ras pandey , barry farmer
university of southern mississippi, air force research laboratory
the solvent medium plays a critical role in orchestrating the structure and dynamics of a protein. in computer simulation modeling of protein structure in a solvent medium, explicit, implicit, or effective-medium approaches are often adopted to incorporate the effects of solvation. because of the complexity of incorporating all atomic and molecular details and multiple components, and of reaching large scales, an implicit solvent or effective medium approach is generally more viable than explicit solvent methods.
some of the pertinent characteristics, such as the excluded volume of the solvent constituents, their concentration, and the underlying fluctuations, which may be important in probing some issues, are generally ignored in effective medium or implicit solvent approaches. using a coarse-grained approach, we investigate the structure and dynamics of a protein (a histone, h . ) in the presence of both effective and explicit solvent media over a range of temperatures with monte carlo simulations. the protein is represented by a coarse-grained chain of residues whose interactions are described by knowledge-based residue-residue and hydropathy-index-based residue-solvent interactions. in the effective medium approach, each empty lattice site around the protein structure acts as a solvent. in the explicit solvent medium, only a fraction of the lattice sites are occupied by mobile solvent constituents along with the protein chain. large scale simulations are performed to analyze the structure of the protein for a range of residue-solvent interactions and temperatures in both explicit and effective solvent media. we study a number of local (e.g. solvation and mobility profiles) and global (radius of gyration and structure factor) physical quantities as a function of temperature. we find that the response of the radius of gyration of the protein in explicit solvent is different from that in the effective medium solvent. thus, the presence of fluctuations in the explicit solvent approach has considerable effects on the structure and dynamics of protein h . . differences due to the type of solvent in the response of some of these quantities as a function of temperature, as well as general similarities, will be presented.

single-molecule vectorial folding and unfolding through membrane pores
david
protein folding and unfolding in vivo is frequently vectorial. for example, proteins are synthesized at the ribosome and emerge n-terminal first.
as the polypeptide chain emerges from a nm wide pore, it is free to fold, interact with partners or misfold. in another example, proteins are unfolded at the proteasome by pulling from either the n or c terminus against a - nm wide pore, applying a tension on the residues surrounding the terminus of the protein. under these conditions, proteins may behave differently than when unfolded or refolded with temperature or urea. this may have important implications, as protein folding and unfolding in vivo is related to both function and disease. we noticed that vectorial folding is inherently linked to nanometer size pores. making use of nanopore technology, we developed a method to monitor protein unfolding during membrane translocation at the single-molecule level. briefly, an oligonucleotide attached at either end of a protein threads a single protein nanopore inserted in a lipid membrane. in response to an applied membrane potential, the oligonucleotide pulls the protein through the pore, and as the protein is forced to translocate it unfolds. by analysing the ionic current we obtain the unfolding pathway and information on the polypeptide sequence. this methodology has shown that proteins unfold with different kinetics when pulled from one terminus or the other. remarkably, it is also possible to say whether the protein has been phosphorylated or not, and where. we have recently advanced our model system to study protein folding after translocation at the single-molecule level. a single protein molecule was translocated through a pore and forced to translocate back at predetermined times. we measured the stability of the refolded state at different times and obtained the vectorial folding pathway of the protein. further, we observed that the protein was capable of co-translocational folding and that this premature folding contributed to the complete translocation of the protein.
our results show that nanopore technology applied to proteins can be used to describe the vectorial folding and unfolding of proteins, providing insight into how these processes may work in vivo. further, single-molecule protein sequencing is a possibility that could revolutionise our knowledge of biological processes.

thermodynamic studies of oligomeric proteins, which are the dominant natural form of proteins, have often been hampered because irreversible aggregation and/or slow reactions are common. there is not a single report on the reversible equilibrium thermal unfolding of proteins composed of (b/a) barrel subunits, although this "tim barrel" topology is one of the most abundant and versatile in nature. the eponymous tim barrel, triosephosphate isomerase (tim), is a ubiquitous glycolytic enzyme that catalyzes the isomerization of glyceraldehyde- -phosphate and dihydroxyacetone phosphate. the unfolding of several tims, mainly from eukaryotic organisms, has been extensively studied. regarding thermal unfolding, eighteen tims, mainly from eukaryotes as diverse as amoebozoa, euglenozoa, ascomycota and chordata, have been studied. even though a full thermodynamic characterization has been hampered by irreversible aggregation and/or the presence of hysteresis in all of them, the activation parameters that describe the kinetic control of five eukaryotic tims have been reported. we characterized the structure, catalytic properties, association state and temperature-induced unfolding of tims from five species representative of different bacterial taxa: deinococcus radiodurans (drtim), nostoc punctiforme (nptim), gemmata obscuriglobus (gotim), clostridium perfringens (cptim) and streptomyces coelicolor (sctim).
irreversibility and kinetic control were observed in the thermal unfolding of nptim and gotim, while for drtim, sctim and cptim the thermal unfolding was found to follow a two-state equilibrium reversible process, a behavior not observed previously for other tims. shifts in the global stability curves of these three proteins are related to the temperature range of optimal growth of the organism and are modulated by variations in the maximum stability temperature and in the enthalpy change at that temperature. reversibility appears to correlate with a low isoelectric point, the absence of residual structure in the unfolded state, a small cavity volume in the native state structure, low conformational stability and a low melting temperature. furthermore, the strong coupling between dimer dissociation and monomer unfolding may reduce the possibility of aggregation and favor reversibility. it appears that there is a delicate balance between several contributions whose concerted interplay is necessary to achieve thermal reversibility in oligomeric enzymes. furthermore, the finding that the three reversible proteins come from organisms of different phyla suggests that unfolding reversibility may be more common than is currently known.

a critical step in the late phase of human immunodeficiency virus type (hiv- ) infection is the targeting of the virally encoded gag proteins to the plasma membrane (pm) for assembly. prior to assembly, the hiv- gag polyprotein adopts a compact "folded over" conformation and exists in monomeric or low-order oligomeric states. whereas it is established that the nucleocapsid domain of gag specifically recognizes motifs in the viral rna genome for packaging, there is compelling evidence that the myristoylated matrix (ma) domain also binds to cellular rna to prevent premature gag targeting to intracellular membranes. upon transport of gag to the pm, the interaction of ma with rna is exchanged for an interaction of ma with pm components.
this molecular switch induces an extended conformation of gag, leading to the formation of high-order gag oligomers on the pm. because gag is anchored and therefore captured by its interaction with the available phospholipids, the intracellular targeting of gag is likely to be determined by the relative strength of its interaction with the dominant lipids composing each membrane subcompartment. the key to understanding this essential molecular switch is elucidating at the molecular level the interaction of ma with specific pm components. for over two decades, biochemical, in vivo, in vitro and genetic studies have focused on factors that modulate the binding of retroviral gag proteins to membranes, but only recently have the structural and molecular determinants of gag assembly begun to emerge. in addition to the electrostatic interactions between a highly conserved basic region of ma and acidic phospholipids, it is now believed that the hydrophobicity of the membrane interior, represented by the acyl chains and cholesterol, also plays an important role. we employ nmr methods to elucidate the molecular determinants of gag binding to the membrane. our structural studies revealed that phosphatidylinositol- , -bisphosphate (pi (

the production of functional antibodies depends on the transition of immature b cells to mature plasma cells and is tightly linked to several "quality control" checkpoints. during b cell development, the pre-b cell receptor (pre-bcr) is the first checkpoint, which determines the viability and proliferation of the pre-b cell. the pre-bcr is composed of an immunoglobulin (ig) heavy chain molecule associated with an ig light chain-like molecule called the surrogate light chain (slc). the slc is composed of two proteins, k and vpreb, which possess a unique region at the n- or c-terminus, respectively. vpreb lacks a b-strand, which is provided by the k protein, allowing the non-covalent interaction essential for formation of the slc heterodimer.
our understanding of the molecular mechanism of slc function and assembly is still at an early stage. in particular, we do not know how the slc associates and forms the pre-bcr for the selection of all heavy chains (hcs). our study focuses on dissecting the "fab fragment" of the pre-bcr to study the effect of the unexpected structural features of the slc and to gain insight into hc selection. the analysis of the assembly of the slc revealed a significant difference between the single domains and the complexes in terms of stability and assembly. the folding behavior of the ch domain in the presence of the slc is key for the first quality control mechanism in the endoplasmic reticulum (er) prior to surface expression. our results show that the slc interacts with the ch domain in a similar manner to the cl domain. thus, the folding of the naturally disordered ch domain upon interaction with the slc releases the hc from retention in the er by bip. taken together, our study provides new insights into the folding and assembly of the "fab fragment" of the pre-bcr and paves the way for a detailed mechanistic understanding of hc selection by the unique slc.

though the - (sfgailss) region of human islet amyloid polypeptide (hiapp) has long been known to be crucial for amyloid fiber formation, the lack of b-ordering of this region in structures of the final fiber, as determined by both nmr and x-ray, has been puzzling. new evidence now suggests that the fgail region forms ordered b structures only in early intermediates. we present new dir studies on the fgail region of hiapp, with uniformly c o labeled amides, along with spectral and kinetic modelling. the evolution of the peak frequency and d lineshape of the labeled region clearly shows a transition from random coil to a stable b sheet, a conclusion which is substantiated by simulation of the d ir spectra.
as determined from kinetic modeling, the fgail b-sheet creates a free energy barrier that is the cause of the lag phase during aggregation. these findings help to rationalize a broad range of previous fragment and mutation studies and provide a mechanism for fiber formation that has self-consistent kinetics and structures.

the temperature dependence of protein stability in living cells
studies addressing the consequences of the crowding that exists in the interior of cells have reached an interesting stage. experimental data so far, predominantly from small to medium sized proteins, indicate that, in general, natively folded proteins, including intrinsically disordered ones, gain structure and stability under conditions mimicking the cell interior. on the other hand, a few studies on small proteins indicate destabilization of the native state. in very few instances, crowding resulted in compaction and aggregation of the unfolded and partially folded states. experimental data on the consequences of cell-like crowding for relatively large proteins with a complex folding free energy landscape are absent. the alpha subunit of tryptophan synthase, a kda tim barrel protein, provides a unique opportunity to address the consequences of crowding on the structure and stability of the native state and also on a partially folded, stable equilibrium intermediate populated in its (un)folding reactions. in the presence of increasing amounts of the most commonly used crowding agent, ficoll- , a non-monotonic change in the far uv cd signal is observed for the native state. a steady increase up to mg/ml ficoll is followed by a decrease in the far uv cd region, indicating loss of structure at increased concentrations of the crowding agent. h- n hsqc nmr and fluorescence (fl) spectra confirm the loss of structure at higher concentrations of ficoll- .
loss of the native baseline in the urea induced unfolding reaction monitored by cd and fl clearly confirms the destabilization of the native state. similar to the structural changes observed for the native state, non-monotonic changes in the far uv cd and fluorescence spectra are also observed for the equilibrium intermediate state maximally populated at m urea. the highly populated equilibrium intermediate shows an initial steady increase in the far uv cd signal followed by a sudden decrease. our results suggest that the structure of both native and partially folded states may be affected under crowding conditions.

alpha- antitrypsin (aat) is a -kda serine protease inhibitor (serpin), which acts as an inhibitor of neutrophil elastase within the lungs. during inhibition, the protein undergoes a dramatic conformational change in which its exposed reactive centre loop (rcl) is cleaved and inserts into the central a-sheet as an extra beta-strand. this highly dynamic protein is also susceptible to mutations, resulting in misfolding and the accumulation of ordered polymers as intracellular inclusions within the endoplasmic reticulum of hepatocytes, where aat is synthesized. despite much knowledge of the folding and misfolding properties of aat as an isolated protein, very little is understood of how aat acquires its structure during biosynthesis. like that of all proteins, the biosynthesis of aat takes place on the ribosome, and protein folding occurs in a co-translational manner as the nascent polypeptide chain emerges from the ribosome's exit tunnel. this study aims to develop biochemical and nmr structural strategies to characterize the co-translational folding of aat as it is being synthesized on the ribosome.
for these studies, we have designed a series of secm-stalled ribosome nascent chain complexes (rncs) of aat of different lengths, which mimic "snapshots" of protein synthesis, capturing the folding process of the nascent chain during its emergence from the ribosome. using this library, we have recently developed a strategy to produce large quantities of the rncs both in vitro and in vivo within e. coli, a prerequisite for detailed biochemical and structural studies. using the aat-rncs, we are developing a suite of biochemical strategies to probe the capacity of aat nascent chains to adopt native structure on the ribosome. we have combined protease inhibition assays, western blot and native-page analysis to demonstrate that aat can fold while bound to the ribosome. in addition, we have employed a cysteine-based modification ("pegylation") assay to obtain low-resolution structural information on aat-rncs, and this will guide our structural studies by nmr spectroscopy to provide a detailed understanding of aat folding on the ribosome at high resolution.

thermodynamic properties of proteins vary with the solvent conditions (temperature, ions, ph, denaturants, etc.). although the effect of each environmental factor on proteins has been well studied, the combined effect of two or more environmental factors has not been studied thoroughly. in this study, we investigate the simultaneous effect of urea denaturation (disruption of non-covalent bonds in proteins) and acid denaturation (titration of protein residues) on the nature of the folding transition of the cyu protein. we performed molecular dynamics simulations of the bbl (pdb code: cyu) protein at various urea concentrations at k. we calculated the ph-dependent free energy landscape using the extended munoz-eaton model and described the phase diagram for the folding transition of bbl at various ph values and urea concentrations.
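as a point of reference for the urea dependence discussed in this abstract, the simplest textbook description of chemical denaturation is a two-state model with the linear extrapolation relation ΔG_unf = ΔG_water − m·[urea]. the sketch below implements only that baseline. it is not the extended munoz-eaton model used in the study, it cannot represent a barrierless transition, and every parameter value in the usage example is a hypothetical placeholder.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def fraction_unfolded(urea_molar, dg_water, m_value, temperature_k):
    """Two-state fraction unfolded under the linear extrapolation model.

    dg_water: unfolding free energy without denaturant (kJ/mol).
    m_value:  dependence of that free energy on urea (kJ mol^-1 M^-1).
    """
    dg_unfolding = dg_water - m_value * urea_molar
    equilibrium_constant = math.exp(-dg_unfolding / (GAS_CONSTANT * temperature_k))
    return equilibrium_constant / (1.0 + equilibrium_constant)


def midpoint_concentration(dg_water, m_value):
    """Urea concentration at which folded and unfolded states are equally populated."""
    if m_value <= 0:
        raise ValueError("m_value must be positive")
    return dg_water / m_value


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the curve.
    dg_water, m_value, temperature_k = 12.0, 4.0, 298.15
    for urea in (0.0, 1.5, 3.0, 4.5, 6.0):
        print(urea, round(fraction_unfolded(urea, dg_water, m_value, temperature_k), 3))
```

in this picture a larger m value gives a sharper, more cooperative transition around the midpoint, which is why equilibrium m values are often read as a rough measure of cooperativity.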
we mapped out the phase diagram of the folding transition of cyu, which clarifies the conditions under which it undergoes a cooperative folding transition or a barrierless folding transition.

biophysical analysis of partially folded states of myoglobin in presence of , , -trifluoroethanol
paurnima talele , nand kishore
the protein folding process involves one or more distinct populated intermediates. one such partially folded structure of particular importance observed along the protein folding pathway is the molten globule state. the properties of a molten globule state are intermediate between those of native and unfolded protein molecules. the importance of studying the equilibrium molten globule lies in its greater stability and flexible structure, which has been shown to bind a variety of substrates and to play a definite role in certain human diseases via aggregation, misfolding or some other mechanism. a protein must assume a stable and precisely ordered conformation to perform its biological function properly. the stability of a protein under specific conditions depends on its interactions with the solvent environment. it is therefore essential to understand protein folding intermediates, protein-solvent interactions and protein stabilization. we have attempted to thoroughly investigate the formation of a stable, alcohol-induced molten globule state of the protein using a combination of calorimetric and spectroscopic techniques. the presentation will cover biophysical studies on partially folded states of myoglobin in the presence of , , -trifluoroethanol. the thermal denaturation of myoglobin was studied in the presence of , , -trifluoroethanol (tfe) at various ph values using differential scanning calorimetry and uv-visible spectroscopy. the most obvious effect of tfe was a lowering of the transition temperature with increasing concentration of tfe up to . mol•dm- , beyond which no thermal transitions were observed.
the conformation of the protein was analyzed by a combination of fluorescence and circular dichroism measurements. at ph . and . , partially folded states of myoglobin were confirmed by cd spectroscopy. quantitative binding of ans to the tfe induced molten globule state of myoglobin was studied by isothermal titration calorimetry (itc). the results enable quantitative estimation of the binding strength of ans with the molten globule state of myoglobin, along with the enthalpic and entropic contributions to the binding process. the results also suggest the occurrence of common structural features in the molten globule states of proteins, which offer two types of binding sites to ans molecules; ans has been widely used as a fluorescence probe to characterize partially folded states of proteins.

modules. each cbr comprises a b-hairpin core followed by a short linker sequence. choline molecules are bound between two consecutive repeats through hydrophobic and cation-p interactions with aromatic side chains. apart from its biotechnological applications as an affinity tag for protein immobilization and purification, clyta is useful as a model for understanding the folding and stability of repeat proteins. in this context, we set out to obtain minimal peptides, encompassing the sequence of a single cbr or even only its b-hairpin core, that are able to maintain the native fold and the ability to bind choline. to that end, we first analyzed the peptide comprising the third b-hairpin core, denoted as clyt . based on cd and nmr data, we demonstrate that the peptide clyt conserves its native b-hairpin structure in aqueous solution, but forms a stable, amphipathic a-helix in detergent micelles as well as in small lipid vesicles [ ] . considering the great differences in the distribution of hydrophobic and polar side chains shown by the clyt b-hairpin and a-helix, we propose that amphipathic structures are stabilized in micelles or lipid vesicles.
this "dual" behavior is the only case reported to date of a micelle-induced conformational transition between two ordered peptide structures. to check whether other cbr repeats also undergo a b-hairpin to a-helix transition in the presence of micelles, so that it represents a general tendency of all pneumococcal choline-binding modules, we will show new experimental evidence based on cd and nmr structural studies on peptides derived from the b-hairpin cores of other clyta repeats, as well as on modified clyt peptides.

continuing our studies of the effect of like-charged residues on protein-folding mechanisms, in this work we investigated, by means of nmr spectroscopy and molecular-dynamics simulations, two short fragments of the human pin ww domain [hpin ( - ); hpin ( - )] and one single point mutation system derived from hpin ( - ) in which the original charged residues were replaced with non-polar alanine residues. the results for both original peptide fragments of hpin demonstrate the presence of ensembles of structures with a tendency to form a b-chain reversal.

understanding the biology of huntington's disease via the pathogenic huntingtin monomer
huntington's disease (hd) is caused by an abnormal extension of the polyglutamine (polyq) region within exon of the protein huntingtin from typically glutamines to over . disease onset correlates with huntingtin misfolding and the formation of aggregates; however, recent studies have postulated that the pathogenic huntingtin monomer may form compact structures that are responsible for neuronal toxicity in hd. we sought to examine the conformation of huntingtin monomers, how polyq sequence length affects monomer structure, and which protein-binding partners in the cell may exert a gain-of-toxic mechanism in pathology.
hydrogen-deuterium exchange mass spectrometry was used to measure the degree of structure in both non-pathogenic ( q) and pathogenic ( q) huntingtin, with results showing that both forms exchanged % of potential nh hydrogen bond donors within seconds (n ), with little to no further exchange over the following ten minutes. this result suggested that the pathogenic conformations are not stabilized by slowly exchanging hydrogen bonds. binding partners to the monomer were assessed in neuro a cell culture by immunoprecipitation and quantitative ms/ms proteomics approaches after depletion of aggregates by pelleting. proteins that more prevalently co-precipitated with pathogenic huntingtin included fused in sarcoma (fus), glycine-trna ligase (gars), peroxiredoxin (prdx ), phosphatidylethanolamine-binding protein (pebp /rkip), and histone subunit hist h a, all of which were significantly enriched by two-fold or greater. rna-seq analysis indicated that none of these proteins had altered expression levels, suggesting that the binding interactions are not due to changes in background abundance. overall we found that the conformational differences are subtle, yet are sufficient to generate several specific proteome interactions that offer clues to a toxic gain-of-function mechanism in pathology. work is ongoing to probe the more subtle changes in conformation and the importance of these interactors in mediating mechanisms of dysfunction.

hereditary tyrosinemia type i is an autosomal recessive disorder caused by deficiency of the enzyme fumarylacetoacetate hydrolase (fah). deficiency of fah leads to cellular accumulation of toxic metabolites, mainly succinylacetone (sa), maleylacetoacetate (maa) and fumarylacetoacetate (faa), in many body tissues. fah is mainly expressed in hepatocytes and renal proximal tubular epithelium. therefore, liver and kidney are the two primary organs affected by this disorder, and development of hepatocellular carcinoma is the major symptom.
missense mutations lead to a loss of enzymatic efficiency which, for a large number of mutations, correlates with loss of kinetic and thermodynamic stability of the enzyme. in our ongoing project, we are trying to elucidate the molecular basis of tyrosinemia by means of biophysical and structural characterization of wild-type fah along with its mutants. this knowledge should help us design new therapies based on the identification of pharmacological chaperones that could restore the altered stability of the enzyme. human wild-type fah and selected mutants were synthesized and inserted in an expression vector for e. coli. the proteins were purified by fplc, and their thermodynamic and kinetic stability was investigated using circular dichroism. our preliminary results confirm the loss of thermodynamic stability of different mutants and its variability compared with the wild-type protein.

repulsion between net charges of subunits during ferritin assembly daisuke sato , hideaki ohtomo , atsushi kurobe , satsuki takebe , yoshiteru yamada , kazuo fujiwara , masamichi ikeguchi department of bioinformatics, graduate school of engineering, soka university, jasri/spring-
organisms contain many spherical shell-shaped supermolecules consisting of identical or distinct subunits (e.g., ferritin, virus capsid, lumazine synthase and encapsulin). such multimeric proteins spontaneously assemble into their native structures from the subunits to acquire their specific functions. however, the assembly mechanism of such supermolecules has not been understood in detail. hence, investigating the assembly mechanism is biologically important. escherichia coli non-heme ferritin (ftn) consists of identical subunits, which are assembled into a spherical shell with / / symmetry. ftn is able to store iron inside its cavity. the subunit includes a-d helices forming a -helix bundle, a long bc-loop between the b- and c-helices and a short e-helix at the c-terminus.
ftn dissociates into dimers at acidic ph. the dimer was shown to maintain native-like secondary and tertiary structures by circular dichroism spectra and small angle x-ray scattering (saxs). the acid-dissociated ftn is able to reassemble into the native structure when the ph increases. to clarify the ftn assembly mechanism, we performed stopped-flow time-resolved saxs (tr-saxs) experiments. the saxs profiles could be acquired every ms after the initiation of reassembly. the initial velocity calculated from the forward scattering intensity increment was proportional to the square of the protein concentration, implying that the reaction is second-order. we propose a sequential bimolecular reaction, in which two dimers bind to form a tetramer, then another dimer attaches to the tetramer to form a hexamer, and so on. the assembly rate depended on ph and ionic strength, indicating that electrostatic interactions play an important role in the assembly reaction. the assembly rate decreased with increasing ph in the range from . to . and increased with increasing nacl concentration. this indicates that there are repulsive electrostatic interactions between assembly units and that they increase with increasing ph from . to . . a possible interaction is the repulsion between net charges of dimers since the pi of ftn is expected to be . . to test this possibility, we made several mutants with different net charges. as mutational sites, we selected charged residues that are far from the subunit interface. the selected sites were glu , glu , glu , glu and glu . we constructed mutants with one, two, three or four glu -> gln substitutions at the selected sites. the structures of those mutants were similar to that of wild-type ftn. if the aforementioned hypothesis is correct, the assembly rate is expected to increase as the number of substitutions increases.
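the second-order argument above (initial velocity proportional to the square of the protein concentration) can be checked numerically by fitting the slope of log(initial rate) against log(concentration). the following is a minimal sketch under stated assumptions, not the authors' analysis: the function name `reaction_order`, the rate constant and the concentrations are hypothetical, and the data are synthetic.

```python
import math


def reaction_order(concentrations, initial_rates):
    """Estimate the apparent reaction order as the least-squares slope
    of log(initial rate) versus log(concentration)."""
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(v) for v in initial_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, illustrative data only (not measured values):
# v0 = k * c**2 with an arbitrary rate constant k.
k = 0.5
concs = [1.0, 2.0, 4.0, 8.0]
rates = [k * c ** 2 for c in concs]

order = reaction_order(concs, rates)
print(f"apparent reaction order: {order:.2f}")
```

a slope close to 2 is what a bimolecular rate-limiting step would give; with real scattering data the estimate would carry experimental uncertainty.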
the result agreed well with this expectation and strongly suggested that the electrostatic repulsion between dimers is an important factor determining the assembly rate of ftn.

improved modeling of protein unfolding rates and pathways through solvation and modeling of beta-barrels benjamin walcott , , luís garreta , christopher bystroff , , department of biology, rensselaer polytechnic institute, center for biotechnology and interdisciplinary studies, department of computer science, universidad del valle, department of computer science, rensselaer polytechnic institute
an understanding of the folding and unfolding pathways of proteins is integral to improving our ability to associate the structural impact of point mutations with disease etiology. information gained here can also be used for protein structure prediction and design. to model unfolding pathways in proteins we utilize a computational method called geofold. this approach uses recursive hierarchical partitioning of protein structure and finite element simulation. geofold considers three types of partitioning operations: translational motion (break), single point revolute joints (pivot), and rotation around two points (hinge). from these operations, a directed acyclic graph (dag) is constructed where nodes correspond to the substructures created by these operations and the edges represent the operations. for each operation in the dag, its dissociation and reassociation rates are determined as a function of solvent-accessible surface area, hydrogen bonds, voids, and conformational entropy. finite element simulations are carried out to simulate the kinetics of unfolding. this model accurately predicts changes in unfolding pathways due to disulfides in a four-protein case study, but it fails to produce a realistic pathway for b-barrel proteins such as green fluorescent protein (gfp).
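the partitioning graph described above can be pictured with a toy data structure. the following is a minimal sketch and not the geofold implementation: the `Node` class, the `count_pathways` helper and the four-segment example are hypothetical, and no rates or energies are modeled. it only shows how substructures (nodes) and break/pivot/hinge operations (edges) form a dag in which one substructure can be reached from more than one parent.

```python
from dataclasses import dataclass, field

# The three partitioning operations named in the text.
OPERATIONS = ("break", "pivot", "hinge")


@dataclass
class Node:
    """A substructure; each edge is (operation, tuple of child nodes)."""
    name: str
    edges: list = field(default_factory=list)

    def add_operation(self, op, *children):
        if op not in OPERATIONS:
            raise ValueError(f"unknown operation: {op}")
        self.edges.append((op, children))


def count_pathways(node):
    """Count distinct decomposition trees from this node down to
    unpartitionable leaves (the ordering of independent steps is ignored)."""
    if not node.edges:
        return 1
    total = 0
    for _op, children in node.edges:
        ways = 1
        for child in children:
            ways *= count_pathways(child)
        total += ways
    return total


# Hypothetical four-segment structure "ABCD".
a, b, c, d = (Node(x) for x in "ABCD")
ab, cd, abc, whole = Node("AB"), Node("CD"), Node("ABC"), Node("ABCD")

ab.add_operation("pivot", a, b)
cd.add_operation("break", c, d)
abc.add_operation("pivot", ab, c)      # "AB" is shared by two parents
whole.add_operation("break", ab, cd)
whole.add_operation("hinge", abc, d)

print(count_pathways(whole))
```

in a kinetic model, each edge would additionally carry dissociation and reassociation rates; those are left out here because their functional form is not given in the text.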
to better model these barrel proteins, a new partitioning operation is introduced involving the breaking of all contacts between an adjacent set of b-strands, called a seam. in addition, to improve the accuracy of kinetic modeling, several updates have been made to the energy function, including an improved solvation model and a contact-order-based estimation of the reassociation rates. the predicted unfolding rates and pathways using this improved geofold are compared with experimentally measured values in kineticdb for proteins with multi-state unfolding kinetics, point mutations, circular permutations, and engineered disulfides.

the presence of multiple domains in a protein can result in the formation of partially folded intermediates, leading to increased aggregation propensity. this can be reduced by cooperative, all-or-nothing folding of the multi-domain protein. in good agreement with ensemble folding experiments, a coarse-grained structure-based model of e. coli adenylate kinase (ake) folds cooperatively. ake has three domains, nmp, lid and core. we examine the role of the interfaces between these domains in facilitating folding cooperativity in ake. mutants in which these interfaces are deleted exhibit folding cooperativities similar to that of wild-type ake. on closer inspection, we observe that unlike a typical multi-domain protein in which one domain is singly-linked to its adjacent domain, nmp and lid are inserted into core, i.e. they are both connected to core by two linkers each. we create circular permutants of ake in which the inserted domains are converted to singly-linked domains, and find that they fold less cooperatively than wild-type ake. domain insertion in wild-type ake facilitates folding cooperativity even when the inserted domains have lower stabilities. the n- and c-termini of nmp and lid are constrained upon the folding of core and this facilitates their folding.
thus, nmp and lid, which undergo large conformational changes during catalysis, can be smaller with fewer stabilizing interactions. in addition, inter-domain interactions need not be optimized for folding, and can be tuned for substrate binding, conformational transition and catalysis. analysis of protein domains using structural bioinformatics suggests several examples of multi-domain proteins in which domain insertion is likely to facilitate folding cooperativity.

tuning cooperativity on the free energy landscape of protein folding pooja malhotra , jayant udgaonkar national centre for biological sciences, tata institute of fundamental research
the mechanism by which a protein explores the free energy landscape during a folding or unfolding reaction is poorly understood. determining whether these reactions are slowed down by a continuum of small ( kbt) free energy barriers or by a few large (> kbt) free energy barriers is a major challenge. in this study the free energy landscape accessible to the small protein monellin is characterized under native-like conditions using hydrogen exchange in conjunction with mass spectrometry. cooperative and noncooperative opening processes could be directly distinguished from the mass distributions obtained in the ex limit. under native conditions, where the native state is maximally stable, the unfolded state is transiently sampled in an entirely non-cooperative and gradual manner. under conditions which stabilize the unfolded state or destabilize the native state of the protein, the slowest structure opening event becomes cooperative. the present study provides an understanding of the relationship between stability and folding cooperativity. it suggests that the cooperative transitions observed in unfolding reactions may be a consequence of the changes in the stabilities of the unfolded state and the transition state. it also provides rare experimental evidence for a gradual unfolding transition on a very slow timescale.
role of electrostatic repulsion between unique arginine residues on the assembly of a trimeric autotransporter translocator domain eriko aoki , kazuo fujiwara , masamichi ikeguchi
haemophilus influenzae adhesin (hia) belongs to the trimeric autotransporter family. the autotransporter consists of an n-terminal signal peptide, an internal passenger domain and a c-terminal translocator domain. the signal peptide directs export across the inner membrane via the sec system and is then cleaved, the passenger domain is a virulence factor, and the translocator domain (hiat) is embedded in the outer membrane. the crystal structure of hiat has shown that it forms a transmembrane b-barrel of b-strands, four of which are provided by each subunit. the b-barrel has a pore that is traversed by three a-helices, one of which is provided by each subunit. the protein has a unique arginine residue at . the arg side chains from the three subunits protrude from the b-strand toward the center of the barrel and are close to each other. these residues seem to have an unfavorable electrostatic effect on the assembly and decrease the trimer stability. to investigate the role of this residue in the trimer assembly and stability of hiat, we replaced this arginine with the neutral amino acid methionine (r m) or the positively charged residue lysine (r k), and the properties of these mutants were investigated. hiat and the two mutants were dissociated by formic-acid treatment, and they were able to reassemble in the presence of detergent. to measure the time course of trimer reassembly, the amounts of reassembled trimer and monomer were quantified by sds-page at different assembly times. although the charge-neutralizing mutation increased the rate of reassembly, the final amount of reassembled trimer decreased, especially at higher protein concentration. these results suggest that the charge-neutralizing mutation causes incorrect oligomer formation.
the far-uv cd spectrum of reassembled wt hiat was nearly identical to that of the native wt hiat. however, the spectrum of the reassembled r m mutant was more intense than that of the native r m mutant, although the proportion of trimer was much lower than that of the wt hiat. this suggests that the incorrect oligomer has a secondary structure different from that of the wt hiat. the r k mutant showed assembly properties similar to those of the wt hiat. therefore, the repulsion between positively charged residues seems to be important for preventing hiat from misassembling. similar proximity of arginine residues is observed for hiv capsid protein, carboxysome shell protein, lumazine synthase and so on. the electrostatic repulsion between arginine residues may be a general mechanism for protein assembly.

department of veterinary pathobiology, kagoshima university, institute for food sciences, hirosaki university, faculty of fisheries, kagoshima university, department of veterinary histopathology, kagoshima university, veterinary clinical training center, kagoshima university, department of veterinary anatomy, kagoshima university, sakamoto kurozu inc., the united graduate school of agricultural sciences, kagoshima university
kurozu is a traditional japanese rice vinegar. during fermentation and aging of the kurozu liquid in an earthenware jar over year, a solid residue called kurozu moromi is produced. in the present study, we evaluated whether concentrated kurozu or kurozu moromi could ameliorate cognitive dysfunction in the senescence accelerated p mouse. senescence accelerated p mice were fed . % (w/w) concentrated kurozu or . % (w/w) kurozu moromi for or weeks. kurozu suppressed cognitive dysfunction and amyloid accumulation in the brain, while kurozu moromi showed a tendency to ameliorate cognitive dysfunction, but the effect was not significant.
we hypothesized that the effect was caused by the antioxidant activity of concentrated kurozu; however, the level of lipid peroxidation in the brain did not differ in senescence accelerated p mice. dna microarray analysis indicated that concentrated kurozu increased mrna expression of hspa a, a protein that prevents protein misfolding and aggregation. the increase in hspa a expression by kurozu was confirmed using quantitative real-time pcr and immunoblotting methods. therefore, the suppression of amyloid accumulation by concentrated kurozu may be associated with hspa a induction. however, concentrated kurozu could not increase hspa a expression in mouse primary neurons, suggesting it may not directly affect neurons.

young-ho lee
although amyloid fibrils are associated with a number of pathologies, their conformational stability remains largely unclear. we herein investigated the thermal stability of various amyloid fibrils. a-synuclein fibrils, freshly prepared at c at neutral ph, cold-denatured to monomers at - c and heat-denatured at - c. meanwhile, the fibrils of b -microglobulin, alzheimer's ab - /ab - peptides, and insulin exhibited only heat denaturation, although they showed a decrease in conformational stability at low temperature in the presence of chemical denaturants. a comparison of structural parameters with positive enthalpy and heat capacity changes, which showed opposite signs to protein folding, suggested that the burial of charged residues in the fibril cores contributed to the cold denaturation of a-synuclein fibrils. reinforced electrostatic repulsion at low temperatures may promote cold denaturation, leading to a unique thermodynamic property of amyloid fibrils. we propose that although cold denaturation is common to both native proteins and misfolded fibrillar states, the main-chain dominated amyloid structures may explain amyloid-specific cold denaturation due to the unfavorable burial of charged side-chains in fibril cores.
key structural differences between tbtim and tctim revealed by thermal unfolding molecular dynamics simulations angel piñeiro , miguel costas , andrea gutiérrez-quezada dept of applied physics, university of santiago de compostela, lab. of biophys. chem., dept of physical chemistry, fac. of chemistry, unam
the thermal unfolding patterns obtained by differential scanning calorimetry for trypanosoma cruzi and trypanosoma brucei triosephosphate isomerase (tctim and tbtim, respectively) are significantly different although the crystal structures of the two proteins are almost indistinguishable and the sequences are highly homologous. in order to explain these differences at the molecular level, a set of molecular dynamics simulations was performed at different temperatures between and k. the trajectories obtained were analyzed in detail and the residues found to be key in the unfolding pathway of each species were identified. a set of residues that behave significantly differently between the two proteins was selected and proposed for mutation. the general aim is to identify the minimum number of residue mutations that confer on tbtim the behaviour of tctim and vice versa. complementary experimental work is also being performed on the same proteins.

repositioning som as a potent inhibitor of transthyretin amyloidogenesis and its associated cellular toxicity salvador ventura , ricardo sant'anna , maria rosário almeida , natália reixach , raul insa , adrian velazquez-campoy , david reverter , núria reig universitat autònoma de barcelona, instituto de biologia molecular e celular, icbas, the scripps research institute, som-biotech, universidad de zaragoza
transthyretin (ttr) is a plasma homotetrameric protein implicated in fatal amyloidosis. ttr tetramer dissociation precedes pathological ttr aggregation. although ttr stabilizers are promising drugs to treat ttr amyloidoses, none of them is approved by the food and drug administration (fda).
repositioning existing drugs for new indications is becoming increasingly important in drug development. here, we repurposed som , an fda-approved molecule for neurodegenerative diseases, as a very potent ttr aggregation inhibitor. som binds specifically to ttr in human plasma, stabilizes the tetramer in vivo and inhibits ttr cytotoxicity. in contrast to most ttr stabilizers, it exhibits high affinity for both ttr thyroxine-binding sites. the crystal structure of som -bound ttr explains why this molecule is a better amyloid inhibitor than tafamidis, so far the only drug on the market to treat the ttr amyloidoses. overall, som , already in clinical trials, is a strong candidate for therapeutic intervention in these diseases.

neurometals as modulators of protein aggregation in neurodegenerative diseases sónia s. leal , joana s. cristovão , cláudio m. gomes
protein misfolding and aggregation is a hallmark across neurodegenerative diseases such as alzheimer's disease and amyotrophic lateral sclerosis (als). since these diseases are mostly sporadic, the formation of protein amyloids in the nervous system depends on chemical and biological triggers within the neuronal environment, such as metal ions [ ] . in this communication i will give an overview of the metallobiology of neuronal calcium, zinc and copper, which are key players in brain function and have altered homeostasis in most neurodegenerative conditions. our recent work will illustrate how this allows establishing molecular mechanisms in neurodegenerative diseases [ ] [ ] [ ] [ ] [ ] . in the pursuit of this goal, in recent years we have been investigating superoxide dismutase (sod ), a cu/zn metalloenzyme that aggregates in the fatal neurodegenerative disorder als, as a model.
in sod -als cases, this ubiquitous protein selectively aggregates in motor neurons, implicating a local biochemical factor in the process: interestingly, zn and ca levels are upregulated in the spinal and brain stem motor neurons of als patients, and increased ca triggers multiple pathophysiological processes which include direct effects on the sod aggregation cascade [ , ] . recently we established that calcium ions promote sod aggregation into non-fibrillar amyloid, suggesting a link to toxic effects of calcium overload in als [ ] . we showed that under physiological conditions, ca induces conformational changes in sod that increase sod b-sheet content and decrease sod critical concentration and nucleation time during aggregation kinetics. we also observed that calcium diverts sod aggregation from fibrils towards amorphous aggregates. interestingly, the same heterogeneity of conformations is found in als-derived protein inclusions. we thus hypothesized that transient variations and dysregulation of cellular ca and zn levels contribute to the formation of sod aggregates in als patients [ , ] . in a follow-up study we combined experimental and computational approaches to show that the most frequent ligands for ca are negatively-charged gatekeeper residues located in boundary positions with respect to segments highly prone to edge-to-edge aggregation. calcium interactions thus diminish gatekeeping roles by shielding repulsive interactions via stacking between aggregating b-sheets, partly blocking fibril formation and promoting amyloidogenic oligomers such as those found in als inclusions. interestingly, many fals mutations occur at these positions, disclosing how ca interactions recreate effects similar to those of genetic defects, a finding relevant to understanding sporadic als pathomechanisms [ ] .

the amino acid proline is well known for its disorder-promoting and helix-breaking properties.
prolines can be accommodated within transmembrane (tm) alpha-helices and participate in important biological tasks like signal transduction, ligand binding and helix-helix packing. x-ray crystallography and nmr indicate that proline residues in membrane proteins induce distortions of the helix geometry to different extents ranging from small bends to severe kinks. however, such studies provide essentially a static snapshot of membrane-embedded helices. therefore, the link between proline dynamics and function is not completely understood. in this work we have used single-molecule förster resonance energy transfer (smfret) and fluorescence correlation spectroscopy (fcs) to probe the structure and dynamics of the tm domain of human glycophorin a (gpa), a widely used model membrane protein for oligomerization studies. a fluorescent dye pair has been attached to both ends of the membrane-spanning region of gpa, which allowed monitoring the average distance and distance fluctuations between the attachment points. site-specifically double-labeled gpa has been reconstituted into two membrane-mimetic systems: sds micelles and phospholipid bilayers assembled into nanodiscs. using proline-scanning mutagenesis we have systematically evaluated the impact of proline residues in different positions along the membrane normal on transmembrane helix length and lateral packing. furthermore, we have investigated the distance distribution in tm helices containing native prolines, namely the insulin receptor and the nesprin protein. our results shed light on the relation between proline dynamics and the folding and function of tm helices.
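the distance readout mentioned above rests on the standard förster relation between transfer efficiency and donor-acceptor distance, E = 1 / (1 + (r / R0)^6). the following is a minimal sketch of that textbook relation, not the authors' analysis pipeline: the function names are hypothetical, and the förster radius of 5.0 nm is an assumed placeholder because the abstract does not name the dye pair.

```python
def fret_efficiency(r, r0):
    """Transfer efficiency for donor-acceptor distance r and
    Forster radius r0 (same length units)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Forster relation; valid only for 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


# Assumed Forster radius in nm; depends on the actual dye pair.
R0_NM = 5.0

for r_nm in (3.0, 5.0, 7.0):
    e = fret_efficiency(r_nm, R0_NM)
    print(f"r = {r_nm:.1f} nm -> E = {e:.3f}")
```

because efficiency falls off with the sixth power of distance, the measurement is most sensitive near r = R0, where the efficiency is one half.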
thermodynamic contributions of specific mutations of l e protein in the rna-protein interface region measured by analytical ultracentrifugation and gel shift assay bashkim kokona , , sara kim , margaret patchin , britt benner , susan white
in saccharomyces cerevisiae, ribosomal protein l e acts as an autoregulator by inhibiting the splicing of its pre-mrna and translation of its mrna. the l e protein-rna binding site has been previously studied, revealing an rna kink-turn motif, which is characterized by a sharp bend in the phosphodiester backbone due to unpaired nucleotides and internal tertiary interactions. l e structural flexibility at the rna-binding interface makes this interaction an excellent model to explore the energetics of rna-protein binding. we made l e k a, f a, and f w mutants to quantify the thermodynamic contributions of such interactions to the protein-rna complex. we used analytical ultracentrifugation sedimentation equilibrium (se) and sedimentation velocity (sv) to investigate conformational changes and protein-rna binding free energy changes due to mutations. our computed changes of binding free energy based on the sedimentation equilibrium experiments were consistent with the gel shift assay results. in addition, sedimentation velocity experiments on the l e wild type indicate that the protein-rna interaction is highly dynamic and involves conformational changes of the kink-turn rna induced by l e protein. our results provide new insights into the binding between ribosomal proteins and their rna counterparts, which can be used to complement the x-ray structure.

role of a non-native a-helix in the folding of equine b-lactoglobulin takahiro okabe , toshiaki miyajima , kanako nakagawa , seiichi tsukamoto , kazuo fujiwara , masamichi ikeguchi
equine b-lactoglobulin (elg) is a small globular protein ( residues).
although elg adopts a predominantly b-sheet structure consisting of nine anti-parallel b-strands (a-i) and one major a-helix in the native state, it has been shown that a non-native a-helical intermediate accumulates during the burst phase of the folding reaction from the unfolded state in concentrated denaturant. to ask whether the non-native helix formation is important for acquiring the native b-sheet structure, we first determined where the non-native a-helix is formed. a stable analogue of the burst-phase folding intermediate was observed at acid ph (a state). the amide hydrogen exchange experiment and proline-scanning mutagenesis experiment have shown that the non-native a-helix is formed at the region corresponding to the h strand in the a state. to investigate the role of this non-native a-helix in the refolding reaction of elg, we constructed several mutant proteins, which were designed to destabilize the non-native a-helix in the folding intermediate without perturbing the native structure. a mutant, a t, fulfilled this requirement, that is, a t showed a native structure similar to that of the wild-type protein, and a largely reduced cd intensity in the a state. then, the refolding kinetics were investigated by the cd and fluorescence stopped-flow method. the a t mutation resulted in a reduction of the burst-phase cd intensity, which confirmed that the non-native a-helix is formed around the h strand region. subsequent to the burst phase, four kinetic phases were observed for a t and the wild-type protein. importantly, the folding rate constants of the four kinetic phases were similar between the two proteins. furthermore, interrupted refolding experiments demonstrated that the native state was formed via two parallel pathways in the two slower phases of the four kinetic phases. the relative amplitudes of the two pathways were similar between a t and the wild-type protein.
these results clearly showed that the formation of the non-native helix has little effect on the folding rates and pathways, and suggested that the non-native helix formation may not be a severe kinetic trap for the protein folding reaction.

impact of the chaperonin cct in a-synuclein(a t) amyloid fibrils assembly ahudrey leal_quintero , javier martinez-sabando , josé maría valpuesta , begoña sot centro nacional de biotecnología (cnb/csic)., centro nacional de biotecnología (cnb/csic)., centro nacional de biotecnología (cnb/csic)., centro nacional de biotecnología (cnb/csic) and fundación imdea-nanociencia
cct is a eukaryotic chaperonin that uses atp hydrolysis to encapsulate and fold nascent protein chains. moreover, it has recently been shown that cct is able to inhibit amyloid fiber assembly and toxicity of the polyq extended mutant of huntingtin, the protein responsible for huntington disease. although this opens the possibility that cct is also able to modulate other amyloidopathies, this has not yet been addressed. the work presented here intends to determine the effect of cct on the amyloid fiber assembly of a-synuclein(a t), one of the mutants responsible for parkinson disease. it is demonstrated that cct is able to inhibit a-synuclein(a t) fibrillation in a nucleotide-independent way, suggesting that this effect is based on binding rather than on active folding. furthermore, using deletion mutants and assaying the interaction of cct with monomers, soluble oligomers and fibers, it has been possible to unravel the mechanism of this inhibition: cct interferes with fiber assembly by interacting with the a-synuclein(a t) nac domain once soluble oligomers are formed, thus blocking the reaction before the fibers start to grow.
amyloid-like aggregation of nucleophosmin regions associated with acute myeloid leukemia mutations daniela marasco , concetta di natale , valentina punzo , domenico riccardi , pasqualina scognamiglio , roberta cascella , cristina cecchi , fabrizio chiti , marilisa leone , luigi vitagliano department of pharmacy, cirpeb: centro interuniversitario di ricerca sui pepti, section of biochemistry, department of biomedical experimental and clinical scie, institute of biostructures and bioimaging nucleophosmin (npm ) is a multifunctional protein involved in a variety of biological processes and implicated in the pathogenesis of several human malignancies. npm has been identified as the most frequently mutated gene in acute myeloid leukemia (aml) patients, accounting for approximately % of cases ( ). the most frequent human npm mutations lead to variants with altered c-terminal sequences of the c-terminal domain (ctd) that, in its wild form, folds as a three helix bundle. aml modifications lead to (a) an unfolding of the ctd in the mutated protein and (b) its accumulation in the cytoplasm due to the loss of nuclear localization sequences with mutations of trp (mut e) and also of trp (mut a) ( ) . to gain insights into the role of isolated fragments in npm activities we dissected the ctd in its helical fragments. here we describe the unexpected structural behavior of the fragments corresponding to the helices h and h in both wild-type and aml-mutated variants. h region shows a remarkable tendency to form amyloid-like assemblies while only the muta sequence of h region is endowed with and b-sheet structure, under physiological conditions, as shown by circular dichroism, thioflavin t and dynamic light scattering. the aggregates of h , are also toxic to neuroblastoma cells, as determined by using the mtt reduction and ca influx assays ( ) . 
furthermore, the effects of the local context on the different aggregation tendencies of h and h were investigated and appeared to influence the aggregation propensity of the entire ctd. since in aml mutants the ctd is not properly folded, we hypothesize that the aggregation propensity of npm regions may be implicated in aml etiology. these findings have implications for elucidating the pathogenesis of aml caused by npm mutants, and aggregation phenomena should be seriously considered in studies aimed at unveiling the molecular mechanisms of this pathology.
we report a summary of our study of the effects of microwaves in the range - mhz on a typical protein, myoglobin. previous literature has addressed the effects of high-frequency electromagnetic fields on living and in vitro organic systems. we focused on myoglobin because proteins are the simplest organic systems that are fundamental to the functions of living organisms. myoglobin is a protein found mainly in muscle tissue of vertebrates, consisting of a single protein chain with amino acids and one heme group that stores oxygen in the muscle cells. the physiological importance of myoglobin is mainly related to its ability to bind molecular oxygen. in particular, we focused on the secondary structure of this protein in order to determine whether exposure to microwaves unfolds the protein, producing transitions from α-helix components to β-sheet features. to this aim, fourier transform infrared (ftir) spectroscopy was used. the importance of this study is related to previous literature indicating that the transition from α-helix to β-sheet structure in a protein can be responsible for aggregation mechanisms that lead to neurotoxicity and neurodegenerative disorders, and can be considered the first step toward some pathologies [ ] [ ] [ ] .
the aggregates consist of fibers containing unfolded proteins with a prevalent β-sheet structure, termed amyloid [ ] . in our studies, myoglobin in deuterium oxide (d₂o) solution was exposed for h to mobile phone microwaves at and mhz at a power density of w/m² . ftir spectra were recorded with a vertex v spectrometer from bruker optics, following the protocol described in [ ] [ ] [ ] . ftir spectroscopy analysis showed an increase in intensity of β-sheet structures and a significant shift to lower frequencies of about . cm⁻¹ of the amide i vibration after exposure [ , ] . these results led us to conclude that mobile phone microwaves induce protein unfolding and formation of aggregates [ , ] .
membrane proteins play a vital role in many biological processes, and yet remain poorly understood as they are frequently unstable in vitro. the goal of this project is to investigate the insertion and folding of membrane proteins into lipid bilayers, using a cell-free expression system. we have used both e. coli-based cell extracts (s ) and commercial translation systems (purexpress) in combination with synthetic liposomes of defined lipid composition. these studies will aid understanding of cooperative folding, folding intermediates, and the effects of the lipid bilayer on folding and insertion. model e. coli proteins have been investigated, as they can offer important insights into other proteins, and thus facilitate the further study of more biologically relevant proteins. it has been found that the rhomboid protease glpg spontaneously inserts into liposomes without the aid of an insertase such as secyeg. this spontaneously inserted glpg is functional, and is able to cleave bodipy-labeled casein, yielding a fluorescent product. the major facilitator superfamily (mfs) transport proteins lacy, galp and glpt have also been found to insert spontaneously into liposomes.
it has been shown that the lipid composition of the liposomes affects the amount of protein inserted into the bilayer, with all proteins tested to date preferring liposomes containing at least mol% dopg. ongoing and future work will involve the use of rare codons to alter the rate of translation, to investigate the effect this has on the final folded structure of the protein. preliminary work is also under way to determine whether the two domains of the mfs family transporters fold cooperatively or independently, thus aiding understanding of the folding and stability of membrane transport proteins.
frederic greco , audrey toinon , nadege moreno , marie claire nicolaï
rabies remains an important worldwide health problem that causes a fatal encephalomyelitis [ ] . currently, rabies in humans is under control in europe and north america following the use of efficient vaccines for dogs and wild animals. however, it still kills more than , people every year, mainly in africa and asia [ ] . human vaccination prevents infection with very high efficacy. the vaccine contains an inactivated rabv produced on vero cells. rabv is an enveloped, negative single-stranded rna virus which encodes five proteins, namely the nucleoprotein (n), the phosphoprotein (p), the matrix protein (m), the glycoprotein (g), and the viral rna polymerase (l) [ ] . the viral envelope is covered by trimeric spikes of g-glycoprotein, which is the most significant surface antigen for generating virus-neutralizing antibodies. here we illustrate the use of dsc (differential scanning calorimetry) to identify structural domains or proteins involved in thermal transitions. the dsc thermogram for intact beta-propiolactone-inactivated rabv samples in pbs buffer reveals two major thermal transitions, with tm values of c and c, respectively. we have initially focused our investigations on one of the major proteins encoded in rabv, glycoprotein g [ ] .
glycoprotein g contains disulfide bridges on the ectodomain [ ] , is sensitive to bromelain cleavage [ ] and shows reversible conformation changes at low ph [ ] . considering these characteristics, our results provide evidence on the identity of one thermal transition observed by dsc. keywords: rabies virus, differential scanning calorimetry, protein unfolding
domain swapping of the dna-binding domain of human foxp is facilitated by its low folding stability
exequiel medina, sandro l. valenzuela, cristóbal córdova, césar a. ramírez-sarmiento and jorge babul
departamento de biología, facultad de ciencias, universidad de chile, santiago, chile
protein folding and dimerization (or oligomerization) are biologically relevant processes when reaching the quaternary structure is required for function. proteins that form dimers by exchanging segments or domains of their tertiary structure with another subunit, the so-called domain swapping phenomenon, are examples where folding and dimerization are tightly concerted processes. previous studies on domain swapping proteins, such as p suc and diphtheria toxin, have shown that, in general, a high kinetic barrier separates monomers and domain-swapped dimers, and that this barrier can be lowered by promoting protein unfolding and refolding at high protein concentrations, thus favoring the swapped oligomer. recent crystal structures of the dna-binding domain of several human forkhead box (fox) proteins have shown that the p subfamily of these transcription factors (foxp) can form swapped dimers. the human foxp proteins are interesting models of domain swapping, because mutations of the dna-binding domain of these proteins are linked to diverse inherited disorders in humans, such as ipex and language deficits, and some of these mutations are located in the hinge region that connects the exchanged segment with the rest of the protein.
moreover, foxp and foxp have been described to reach monomer-dimer equilibrium in solution after hours of incubation, suggesting that a low kinetic barrier separates both species. using foxp as a model of domain swapping, we analyzed the temperature and protein concentration effects on the dimer dissociation, obtaining the free energy change and enthalpy of the process by van't hoff analysis (ΔH of . kcal·mol⁻¹, ΔS of . kcal·mol⁻¹·k⁻¹ and ΔG at c of . kcal·mol⁻¹). these results indicate that the monomer-monomer association is an example of an enthalpy-driven process. to understand how foxp domains swap without protein unfolding, we performed equilibrium unfolding experiments using gdnhcl as denaturant, showing that the wild-type protein has a low stability (ΔGu kcal·mol⁻¹, cm . m at c), in contrast to other domain swapping proteins with high kinetic barriers. we further explored the domain swapping mechanism of foxp through biased targeted molecular dynamics simulations, showing that the exchange process can occur by specific local destabilization and unfolding of the hinge region and helix h . to further corroborate that the low stability of wild-type foxp facilitates its domain swapping, we engineered a monomeric version of foxp through a single-point mutation in the hinge region, which has been previously described in the literature, and used this protein to assess the effect of monomer stability on dimer formation. comparison of the folding stability of the monomeric mutant a p and wild-type foxp shows that ΔΔGu (mutant-wild-type) is . kcal/mol. we conclude that the ability of foxp to domain swap rapidly can be explained by its low monomer stability and local unfolding of the exchange region. funding: fondecyt and .
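the van't hoff analysis mentioned in the foxp abstract above can be illustrated with a minimal python sketch. it assumes that the enthalpy and entropy changes are independent of temperature, so that ln K = -ΔH/(R·T) + ΔS/R, and fits that straight line by ordinary least squares. the function names, temperatures and equilibrium constants below are hypothetical and serve only as an illustration; they are not the authors' data or code.

```python
import math

# Gas constant in kcal/(mol*K), matching the kcal units used in the abstract.
R_KCAL = 1.987204e-3


def vant_hoff_fit(temps_k, k_eq):
    """Fit ln K = -dH/(R*T) + dS/R by least squares.

    temps_k: temperatures in kelvin.
    k_eq: dimensionless equilibrium constants (for a dissociation constant,
          divide by the 1 M standard state first).
    Returns (dH in kcal/mol, dS in kcal/(mol*K)).
    """
    x = [1.0 / t for t in temps_k]
    y = [math.log(k) for k in k_eq]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals -dH/R
    intercept = mean_y - slope * mean_x  # equals dS/R
    return -slope * R_KCAL, intercept * R_KCAL


def delta_g(dh, ds, temp_k):
    """Free energy change dG = dH - T*dS at temperature temp_k."""
    return dh - temp_k * ds


def synthetic_k(dh, ds, temp_k):
    """Equilibrium constant implied by assumed dH and dS (illustration only)."""
    return math.exp(-delta_g(dh, ds, temp_k) / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Hypothetical values, not taken from the abstract.
    temps = [288.15, 293.15, 298.15, 303.15, 310.15]
    ks = [synthetic_k(20.0, 0.05, t) for t in temps]
    dh, ds = vant_hoff_fit(temps, ks)
    print(f"dH = {dh:.2f} kcal/mol, dS = {ds:.4f} kcal/(mol*K)")
    print(f"dG at 298.15 K = {delta_g(dh, ds, 298.15):.2f} kcal/mol")
```

a linear van't hoff plot is the simplest choice; if the heat capacity change were not negligible, the plot would curve and a model with a ΔCp term would be needed instead.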
determining the coupled interactions that stabilize the structural framework of the β-propeller fold
loretta au , david green , , department of statistics, the university of chicago, department of applied mathematics and statistics, stony brook university, graduate program in biochemistry and structural biology, stony brook university, laufer center of physical and quantitative biology, stony brook university
β-propeller proteins are a highly evolved family of repeat proteins that are involved in several biological pathways, such as signal transduction, cell-cycle modulation and transcription regulation, through interactions with diverse binding partners, despite having a similar fold. as for all repeat protein families, there is a consistent pattern in secondary structure for each repetitive region, as well as across the entire family. typically, four to ten propeller blades (each containing four anti-parallel β-sheets) are arranged in a toroidal shape, thus providing a large binding surface for ligands or other proteins. about % of known proteins adopt this distinctive fold, and although the requirements for tertiary structure and protein function are fundamentally encoded in primary structure, this relationship is not fully understood, and addressing it could provide insight into why the β-propeller fold is common. many techniques in comparative sequence analysis can successfully identify amino-acid conservation between closely related proteins, but molecular interactions between amino acids are often neglected, and further experimentation is still needed to determine the reasons underlying conservation.
to explore how primary structure can dictate fold and function, we devised a computational approach to perform large-scale mutagenesis by adapting the dead-end elimination and a* search algorithms (dee/a*), and also leveraged the structural conservation of each repeating region to understand how sequence variation influences protein fitness, defined here as a combination of stabilizing and binding interactions. dee/a* can evaluate low-energy protein sequences and their corresponding three-dimensional structures, and we used the β-subunit of a g-protein heterotrimer (pdb: gp , gia b g ) as a model system to demonstrate: ( ) how the multiple roles of individual amino acids in protein fitness can be deconvolved, and ( ) how epistatic interactions between them can contribute to structural stability. in doing so, we were able to identify important patterns in sequence complementarity between repeating regions that cannot be found using sequence-based methods alone. these results suggest that computational approaches can be used to determine important protein interactions, and help elucidate the prevalence of β-propeller proteins in biology.
temperature induced conformational changes of the villin headpiece miniprotein
stanislaw oldziej , wioletta żmudzińska , anna hałabis
the c-terminal subdomain of the actin-binding protein villin, called hp (villin headpiece), has been used as a model protein in a number of studies of protein folding kinetics and protein folding mechanism [ , ] . the hp is a residue miniprotein with an alpha-helix bundle three-dimensional fold. the goal of our work was to determine the conformational ensemble of the polypeptide chain of the investigated miniprotein over a wide range of temperatures, to obtain detailed information about how protein structure is influenced by temperature. d nmr spectra of the title miniprotein were recorded at , and k.
the three-dimensional structure of the hp based on restraints derived from nmr spectra recorded at k is almost identical to the structure deposited in the pdb database in the record f k [ ] . at higher temperatures ( and k) the general shape of the protein remains unchanged, with a well-packed hydrophobic core. however, as the temperature increases, the alpha-helices start to melt. at k the structure of the protein remains compact and its general shape is similar to the structure observed at k, but none of the alpha-helices could be observed. results obtained for the hp protein are in agreement with a previous observation for the trp-cage miniprotein [ ] : as the temperature increases, regular secondary structure elements melt before the hydrophobic core of the protein breaks up.
biological membranes provide a selective and chemically sealed barrier for cells. transport of ions and small molecules across the membrane is mediated by transporter proteins, and the breakdown of a cell's ability to produce functionally folded membrane transport proteins can lead to dysfunction and has been implicated in many diseases. however, little is known about the processes that govern the misfolding of α-helical integral membrane proteins, taking into account that these proteins fold and maintain functional structures within membranes of various organelles. the neurotransmitter sodium symporter (nss) protein family is an example of α-helical transporter proteins. the nss family encompasses a wide range of prokaryotic and eukaryotic ion-coupled transporters that regulate the transport of neurotransmitter molecules and whose dysfunction has been implicated in multiple diseases and disorders. we have investigated the folding processes of leut, a prokaryotic homologue of the nss family that couples the transport of neurotransmitters and amino acids to the sodium electrochemical gradient. previously, folding processes of membrane transporters have mainly been characterised within detergent micelles.
however, detergent micelles are not an accurate depiction of the environment of the membrane bilayer; with this in mind, we have also attempted to investigate folding processes within a bilayer.
pd- nmr investigation of ph-induced unfolding of b domain of an escherichia coli mannitol transporter ii mannitol in the bacterial phosphotransferase system
kim gowoon , yu taekyung , suh jeongyong
the bacterial phosphotransferase system (pts) mediates sugar phosphorylation and translocation across the cytoplasmic membrane. the cytoplasmic b domain (iib mtl) of the mannitol transporter enzyme ii mannitol, a pts family protein, delivers a phosphoryl group from a domain to an incoming mannitol that is translocated across the membrane. iib mtl comprises a four-stranded β-sheet and three helices, representing a characteristic rossmann fold. we found that the iib mtl of escherichia coli unfolded under mildly acidic conditions. we made iib mtl mutants to investigate the mechanism of the ph-induced unfolding using nmr spectroscopy. we monitored backbone amide groups and side chain imidazole groups of histidine residues using d hsqc nmr, and identified a histidine residue that might be responsible for the unfolding. histidine residues may be generally important to the folding stability in response to environmental ph changes.
can site-directed mutagenesis shed light on the refolding pattern of human glucose -phosphate dehydrogenase (g pd)?
nurriza ab latif , , paul engel conway institute, university college dublin, faculty of biosciences and medical engineering, universiti teknologi malaysia
human glucose -phosphate dehydrogenase (g pd) is the first enzyme involved in the pentose phosphate pathway (ppp). this oligomeric enzyme catalyses the reaction of glucose -phosphate to form phosphogluconolactone with concomitant reduction of nadp to nadph. in erythrocytes nadph is important mainly for protection against oxidative stress.
in connection with its role as the sole source of nadph, g pd deficiency commonly causes haemolytic disease and is known as the most common human enzyme deficiency globally. protein folding problems and instability are believed to be the major defects in the deficient enzymes. in this study, we employed site-directed mutagenesis in the hope of gaining more information on the role of -sh groups in the refolding of human g pd. two mutants were created: ) one in which all cys residues were replaced by ser and ) one in which only c and c were retained. the refolding of recombinant human g pd has been studied primarily by measuring the enzyme activity after refolding. we also used a combination of intrinsic protein fluorescence, ans ( -anilino- -naphthalenesulphonic acid) binding and limited proteolysis to follow the conformational change during refolding. the results showed that gdnhcl-denatured recombinant human g pd wild type could be refolded and reactivated by the rapid dilution technique. even though the mutants were well expressed and active as recombinants in e. coli, they remained inactive after attempts were made to refold them in vitro. the methods we applied may have provided some insights into the refolding pattern of this oligomeric protein, albeit qualitatively rather than quantitatively.
a single aromatic core mutation converts a designed 'primitive' protein from halophile to mesophile folding
connie tenorio , liam longo , ozan s. kumru , c. russell middaugh , michael blaber
department of biomedical sciences, florida state university, department of pharmaceutical chemistry, university of kansas
experiments in prebiotic protein design suggest that the origin of folded proteins may have favored halophile conditions. these results are consistent with salt-induced peptide formation, which shows that polymerization of amino acids is also promoted by high salt concentrations.
as a result of various origin-of-life studies, a consensus on which amino acids likely populated early earth has emerged. these residues were synthesized by abiotic chemical and physical processes from molecules present in the surrounding environment. the properties of the consensus set of common prebiotic amino acids (a, d, e, g, i, l, p, s, t, v) are compatible with known features of halophile proteins, meaning these proteins are only stable in the presence of high salt concentrations. the halophile environment, thus, has a number of compelling aspects with regard to the origin of structured polypeptides. consequently, a proposed key step in evolution was movement out of the halophile regime into a mesophile one, commensurate with biosynthesis of "phase " amino acids, including the aromatic and basic amino acids. we tested the effects of aromatic residue addition to the core of a "primitive" designed protein enriched for the prebiotic amino acids (a, d, e, g, i, l, p, s, t, v) that required halophilic conditions for folding. the results show that the inclusion of just a single aromatic residue was sufficient for movement to a mesophile folding environment. thus, the inclusion of aromatic residues in the codon table could have conferred key stability on early proteins, enabling adaptive radiation outside of a halophile environment.
contact prediction methods that rely on sequence information alone, such as evfold, can be used for de novo d structure prediction and identification of functionally important residues in proteins. large multiple sequence alignments of protein families consisting of evolutionarily related and plausibly isostructural members reveal co-variation patterns that can be used to identify interactions between pairs of amino acids. we use a global probability model to disambiguate direct and indirect correlations.
specifically, we use a maximum entropy approach called pseudo-likelihood maximization (plm) to distinguish causation (residue interactions) from correlation (correlated mutations) and compute evolutionary couplings (ecs). the inferred set of residue interactions can then be interpreted as physical contacts and used in de novo d structure prediction. furthermore, the interactions that are inferred can help guide experiments that measure the phenotypic consequences of protein substitutions, making the method useful for functional studies. the present work can be divided into three areas: (i) methodological improvements related to alignment, folding procedure, structure refinement and ranking; (ii) folding of proteins of known structure for benchmarking and prediction of proteins of unknown structure; and (iii) focused exploration of specific cases of interest.
developing shuffle as a platform for expression and engineering of antibodies
na ke , alana ali-reynolds , bryce causey , berkmen berkmen
shuffle is a genetically engineered e. coli strain that allows disulfide bond formation in its cytoplasm with high fidelity. many proteins containing disulfide bonds have been successfully expressed in shuffle. in this study, we expressed, purified and characterized full-length monoclonal antibody igg in shuffle. for the first time, a full-length igg can be functionally expressed in the cytoplasmic compartment of an e. coli strain. in order to improve the folding and assembly of igg, we have investigated the expression of igg in various formats and vectors, and we have co-expressed chaperones and other helper proteins with igg. a several-fold increase in the yield of full-length igg was observed. we characterized the shuffle-produced igg and found it comparable to hybridoma-produced igg. optimization of fermentation conditions for large-scale production is in progress. we aim to develop shuffle as an easy, fast, robust platform for antibody engineering, screening and expression.
experimental and computational studies of the effects of highly concentrated solutes on proteins: insights into the causes and consequences of quinary protein structure and cytoplasmic organization
most studies of protein structure and function focus on pure, diluted samples; however, real-world biochemistry and typical biotechnological applications of proteins take place in complex media with very high concentrations of solutes ( - g/l) of varied size and chemical nature. on the one hand, this has recently fostered the study of proteins in vivo, in cell, or at least in media mimicking the native conditions. on the other hand, physical chemistry has for a long time studied the general effects of crowded and viscous conditions on proteins, looking mainly at coarse traits like diffusion and stability. but the general effects on traits relevant at atomic/residue resolution have been less studied, and one fundamental issue remains unsolved: to what extent are proteins forced into interactions with highly concentrated solutes, and with what direct consequences? i will present here our ongoing efforts to dissect the fine effects of high solute concentrations and macromolecular crowding on proteins, based on nmr experiments and md simulations, two complementary techniques of high spatial and temporal resolution. our results show that smaller solutes are prone to extensive interactions with proteins when at high concentrations, while large solutes act chiefly through excluded-volume effects. overall, we observe location-specific perturbations of a protein's surface, its internal dynamics and internal dielectrics, and its hydration, all strongly dependent on the solute's size and chemical nature. our results support the growing notion that proteins should be studied in native-like media, adding that not only macromolecular crowders but also small molecules should be considered in these studies.
last, the fact that high-concentration conditions affect far more than a protein's diffusion rate and stability suggests critical consequences of quinary protein structure and cytoplasmic organization for the regulation of proteins within cellular biochemistry.
aldona jelińska , anna lewandrowska , robert hołyst
we developed an analytical technique for the study of interactions of ligands (e.g. cefaclor, etodolac, sulindac) with the most abundant blood protein (e.g. bovine serum albumin) using the flow injection method. the experiments were conducted at high flow rates ( cm/s) in long (> m), thin ( mm), coiled capillaries. the compound of interest ( ml) was injected into the carrier phase, which moved by poiseuille laminar flow. at the detection point we measure the concentration distribution of the analyte. the width of the final profile of the analyte concentration is inversely proportional to the effective diffusion coefficient of the analyte. from the differences between the widths of the concentration distributions of free and bound ligand we can determine the value of the association constant.
carbohydrate binding modules (cbms), which are defined as contiguous amino acid sequences within a carbohydrate-active enzyme, have been found in both hydrolytic and non-hydrolytic proteins and are classified into families according to their primary structure similarity. the characterization of cbms by different methods has shown that these modules concentrate enzymes on the surface of polysaccharide substrates. it is thought that maintaining the enzyme in proximity to the substrate leads to more rapid degradation of the polysaccharide. therefore, the study of these kinds of modules or domains is relevant, since they are involved in multiple processes in organisms, like signaling, defense and metabolism; and some of them are involved in allergenic responses.
in the present work we studied two different models: the first one is a cbm of the family from lactobacillus amylovorus (lacbm ) that binds starch. these domains are present in an α-amylase as a repetitive tandem of five consecutive modules without connectors. by means of itc and using a single recombinant lacbm domain we determined a ka . x m⁻¹ for β-cyclodextrin and a ka . x m⁻¹ for α-cyclodextrin. when the number of consecutive recombinant modules increased to three or five tandem modules, the ka values increased to m⁻¹ ; however, these constants did not show an additive or a synergic effect. for these experiments we fitted the isotherms to different models and used different algorithms. additionally, we used circular dichroism in the far-uv region to determine whether conformational changes occurred upon binding of the cyclodextrin molecules to the different tandem modules. we could only observe slight changes in a positive band centered around - nm, which has been explained in terms of π-π interactions of the aromatic residues at the binding site. these cbms have been used as carriers for in vivo vaccine delivery and as affinity tags. the second model is a hevein-like cbm of the family present in a chitinase-like protein from hevea brasiliensis (hbcbm ). hevein is a lectin from h. brasiliensis that shows a % identity with hbcbm . these cbms are connected to the catalytic domain, in proteins such as chitinases, by a linker of approximately residues. in these experiments we used fluorescence techniques to determine the affinity constants for chitotriose. we previously reported a ka . x m⁻¹ when using a hbcbm that has a met residue at the n-terminal region. besides the aromatic residues at the binding site, the met residue also interacts with the ligand, as determined using crystallographic and docking techniques. the mutant hbcbm -r w that does not have the met residue showed a ka of .
x m⁻¹ with chitotriose, similar to the value reported for hevein using itc (ka . x m⁻¹ ). interestingly, there exists an isoform of the hbcbm that has a connector between one cbm and a half cbm ( . xhbcbm ). this protein has a ka of . x m⁻¹ with the same ligand.
initiating vesicle formation at the golgi complex: auto-regulation and protein interactions govern the arf-gefs gea and gea
margaret gustafson , j. chris fromme
molecular decision-makers play critical roles in the effort to maintain efficient and accurate cellular functions. in the case of vesicular traffic at the golgi complex, the decision to initiate vesicle formation is made by a set of guanine nucleotide exchange factors (gefs) that activate the small gtpase arf , which is the master controller for the recruitment of cargos and coat proteins. saccharomyces cerevisiae possesses three golgi arf-gefs, gea , gea , and sec , which work at distinct sub-compartments of the golgi to activate arf only when and where appropriate. in the case of sec at the trans-golgi network (tgn), this requires a positive feedback loop in which active arf relieves autoinhibition of sec , as well as recruitment to the golgi membrane and catalytic stimulation by signaling rab gtpases. we know far less about the decision-making process for gea and gea , which are responsible for retrograde traffic within the golgi and to the endoplasmic reticulum. i have found that both gea and gea can bind membranes weakly in vitro, an ability which is counteracted by their c-terminal hds domains. in addition, i have discovered that membrane recruitment in vitro is aided by the rab gtpase ypt . however, these interactions cannot fully explain the distinct localization patterns of gea , gea , and sec , as all three have been shown to be recruited by ypt , which is found throughout the golgi.
my work has revealed that in addition to the well-established distinct localization from sec , gea also occupies different golgi compartments from gea , so specific signals must exist which help the gefs decide where to go. my current efforts focus on understanding the roles of the other domains of gea and gea , identifying the signals which send them to different parts of the golgi, and unraveling the different roles they play in vesicle trafficking pathways.
sequence variation in archaea through diversity-generating retroelements
sumit handa , blair g paul , kharissa l shaw , david l valentine , partho ghosh
department of chemistry and biochemistry, university of california san diego, marine science institute, university of california
protein diversification is an essential tool for the survival and evolution of various species. diversity-generating retroelements (dgrs) in bacteria are known to generate massive variation in dna through an error-prone reverse transcriptase and retrohoming, which leads to variation in protein sequence. the recent discovery of dgrs in intraterrestrial archaeal systems has opened an opportunity to study this massive sequence variation in the third domain of life (paul bg, et al. nat. comm.). here, we present the first crystal structure of a variable protein from archaea, in which the ligand-binding pocket is surface exposed. it also has a conserved c-type lectin (clec) fold, as shown by previous work on the variable proteins major tropism determinant (mtd) and treponema variable protein a (tvpa), which bind ligands through the clec fold. despite weak sequence identities ( - %) among these variable proteins, the clec fold was found to be conserved. this variable ligand-binding site for archaeal variable proteins can potentially generate variants.
protein synthesis is a dynamic process mediated by a variety of proteins and enzymes. recent studies have shown that hydroxylation is a key post-translational modification involved in translation termination.
in particular, the fe(ii)- and -oxoglutarate-dependent oxygenase, jumonji domain-containing (jmjd ), regulates translation termination via the carbon hydroxylation of an invariant lysine residue, k , of the eukaryotic release factor, erf . in eukaryotes, translation termination is mediated by a release factor complex that includes erf . erf comprises three domains, and it is responsible for recognizing stop codons in mrna transcripts before triggering polypeptide release from the ribosome. the lysine residue hydroxylated by jmjd falls within the n-terminal domain and more specifically within the highly conserved niks motif. this motif has been identified by cross-linking and mutagenesis studies to play an essential role in stop codon recognition. while hydroxylation of k by jmjd has been found to increase translational termination efficiency, the exact molecular mechanism by which hydroxylation influences termination remains unclear. this work aims to understand how hydroxylation of erf affects translation termination by exploring the effect of hydroxylation on the structure, dynamics, stability, and binding of the n-terminal domain of erf (erf -n) using mass spectrometry, protein nmr spectroscopy, circular dichroism and differential scanning fluorimetry. in our efforts to understand the effect of hydroxylation, an additional jmjd -catalyzed modification, characterized by a da mass shift on k , was identified in vitro. the effect of this modification on erf was similarly explored. our findings suggest that hydroxylation has no effect on the in-solution nmr structure of erf -n, which experiences chemical shift changes localized to the target lysine residue. correspondingly, there are no significant differences in secondary structure content between wild type and hydroxylated erf -n. hydroxylation was also found to have no effect on protein stability or dynamics.
interestingly, however, the da modification appears to cause more significant chemical shift changes dispersed beyond the niks motif. this suggests a more global effect on the in-solution nmr structure despite the small differences observed in protein dynamics and secondary structure content. the da modification was also found to have a destabilizing effect on erf -n. neither hydroxylated nor da modified erf -n exhibited differences in rrna binding. while hydroxylation of erf was found to have little effect on protein structure, dynamics, stability, or binding, the da modification has marked effects on protein structure and stability. such differences suggest that this modification has the potential to play an important role in translation.
functional and structural analysis of a gh β-n-acetylglucosaminidase from the marine bacterium vibrio harveyi
piyanat meekrathok, arthur t. porfetye, marco bürger, ingrid r. vetter, wipa suginta biochemistry-electrochemistry research unit, suranaree university of technology, max planck institute of molecular physiology
vibrio harveyi β-n-acetylglucosaminidase (so-called vhglcnacase) is a new member of the gh glycoside hydrolase family responsible for the complete degradation of chitin fragments, with n-acetylglucosamine (glcnac) monomers as the final products. however, the d structure of glcnacase is still unknown. in this study, the crystal structure and function of glcnacase were investigated by protein crystallography. size-exclusion chromatography and native-page were employed to verify the state of glcnacase in its native form, and the acidic active-site residues were mutated by site-directed mutagenesis. the effects of the mutations on the binding and hydrolytic activities were studied by enzyme kinetics. to provide a structural basis for glcnacase function, the wild-type enzyme was crystallized at k using a solution containing . m sodium acetate ph . and . m sodium malonate, and x-ray data were recorded.
the wild-type enzyme was crystallized within days in the monoclinic crystal form, belonging to space group p , with unit-cell parameters a . , b . , c . Å. the crystal structures of v. harveyi glcnacase were solved and refined to a highest resolution of . Å. structural investigation revealed that glcnacase comprises three distinct domains, designated as the n-terminal carbohydrate-binding domain, the a b topology domain and the tim-barrel catalytic domain. the substrate binding groove of glcnacase is a small pocket, which is suitable to accommodate a short-chain chitooligosaccharide. kinetic analysis revealed that mutants of the adjacent residues d -h -e showed significantly decreased activity compared with the wild-type enzyme, suggesting that these residues might be important for enzyme catalysis.
silencing the molecular timekeeper in human cancer
alicia michael, stacy harvey, patrick sammons, amanda anderson, hema kopalle, alison banham, carrie partch university of california -santa cruz, university of oxford
the circadian clock coordinates temporal control of physiology by regulating the expression of at least % of the genome on a daily basis. disruption of circadian rhythms through environmental stimuli (e.g. light at night) or genetic means can lead to the onset of diseases such as diabetes, cardiovascular disease, premature aging and cancer. the circadian clock orchestrates global changes in transcriptional regulation via the bhlh-pas transcription factor clock:bmal . pathways driven by other bhlh-pas transcription factors have a homologous repressor that modulates activity on a tissue-specific basis, but none has been identified for clock:bmal . we discovered that the cancer/testis antigen pasd fulfills this role to suppress circadian rhythms. pasd is evolutionarily related to clock and interacts with the clock:bmal complex to repress transcriptional activation.
furthermore, deletion of one region, highly conserved with clock exon , alleviates repression by pasd , suggesting that it utilizes molecular mimicry to interfere with clock:bmal function. structural and biochemical studies of the direct interaction of pasd with the clock:bmal complex using recombinant protein expression and biophysical techniques are currently underway. as a cancer/testis antigen, expression of pasd is natively restricted to gametogenic tissues but can be upregulated in somatic tissues as a consequence of oncogenic transformation. reducing pasd in human cancer cells significantly increases the amplitude of transcriptional oscillations to generate more robust circadian rhythms. our work suggests that mechanisms to suppress circadian cycling can be hard-wired in a tissue-specific manner and our data show that they can be co-opted in cancer cells to attenuate clock function.
the scaffolding protein iqgap participates in various cellular functions such as cell-cell adhesion, cell polarization and migration, neuronal motility, and tumor cell invasion by binding to target proteins, including rac and cdc , two members of the rho family. to better understand the molecular basis of these interactions, in this study we used a novel time-resolved fluorescence spectroscopy approach to determine individual rate constants for iqgap interaction with fourteen different rho proteins. the results indicated that, among rho proteins, iqgap binds selectively to rac- and cdc -like proteins, and only in a gtp-dependent manner. moreover, we examined the interaction of rho proteins with the c-terminal half of iqgap (grd-c), a shorter fragment containing grd-gbd, the grd alone, and grd-gbd carrying single and double phosphomimetic mutations (s e and s d).
the results showed that a fluorescence change is detected when both grd and gbd are present, whereas no change was observed for grd alone or for the s d or s e/s d variants, suggesting that the gbd, and specifically cysteine , is critical for this interaction. furthermore, fluorescence polarization results showed that grd-c interacts with cdc and rac but not with rhoa; interestingly, the grd domain showed similar behavior, but with to fold lower affinity than grd-c. consistent with this, a gdp-bound form of cdc interacted with both grd and grd-c with quite comparable affinities. finally, competition experiments utilizing interacting partners of rac , e.g. tiam , p rhogap, plexin-b , p phox, pak and rhogdia, along with structural analysis, revealed two negatively charged areas on the surface of rho- and rnd-like proteins, which might explain their inability to interact with iqgap . the overlapping binding site of cdc and rac on the surface of iqgap , together with the kinetic details of the selective interaction of iqgap with rac- and cdc -like proteins, suggests that these interactions are most likely mediated via the same mechanism.
ing dimerizes through its n-terminal domain, with a symmetric antiparallel coiled-coil structure, making it a bivalent reader of the h k me mark. ing is highly homologous with ing , but forms part of a different histone acetyl transferase complex. here, we show that ing is also a dimer and thus a bivalent reader of the h k me mark. however, the crystal structure of the n-terminal domain of ing shows an asymmetric dimer, different from the homologous ing domain. our nmr data (backbone assignment and paramagnetic relaxation effects) and saxs data indicate that the structure of the n-terminal domain of ing in solution is similar to ing , suggesting that the crystal structure of ing is likely a crystallization artifact.
three point mutations in the n-terminal domain of ing have been described in oral squamous cell carcinoma: q r, i v, and c r . we have found that the n-terminal domains of the three mutants are dimeric coiled-coils but with different stability, as measured by thermal denaturation. while the q r mutant is as stable as the wild type, the i v and c r mutants are strongly destabilized, suggesting a role in cancer development at least for these two mutants. efforts so far, to combat alzheimer's disease (ad) have focused predominantly on inhibiting the activity of enzyme(s) that are responsible for the production of the main causative beta amyloid forming peptide. however, the inherent complexity associated with the network of pathways leading to the progress of the disease may involve additional targets for designing effective therapies. recent experimental findings have identified abelson's tyrosine kinase (c-abl), a non-receptor kinase involved in a variety of cellular functions as a new target for ad. in the present study we employed energy optimized multiple pharmacophore modeling strategy from multiple c-abl structures bound with ligands in the inactive atp binding conformation. virtual screening followed by docking of molecules from chembridge_cns database, and maybridge databases resulted in the identification of best scoring molecules. based on docking score and selectivity assessment and druggability parameters, four out of the molecules are predicted to show increased specificity for c-abl in comparison to closely related kinases. given the implied role of c-abl not only in ad but in parkinson's disease, the identified compounds may serve as leads to be developed as effective neurotherapeutics. rafael palomino , glenn millhauser , pietro sanna university of california santa cruz, the scripps research institute the central melanocortin system is recognized as a key regulator of energy balance and appetite. 
the hypothalamic melanocortin receptor, mc r, is a g-protein coupled receptor that is antagonized by the peptide ligand, agouti-related peptide (agrp), leading to increased feeding and weight gain. while much research has gone into how this ligand exerts its effects at the receptor, less is known regarding nonmelanocortin components of the pathway. syndecan- , a heparan sulfate proteoglycan, has previously been implicated in potentiating agrp antagonism, however details of this interaction are unclear. this work aims to investigate the role of syndecans at both a molecular level and in vivo. we hypothesize that agrp binds the glycosaminoglycan (gag) components of syndecans, and that this interaction increases the local concentration of the peptide near mc r. furthermore, we have previously shown that designed positive charge mutations to agrp lead to increased in vivo efficacy that is independent of mc r activity, and we hypothesize that this is due to greater affinity for the negatively charged gags. using isothermal titration calorimetry we have shown tight binding between agrp and heparan sulfate, the major gag component of syndecan- , and this affinity is strengthened by additional peptide positive charge. through nmr, we see that both positively charged and polar residues are necessary for binding various heparan sulfate polymers. these data implicate a specific region of agrp that is not required for mc r binding as being necessary in its role as a heparan sulfate binding protein. expanding on these findings, we are now using a syndecan knockout mouse line to explore the mechanism of differential feeding in our designed mutants. preliminary results indicate a reduction in weight gain in knockouts compared to their wildtype littermates post peptide administration. collectively, these data show that the physiologically relevant form of agrp, previously considered unable to interact with syndecans, is indeed a heparan sulfate binding protein. 
furthermore, our designed mutants have differential affinities for gags, with increased affinity correlating to increased feeding potency. finally, as the mc r pathway is thought to be a viable target for wasting disorders such as cachexia, we are interested in leveraging this data to improve the potency and stability of our designed agrp mutants. taken together, this work aims to develop new insights and probe the therapeutic potential of a critical metabolic pathway.
evidence of a proteolytic phenomenon in the starch binding domain of the α-amylase from lactobacillus amylovorus
zaira esmeralda sánchez cuapio, alejandra hernández santoyo, sergio sánchez esquivel, romina rodríguez sanoja instituto de investigaciones biomédicas, universidad nacional autónoma de méxico, instituto de química, universidad nacional autónoma de méxico
α-amylases are glycoside-hydrolases that catalyze the hydrolysis of internal α- , glycosidic bonds in starch and glycogen generating smaller oligosaccharides ( ). these multidomain proteins contain a catalytic barrel (β/α) and, in some cases, one or more non-catalytic domains whose function is generally described as carbohydrate binding module (cbm) and particularly as starch-binding domains (sbd). the sbd can bind granular starch, increasing the local concentration of substrate at the active site of the enzyme, and may also disrupt the structure of the starch surface ( ). the α-amylase from lactobacillus amylovorus has a structure that consists of a catalytic domain (cd) and an unusual carboxy-terminal starch-binding domain with identical cbms (belonging to family ) in tandem ( ). each repeat acts as an independent fixing module with an additive or synergic effect between the units ( ). when we stored pure sbd from l. amylovorus we found multiple forms of low molecular weight with a constant pattern, which does not correspond to random degradation.
interestingly, when the protein is stored at ph close to and edta is added, such proteolysis appears to decrease. so far, there is little information about the proteolytic processing of amylases and its nature. here we show that divalent ions induce a proteolytic cleavage of the sbd, raising the possibility of an autoproteolytic activity. acknowledgments: this work is supported by grants papiit in - and conacyt . sánchez cuapio z is supported by a personal grant from consejo nacional de ciencia y tecnología, méxico.
unnatural amino acid and related methods provide a special mechanism for site-specific incorporation of spectroscopically active probes into a specific membrane protein in cells. the site-specific incorporation resulted in a single signal during acquisition, giving unambiguous signal assignment. the protein-specific labeling makes in situ membrane protein analysis possible using nmr or fluorescence detection. the f containing unnatural amino acid incorporation has been applied to dynamic studies of transporters in native lipid membrane, and to the quantification of phosphorylation for tyrosine kinase in native lipid membrane with the aid of lipodisc. the fluorescent unnatural amino acid incorporation enabled site-specific analysis of channel responses upon ligand binding in a single cell.
heather wiebe, noham weinberg, department of chemistry, simon fraser university, department of chemistry, university of the fraser valley
the mechanism by which conformational changes, particularly folding and unfolding, occur in proteins and other biopolymers has been widely discussed in the literature. molecular dynamics (md) simulations of protein folding present a formidable challenge since these conformational changes occur on a time scale much longer than what can be afforded at the current level of computational technology.
transition state (ts) theory offers a more economical description of kinetic properties of a reaction system by relating them to the properties of the ts, or for flexible systems, the ts ensemble (tse). the application of ts theory to protein folding is limited by ambiguity in the definition of the tse, although the experimentally observed first-order kinetics for folding of small single-domain proteins lends itself to interpretation by this theory. the pressure dependences of the folding rate constant can be used to obtain activation energies and activation volumes, which are rationalized as the properties of the folding tse. the large amount of activation volume data in the literature has gone largely uninterpreted at the quantitative level. we propose to utilize this data in conjunction with md-calculated volumetric properties to identify the tse for protein folding. the effect of pressure on reaction rates is expressed in terms of logarithmic pressure derivatives, known as activation volumes. according to ts theory, activation volumes can be identified as the difference in volume between the ts and reactant species:
$$\Delta V^{\ddagger} = -RT\left(\frac{\partial \ln k}{\partial P}\right)_{T} = V_{\mathrm{TS}} - V_{\mathrm{R}}$$
activation volumes $\Delta V^{\ddagger}$ have been experimentally determined for the folding of several proteins. the concept of activation volume can be extended to that of a volume profile, $\Delta V(y)$, which describes how the volume of a system changes along reaction coordinate $y$. if the position $y^{\ddagger}$ of the ts along the reaction coordinate is unknown, it can be found by locating $\Delta V^{\ddagger}$ on the volume profile:
$$\Delta V(y^{\ddagger}) = \Delta V^{\ddagger}$$
such volume profiles can be built using our recently developed md-based displacement volume method. using this method, volumes of single molecules can be calculated by taking the difference between the volume of pure solvent and solvent containing the desired solute. this method takes into account the strength and type of solvent-solute interactions as well as the geometrical configuration of the solute.
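the three relations described above (activation volume as a logarithmic pressure derivative of the rate constant, solute volume as a difference between solution and pure-solvent volumes, and the ts position as the point where the volume profile equals the activation volume) can be sketched numerically as follows. this is only an illustrative sketch, not the authors' implementation: the function names, the use of a straight-line fit of ln k against pressure, the piecewise-linear interpolation of the profile, and si units are all assumptions made here for the example.

```python
import numpy as np

R = 8.314462618  # gas constant, J mol^-1 K^-1


def activation_volume(pressures_pa, rate_constants, temperature_k):
    """Estimate dV‡ = -RT * d(ln k)/dP from a straight-line fit (assumed model).

    Pressures in Pa, temperature in K; the result is in m^3 mol^-1.
    """
    slope, _intercept = np.polyfit(pressures_pa, np.log(rate_constants), 1)
    return -R * temperature_k * slope


def displacement_volume(solution_volumes, solvent_volumes):
    """Solute volume as mean(solution box volume) - mean(pure-solvent box volume)."""
    return float(np.mean(solution_volumes) - np.mean(solvent_volumes))


def locate_transition_state(y, dv_profile, dv_activation):
    """Return every y where the piecewise-linear profile dV(y) equals dV‡."""
    y = np.asarray(y, dtype=float)
    diff = np.asarray(dv_profile, dtype=float) - dv_activation
    crossings = []
    for i in range(len(y) - 1):
        d0, d1 = diff[i], diff[i + 1]
        if d0 == 0.0:
            crossings.append(float(y[i]))
        elif d0 * d1 < 0.0:
            # linear interpolation between the two bracketing grid points
            t = d0 / (d0 - d1)
            crossings.append(float(y[i] + t * (y[i + 1] - y[i])))
    if diff[-1] == 0.0:
        crossings.append(float(y[-1]))
    return crossings


if __name__ == "__main__":
    # hypothetical profile (arbitrary units) relative to the reactant state
    y_grid = [0.0, 0.5, 1.0]
    profile = [0.0, -10.0, -30.0]
    print(locate_transition_state(y_grid, profile, -20.0))
```

a profile that is not monotonic can cross the activation volume more than once, which is why the sketch returns a list of candidate positions rather than a single value.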
in this work, we present the successful application of this method to several conformationally flexible systems.
structure of the p paf/pcna complex and implications for clamp sliding on the dna during replication and repair to the canonical pip-box binding groove on the pcna front face. in contrast to other pcna interacting proteins, however, p paf also contacts the inside of, and passes through, the pcna ring. the mostly disordered p paf chain termini thus emerge at opposite faces of the ring, but remain protected from degradation by the s core proteasome. we also unveil a novel dna binding activity of p paf, both free and bound to pcna, which is mainly mediated by its conserved histone-like n-terminal tail. molecular modeling shows that a ternary complex with a duplex dna inside the pcna ring is energetically feasible and our electron micrographs show increased density inside the ring. we propose that p paf acts as a flexible drag that regulates pcna sliding along the dna, and may facilitate the switch from replicative to translesion synthesis polymerase binding upon dna damage. acknowledgements: this work has been mainly sponsored by mineco grant ctq - and juan de la cierva- contract to alfredo de biasio.
metabolic syndrome (mets) is one of the leading causes of death worldwide; however, the exact pathophysiological mechanisms of mets remain largely unknown. growing evidence suggests that the increased availability of glucocorticoids at the tissue level plays an important role in mets development. one of the major determinants of local glucocorticoid action seems to be the enzyme β-hydroxysteroid dehydrogenase ( β-hsd ). this enzyme is a well-known member of the short-chain dehydrogenase/reductase (sdr) superfamily. it is an important carbonyl reducing enzyme that, besides its role in fine-tuning glucocorticoid action, is involved in the biotransformation of drugs and in the development of lung cancer through metabolism of the tobacco specific carcinogen nnk.
the phylogenetically closest relative of β-hsd is the dhrs enzyme from the same superfamily. unlike β-hsd , dhrs is poorly characterized; however, its function can be supposed to overlap at least partially with that of β-hsd . moreover, several studies have already indicated its possible association with pathological conditions in humans similar to those linked to β-hsd . the aim of this study is the basic biochemical characterization of dhrs . the enzyme is a member of the cluster of "classical" sdrs; such members are considered to be retinoid and steroid metabolizing enzymes, so characterization of the enzyme was based on this assumption. dhrs was prepared in recombinant form in the sf cell line. it was shown that this enzyme is an integral membrane-bound enzyme localized in the endoplasmic reticulum with luminal orientation, similarly to β-hsd . known substrates of β-hsd and related enzymes were also tested as substrates of dhrs . it was shown that dhrs is an nadph-dependent reductase whose important substrates include the steroid hormones cortisone and androstene- , -dione, all-trans-retinal, and xenobiotics such as , -naphthoquinone or the carcinogen nnk, at least in vitro. for better understanding of the catalytic function of dhrs , its structural model was prepared, and it is also used for the identification of additional substrates by ligand virtual screening. the dhrs enzyme is expressed in several human tissues such as adrenals, liver, prostate, small intestine and kidney. these initial results point to the possible involvement of dhrs in important cellular processes that deserve further investigation. these results will lay the foundation for an understanding of the role of dhrs in human physiology and pathophysiology. this project was supported by grant agency of charles university ( /c/ ) and unce / ).
structure-based functional identification of helicobacter pylori hp as a nuclease with both dna nicking and rnase activities bong-jin lee , ki-young lee hp is a conserved, uncharacterized protein from helicobacter pylori. here, we determined the solution structure of hp using three-dimensional nuclear magnetic resonance (nmr) spectroscopy, revealing that this protein is structurally most similar to a small muts-related (smr) domain that exhibits nicking endonuclease activity. we also demonstrated for the first time that hp is a nicking endonuclease and a purine-specific ribonuclease through gel electrophoresis and fluorescence spectroscopy. the nuclease activities for dna and rna were maximally increased by mn( ) and mg( ) ions, respectively, and decreased by cu ( ) ions. using nmr chemical shift perturbations, the metal and nucleotide binding sites of hp were determined to be spatially divided but close to each other. the lysine residues (lys , lys and lys ) are clustered and form the nucleotide binding site. moreover, site-directed mutagenesis was used to define the catalytic active site of hp , revealing that this site contains two acidic residues, asp and glu , in the metal binding site. the nucleotide binding and active sites are not conserved in the structural homologues of hp . this study will contribute to improving our understanding of the structure and functionality of a wide spectrum of nucleases. high-fidelity recombinant protein production in a silkworm bioreactor sungjo park , in-wook hwang , tatsuya kato , enoch park , andre terzic center for regenerative medicine, mayo clinic, laboratory of biotechnology, shizuoka university the domesticated silkworm, bombyx mori, is an attractive host naturally equipped with a proficient posttranslational modification machinery adequate to fulfill stringent demands of authentic recombinant protein production. 
silkworm-based protein expression has originally relied on a prototype baculovirus vector system that employs silkworm as a bioreactor in place of more traditional cell lines. recent development of the silkworm trophic b. mori nucleopolyhedrovirus (bmnpv) bacmid launches a second generation of silkworm-based protein production technology. introducing the recombinant bacmid dna into silkworms expedites heterologous protein expression by eliminating prior virus construction and amplification steps. salient examples of heterologous eukaryotic proteins produced in silkworms are acetyl-coa carboxylase , malonyl-coa decarboxylase, spot /mig heterodimer and a , -sialyltransferase with consistent high levels of protein expression. thus, equipped with a fail-safe post-translational modification machinery, eukaryotic proteins are readily bioengineered using a silkworm-based protein expression platform. studies exploring potential applications of synthetic antifreeze proteins in the frozen food industry ho zee (charles) kong , conrad perera , ivanhoe leung , nazimah hamid , viji sarojini school of chemical sciences, the university of auckland., school of applied sciences, auckland university of technology in nature, certain species of plants, insects and fish produce a group of antifreeze glycoproteins and polypeptides which enable them to survive the freezing temperatures of their natural habitat. these naturally occurring antifreeze proteins (afps) were first discovered in polar fishes such as antarctic notothenioids and winter flounder. these afps have the ability to bind to ice crystals and restrict their size and morphology; decrease the freezing point of water and inhibit the ice recrystallization processes. ice crystal formation is of primary concern to the frozen food industry, as ice crystal formation during freezing can be disruptive to and cause damage to the cellular structures in food. 
the unique properties of afps can be developed into a potential solution to minimize freeze-thaw damage to frozen food. a number of tailor made synthetic analogues based on the naturally occurring afps were successfully designed and synthesized. antifreeze activity studies of the afps were carried out using the clifton nanoliter osmometer attached with a microscope. the afps exhibited thermal hysteresis as well as modification of ice crystal morphology, confirming their antifreeze activity in vitro. the ability of these synthetic afps in preserving the texture and structure of frozen food was evaluated using the techniques of scanning electron microscopy. the afps showed great potential to preserve the cellular structures of frozen food samples during freeze-thaw process. additionally, secondary structure analysis of the afps was carried out using circular dichroism. this presentation will summarize our current results on the design, synthesis and anti-freeze activity analysis of the synthetic afps. invasive fungal infections remain a leading cause of death in immunocompromised patients. current antifungal agents have a host of issues including limited efficacy, host toxicity and an alarming increase in resistance. current research in our laboratories is focused on targeting the calcineurin signaling pathway that has been shown to be required for fungal pathogenesis. calcineurin is a highly conserved serine-threonine-specific ca -calmodulin-activated phosphatase important in mediating fungal pathogenesis and stress responses. it is a key regulator of a signal transduction network required for survival of the most common pathogenic fungi in humans, making it an ideal target for fungal drug development. calcineurin is a heterodimer of a catalytic (a) and regulatory (b) subunit. phosphatase activity requires association of the two subunits. 
calcineurin is also the target of the immunosuppressant fk , which functions as an inhibitor by first complexing with the peptidyl-prolyl cis-trans isomerase immunophilin, fkbp . the fkbp -fk complex subsequently binds to calcineurin in a groove between the a and b subunits and inhibits its activity. although fungal calcineurins are targeted by fk , it also targets mammalian calcineurin and is thus immunosuppressive in the host. in order to improve therapeutic efficacy, we have undertaken a unique effort that utilizes both structural biology and molecular mycology in an effort to overcome the fungal versus human specificity barrier. the nmr studies to be presented here have been focused on determining the resonance assignments and solution structures for the fkbp proteins from the pathogenic fungi candida albicans, candida glabrata and aspergillus fumigatus. notably, the x-ray crystallography structures of the wild-type candida albicans and aspergillus fumigatus fkbp proteins revealed an intriguing intermolecular interaction involving four residues in the 's loop including pro (in c. albicans) and pro (in a. fumigatus) which are stabilized in the cis conformation. these data suggest that the protein might use itself as an enzyme substrate. in efforts to establish if this interaction remains in a solution environment, we have determined the nmr structure and measured the t relaxation rates for the wild-type a. fumigatus fkbp protein and for the p g mutant variant that adopts a dramatically different orientation of the 's loop and does not form an intermolecular interaction in the crystal structure. the nmr chemical shift data indicate that, while the remainder of the protein structure remains unchanged, the 's loops in the two variants are indeed different. in addition, the t relaxation rates of the residues in this region are dramatically dissimilar in the two variants, but remain identical throughout the rest of the protein. 
we have also begun inhibitor binding studies of all of the fkbp proteins from each of the pathogens by titrating the fk inhibitor into native and mutant fkbp proteins in order to examine conformational changes in the protein associated with complex formation. using this approach we plan to determine the relative kd values for binding of each inhibitor to the fkbp protein from each pathogen for comparison of binding proclivities.
lupin (lupinus angustifolius l.) β-conglutin proteins: structure functional features, catalytic mechanism modeling and cross-allergenicity identification using protein threading and molecular docking methods
lupin is an important pulse, which displays a wide range of benefits in agriculture, particularly those involved in possible plant pathogen suppression. furthermore, lupin seed proteins promote different positive health aspects, such as prevention of cardiovascular disease and reduction of glucose and cholesterol blood levels. "sweet lupine" seeds seem to be promising as a source of innovative food ingredients due to an average protein content similar to soybean and an adequate composition of essential amino acids. thus, lupin seeds may be an important source of proteins for human and animal consumption. as a drawback, however, the number of people allergic to lupin seed proteins is rising, becoming a serious and growing problem in the western world, because of the rapid introduction of lupin seeds as new ingredients in traditional and novel foods. the goals of this study are the characterization of the structure-functional properties of lupinus angustifolius l or narrow leafed lupin (nll) β-conglutin proteins, with a focus on their catalytic mechanism, and their molecular cross-allergenicity with other legumes, i.e.
peanut, by extensive analysis using different computer-aided molecular approaches covering (i) physicochemical properties and functional-regulatory motifs, (ii) sequence analysis, -d and d structural (threading) modeling comparative study and molecular docking, (iii) conservational and evolutionary analysis, (iv) catalytic mechanism modeling, and (v) sequence- and structure-docking-based b-cell epitope prediction, while t-cell epitopes were predicted by inhibitory concentration and binding score methods. β-conglutins (vicilin-like or s proteins) are seed proteins typically found in reserve tissues (endosperm and cotyledon). they belong to the cupin superfamily of proteins, containing a globular domain constituted by a conserved β-barrel. two barrels were found in all β-conglutin protein isoforms, together with an additional mobile n-terminal arm constituted by α-helices. molecular modeling analysis has shown that one of these barrels contains a semi-conserved metal-binding motif (hyx. . .r), typically found in oxalate oxidase (oxox) enzymes. interestingly, our results revealed considerable structural differences between β-conglutin isoforms, particularly affecting -d elements (loops and coils), and numerous micro-heterogeneities are present in fundamental residues directly involved in epitope variability, which might be a major contributor to the observed differences in cross-reactivity among legumes. we also identified multiple forms of β-conglutin polypeptides ranging from - kda, with ige-binding characteristics in atopic patients. thus, β-conglutins might be considered major allergens in different species of lupin, including the "sweet lupin" group, since several of these polypeptides were recognized by human iges and have the potential to trigger an immune response leading to allergy symptoms.
influenza virus is one of the most prevalent pathogens causing respiratory illness, which often leads to serious post-influenza complications such as pneumonia and myocarditis.
some viruses, such as the avian influenza h n , are especially dangerous and draw special attention from the who. this highly pathogenic virus spreads quickly among domestic poultry and wild birds, resulting in high mortality. more distressingly, the h n virus may be transmitted to humans. because of antigenic drift it is impossible to deliver an effective vaccine against all subtypes of the h n virus. moreover, traditional egg-based production of influenza vaccines is time- and cost-consuming, which makes it inadequate in case of a pandemic. hence, we have developed an efficient production process for an influenza vaccine based on a recombinant hemagglutinin antigen (rha). recombinant vaccines are subject to strict regulations and quality requirements. the purpose of this work was to develop a battery of analytical methods for evaluating key quality attributes of rha at each stage of production. first, we focused on rha structure, which is crucial for its activity. the primary structure of rha was confirmed by peptide mapping and tof/tof fragmentation (hplc, maldi tof/tof). furthermore, ftir analysis was used to evaluate the secondary structure of the protein. the disulfide bonds, which stabilize the tertiary structure, were assigned by peptide mapping. additionally, free thiols were measured using ellman's reagent. moreover, we employed rp-hplc, sec-mals and dls to explore oligomerization of rha. these techniques proved useful not only to confirm the existence of native oligomers, but also to find and discard misfolded fractions, aggregates and truncated forms. in addition, two analytical methods (rp-hplc and cge) were developed to assess the purity of rha as required by ich guidelines. we also determined the isoelectric point and heterogeneity of rha by cief. the developed methods were then applied in stability studies that provide valuable insight into the chemical degradation process and conformational changes of rha during storage.
this work was supported by innovative economy operational program, grant no. wnd-poig. . . - - / - as a part of project "centre of medicinal product biotechnology. package of innovative biopharmaceuticals for human and animal therapy and prophylactics."
muscle cell atrophy via hsp gene silencing was counteracted by celastrol-mediated hsp overexpression
molecular chaperone heat shock proteins (hsp) are known to assist protein quality control under various stresses. although overexpression of hsp was found to promote muscle mass retention in an unloading state, it is unclear whether muscle atrophy is induced by suppression of hsp expression and is counteracted by active hsp overexpression. in this study, we pre-treated rat l cells with hsp sirna for hsp gene silencing, and determined myotube diameter, hsp expression and anabolic and catabolic signaling activities in the absence or presence of the triterpene celastrol (cel), an hsp inducer. relative to a negative control (nc), muscle cell diameter was reduced by % in the sirna-treated group, increased . -fold in the cel-treated group and remained at the size of nc in the sirna cel group. hsp expression was decreased % by sirna whereas the level was increased -to -fold in the cel and sirna cel groups. expression of foxo and atrogin- was increased . -to . -fold by sirna, an increase that was abolished by cel treatment. finally, phosphorylation of akt , s k and erk / was not affected by sirna, but was elevated -to -fold in the cel and sirna cel groups. these results suggest that hsp downregulation by hsp gene silencing led to muscle cell atrophy principally via elevation of catabolic activities. this atrophic effect was counteracted by cel-mediated hsp overexpression.
the centers for disease control and prevention report that at least million people in the united states will become ill due to antibiotic-resistant pathogens, leading to , deaths each year.
in order to circumvent these resistance mechanisms, it is essential to quantitatively understand how the function of the protein(s) involved relates directly to resistance. integral membrane efflux pumps are known determinants of single-drug and multi-drug resistance in a wide variety of pathogenic organisms. these transporters are proteins whose characterization typically requires reconstitution in an artificial membrane. consequently, these important proteins are difficult to characterize by traditional in vitro studies. my project aims to determine the physicochemical parameters of the efflux pump tetb utilizing molecular biology and mathematical modeling. tetb is composed of transmembrane (tm) alpha-helices and is found within the inner membrane of gram-negative bacteria. this protein allows for the efflux of tetracycline (tet), doxycycline (dox), and minocycline (mcn) antibiotics from the cytoplasm into the periplasm. these tetracyclines are a bacteriostatic class of antibiotics that inhibit protein synthesis by binding to the s ribosomal subunit, thereby blocking the binding of aminoacyl-trna. for cells grown in tetracyclines, the efflux mechanism of tetb decreases the cytosolic antibiotic concentration, allowing the rate of protein translation to increase. i have inserted a tet(b) expression system into the chromosome of an escherichia coli lab strain and have determined its growth profile under various concentrations of tet, mcn, and dox using a high-throughput -well plate format. the growth rate profiles correlate with tetb pumping rates for each drug. tetb pumps out tet more readily than dox and mcn, and we observe that cells expressing tetb can grow at higher concentrations of tet than of dox or mcn. the shapes of the growth rate profiles produced in the different drugs give insight into the physicochemical mechanism of tetb.
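as a rough illustration of how a growth rate profile can encode pump parameters, the following python sketch computes growth rate as a function of external drug concentration. it is not the model referred to in this abstract: it assumes, purely for illustration, passive influx balanced at steady state by michaelis-menten efflux, and a simple hyperbolic inhibition of growth by the cytosolic drug. all names and parameters (`perm`, `vmax`, `km`, `g_max`, `ic50`) are hypothetical.

```python
import math


def intracellular_conc(c_out, perm, vmax, km):
    """Steady-state cytosolic drug concentration (illustrative assumption).

    Balances passive influx perm * (c_out - c_in) against Michaelis-Menten
    efflux vmax * c_in / (km + c_in) and returns the non-negative root of
    the resulting quadratic in c_in. Requires perm > 0 and km > 0.
    """
    b = perm * km + vmax - perm * c_out
    disc = b * b + 4.0 * perm * perm * c_out * km
    return (-b + math.sqrt(disc)) / (2.0 * perm)


def growth_rate(c_out, perm=1.0, vmax=0.0, km=1.0, g_max=1.0, ic50=1.0):
    """Growth rate under hyperbolic inhibition by the cytosolic drug.

    Hypothetical form: g_max / (1 + c_in / ic50).
    """
    c_in = intracellular_conc(c_out, perm, vmax, km)
    return g_max / (1.0 + c_in / ic50)


if __name__ == "__main__":
    # Compare a strain without a pump (vmax = 0) to one with a pump.
    for c_out in (0.0, 0.5, 1.0, 2.0, 4.0):
        no_pump = growth_rate(c_out, vmax=0.0)
        with_pump = growth_rate(c_out, vmax=5.0)
        print(f"c_out={c_out:4.1f}  no pump={no_pump:.3f}  pump={with_pump:.3f}")
```

under these assumptions, a larger `vmax` or a smaller `km` shifts the growth profile toward higher external drug concentrations, which is the qualitative behaviour described above for tet relative to dox and mcn.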
we have built a preliminary mathematical model that can simulate these growth profiles and predict efflux pump physicochemical parameters. we are currently working on understanding how efflux pump expression affects bacterial growth by testing ribosome binding site (rbs) sequences of varying strengths in our tet(b) expression system. future work is geared toward modeling more complex efflux pumps such as the tripartite pumps, which traverse both bacterial membranes and cause multi-drug resistance. collectively, this project aims to build an in vivo system that will allow the characterization of a variety of efflux pumps without the arduous tasks of protein purification and subsequent reconstitution.
( ) identified a small transmembrane region of both kcne and kcne that is essential for their unique modulation of the kcnq channel. by swapping a triplet motif in the transmembrane region of kcne and kcne , we can flip the primary function of these two proteins. while the key to the unique modulation by kcne and kcne is believed to lie in this triplet motif, the mechanism and structural changes involved in this modulation are not fully understood. by using nmr spectroscopy, biochemical studies, and computational docking, we aim to look at the structural and conformational differences between kcne and the triple mutant kcne substituted with the three essential kcne residues. we have expressed and purified n-labeled kcne triple mutant in sufficient quantities for nmr studies in lmpg detergent micelles and other membrane mimetics, and we have collected d nmr spectra using a trosy-based pulse sequence. partial backbone assignments of the kcne triple mutant have been determined by aligning and transferring assignments from the wt kcne previously determined in our lab. once the structure of the kcne triple mutant is determined, we aim to computationally dock the triple mutant into a model of the full-length kcnq channel in the open and closed state.
lastly, we will compare the known structure of kcne docked to a model of kcnq to that of the kcne triple mutant to determine key interactions, significant structural and conformational changes, and how the triplet motif region gives rise to its specific structural and functional differences. with this information, we can begin to understand the mechanism of the functional diversity of the kcne family on the kcnq potassium channel.
biochemical characterization of brassica napus diacylglycerol acyltransferase and its regulatory domain
.a) expressed in saccharomyces cerevisiae. purified bnadgat in n-dodecyl-β-d-maltopyranoside (ddm) micelles behaves as dimers, which can associate further to form tetramers. the acyl donor preference of the major dimeric form with sn- , -diolein as acceptor follows the order: α-linolenoyl-coa > oleoyl-coa palmitoyl-coa > linoleoyl-coa > stearoyl-coa. the first residues of bnac.dgat .a, corresponding to a soluble regulatory region, were expressed in escherichia coli and purified. truncation of this soluble domain reveals that the dimeric interface is located within residues - , while the first residues allow formation of tetramers. this n-terminal region was implicated as an allosteric exosite for acyl-coas as revealed by previous lipidex- binding studies. in the current study, circular dichroism spectroscopy and isothermal titration calorimetry were used to probe the binding kinetics and thermodynamics. dgat appears to shift between two oligomerization states, a phenomenon that may be related to regulation of enzyme activity and mediated by the n-terminal domain.
alteration of lysine and arginine content as a strategy to modify such an interaction was found to increase the activity of rdrp in vitro. further, deletion of c terminal amino acid residues also resulted in an increase in polymerase activity that was comparable to that of the full length rdrp-p complex.
it was proposed that the conserved c terminal disordered domain of rdrp was responsible for interaction with p and modulation of the activity. in the present study, role of the c terminal disordered domain was further investigated by determining the oligomeric status of the complex and the c terminal deletion mutants of rdrp and also by quantitating the rdrp-p interaction using surface plasmon resonance. size exclusion chromatography revealed that rdrp eluted in the void volume of the column whereas a significant fraction of the rdrp-p complex eluted at a position corresponding to the size of the : complex of rdrp and p ( kda). activity measurements indicated that the heterodimeric complex was more active than the aggregate eluting in the void fraction. interestingly, the c terminal deletion mutants of rdrp (c del & c del rdrp) were also found to be less aggregated as compared to full length rdrp and some of the protein eluted at a position corresponding to the respective monomers. these monomers were also more active than the aggregate fractions. these results demonstrate that the increase in activity observed either upon interaction with p or deletion of the c terminal domain could be due to the change in the oligomeric state of rdrp. in order to further analyze the interaction of rdrp with p surface plasmon resonance was used. rdrp and its deletion mutants were immobilized on biacore sensor surface and p protein was used as an analyte. full length rdrp and c del rdrp were shown to interact with p with kd values of . and um respectively. however, c del and c del rdrp did not show any binding with p . these results suggest that the region - from the c terminus of rdrp is essential for the interaction with p . further, the c del rdrp was inactive although c del rdrp continued to be active suggesting that residues - from the c terminus are crucial for rdrp activity. 
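for readers less familiar with how a kd relates to an spr signal, the short python sketch below evaluates the standard 1:1 (langmuir) steady-state binding isotherm. it is a minimal illustration only: the abstract does not state which binding model or fitting procedure was used, and the function names and the example values are hypothetical, not taken from this study.

```python
def equilibrium_response(conc, kd, r_max):
    """Steady-state response for 1:1 binding: r_max * conc / (kd + conc).

    conc and kd must share the same concentration unit; r_max is the
    response at saturation (for SPR, typically in response units).
    """
    if conc < 0 or kd <= 0:
        raise ValueError("conc must be >= 0 and kd must be > 0")
    return r_max * conc / (kd + conc)


def fraction_bound(conc, kd):
    """Fraction of immobilized ligand occupied at analyte concentration conc."""
    return equilibrium_response(conc, kd, 1.0)


if __name__ == "__main__":
    # Hypothetical values: a tighter binder (smaller kd) reaches a higher
    # occupancy at the same analyte concentration than a weaker one.
    for kd in (0.5, 5.0):
        occupancy = fraction_bound(1.0, kd)
        print(f"kd={kd}: fraction bound at conc=1.0 is {occupancy:.3f}")
```

in this model the analyte concentration that gives half-maximal response equals kd, so a mutant that shows no response over the tested concentration range is reported as non-binding rather than assigned a kd.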
further studies are in progress to identify the residues within these motifs that may be essential for the activity or interaction with p .
aggregation of androgen receptor in spinal bulbar muscular atrophy is a multistep process
spinal bulbar muscular atrophy (sbma) is a member of the polyglutamine (polyq) expansion diseases, like huntington disease, and it is caused by a genetic expansion of the polycag tract in exon of androgen receptor (ar) that codes for the polyq region. sbma is a late onset disease, which involves a progressive degeneration of the motor neurons and consequent muscular atrophy. there is still no treatment available for this disease. ar is a nuclear receptor that responds to testosterone and that regulates the expression of the masculine phenotype. it is composed of an intrinsically disordered n-terminal domain (ntd) that bears the polyq tract, a dna binding domain and a ligand binding domain. aggregates of ar protein with an extended polyq are observed in the motor neurons of sbma patients. in vitro studies showed that aggregation of androgen receptor takes place only in the presence of testosterone and that the cleavage of the protein by caspase is a crucial event for cytotoxicity. however, the mechanism of aggregation of this protein is not clearly understood. an increasing body of evidence supports the hypothesis that the aggregation of these proteins is controlled by regions flanking the polyq tract, which regulate the rate of aggregation depending on their secondary structure. we have applied nuclear magnetic resonance (nmr) and circular dichroism to obtain information on the secondary structure of the n-terminal cleavage product of ar by caspase and we have studied its aggregation with a set of biophysical methods, including dynamic light scattering, an hplc sedimentation assay and transmission electron microscopy. we have found that the polyq tract of ar presents a high degree of helicity.
we attribute this conformation to the n-terminal flanking region, which is characterized by high helicity, and we have tested this hypothesis by performing mutations. we have also observed that the rate of the first step of oligomerization is not dependent on the number of glutamine repeats, but instead is due to self interactions of a region n-terminal to and far from the polyq. its progression to fibrils is dependent on the number of glutamines in the tract. we have therefore identified two steps in the aggregation process of ar, where a motif far from the polyq at its n-terminal side drives the early oligomerization, followed by the interaction of the polyq chains that stabilize it and determine the progression to fibrils. these findings shed light on possible interventions in the ar oligomerization process, thus suggesting a different strategy to study the onset of the disease in sbma patients.
destabilizing the transient helical conformation of islet amyloid polypeptide hastens peptide self-assembly and potentiates cytotoxicity
carole anne de carufel, phuong trang nguyen, alexandre arnold, isabelle marcotte, steve bourgault
amyloidogenic polypeptides can be divided into two different structural classes: those that are intrinsically disordered and those that show a well-defined structure in their monomeric soluble state. natively folded proteins, such as transthyretin, have to unfold (or misfold), at least partially, to form amyloids. in contrast, intrinsically disordered polypeptides, such as the islet amyloid polypeptide (iapp) and abeta peptide, need to undergo conformational rearrangements allowing the formation of locally ordered structure(s) to initiate the amyloidogenic process. studies have shown that iapp and abeta adopt an alpha-helix conformation in the initial steps of amyloidogenesis. this intermediate is believed to be on-pathway to fibril formation, although this hypothesis is still a matter of debate.
in this study, we designed human iapp (hiapp) derivatives in which alpha-helix destabilizing substitutions were incorporated into the putative helical segment of iapp to probe the initial structural event in amyloid formation. using trifluoroethanol titration, we observed by cd spectroscopy that strategic incorporation of d-amino acids at positions and leads to an iapp derivative (diapp) that cannot fold into a helix. in homogeneous solution, hiapp and diapp show similar kinetics of fibrillization, as measured by thioflavin t fluorescence. although their amyloid fibrils display different characteristics by afm, iapp and diapp are able to self-associate to form amyloids when mixed together and when seeded with one another. studies in a heterogeneous environment, notably in the presence of glycosaminoglycans and model membranes of dopc/dopg ( : ), showed a helical intermediate for hiapp while only a beta-sheet secondary structure was apparent for diapp. while the rate of amyloid fibril formation was increased for both peptides, diapp was drastically affected by these anionic biomolecules, with an absence of lag phase. the inability to adopt a transient helical conformation accentuates cell toxicity, as supported by the caspase / activation level and the increase in intracellular calcium level. overall, this study indicates that the helical intermediate is off-pathway to iapp amyloid formation and offers novel mechanistic insights for the development of molecular identities modulating peptide self-assembly and iapp-induced cytotoxicity.
for an organism to survive, its proteins must adopt complex conformations in a challenging environment where macromolecular crowding can derail even robust biological pathways. the situation is perilous: many diseases arise from improper folding of just a single protein.
to cope, cells employ a repertoire of molecular chaperones and remodeling factors that usher unfolded proteins into active conformations, sequester them, or target them for degradation. yet, not all aggregated proteins are the result of mis-folding. yeast prions are self-templating protein-based mechanisms of inheritance that rely upon chaperones for their propagation. the best studied of these is the prion domain (nm) of sup , which forms an amyloid that can adopt several distinct conformations (strains) that produce distinct phenotypes. using genetic, biochemical, spectroscopic, and solid state nmr techniques, we investigated the structural and dynamic underpinnings of sup amyloids and found that prion strains differ in both their atomic structure as well as their dynamic motions. interestingly, these mobility differences correlate with differences in the interaction with molecular chaperones in vivo. limitations on the specificity and sensitivity of biophysical techniques typically restrict structural investigations to purified systems at concentrations that are orders of magnitude above endogenous levels. therefore, i developed an approach to apply a sensitivity-enhancement technique for nmr, dynamic nuclear polarization (dnp), to investigate interactions between sup and molecular chaperones at endogenous concentrations in their native environments. critically, i found that the cellular environment induced structural changes in a region of sup that is intrinsically disordered in purified samples but known genetically to influence prion propagation from one generation to the next. this approach enables structural and mechanistic investigation of proteins in biologically relevant contexts. genetic instability within regions encoding repetitive proteins as a driver of adaptation stephen fuchs more than ten percent of all eukaryotic proteins contain within them a region of repetitive amino acid sequence. 
these repetitive domains range from short stretches of a single amino acid to multiple copies of longer, heterogeneous amino acid sequences and generally show lack of defined structure. they play diverse roles in cells including acting as structural proteins, promoting cell-cell interactions, and mediating the assembly of molecular machines. tandem repeat proteins are known to be variable in length within cellular populations although the mechanisms dictating this variability have not been elucidated. here we describe work uncovering specific features within the coding sequences of repetitive proteins that contribute to tandem repeat instability in yeast. furthermore, we demonstrate that cells will expand and/or contract repetitive regions in order to adapt to environmental stresses and describe a role for dna repair proteins in this process. lastly, we demonstrate how these mechanisms are likely conserved in higher eukaryotes, including humans. this study uncovers the molecular basis for an important aspect of natural protein evolution and describes a novel mechanism for adaptation in response to environmental changes. a proline-tryptophan turn in the intrinsically disordered domain of ns a protein is essential for hepatitis hepatitis c virus (hcv) nonstructural protein a (ns a) and its interaction with the human chaperone cyclophilin a (cypa), a peptidyl-prolyl cis-trans isomerase (ppiase), are both targets for highly potent and promising antiviral drugs that are in late stage of clinical development [ , ] . despite its high interest in the development of drugs to counteract the worldwide hcv burden, ns a is still an enigmatic multifunctional protein poorly characterized at the molecular level. ns a is required for hcv rna replication and is involved in viral particles formation and regulation of host pathways. 
thus far, no enzymatic activity or precise molecular function has been ascribed to ns a that is composed of a highly structured domain (-d ), as well as two intrinsically disordered domains (-d ) and (-d ). ns a-d structure has been solved by x-ray crystallography and ns a-d and -d have been characterized by nmr spectroscopy. these two last domains do not adopt a stable d structure but rather exist as an ensemble of highly dynamic conformers. using nmr spectroscopy, hcv ns a-d has been shown to establish a direct interaction with the human cypa and to be a substrate for the enzymatic ppiase activity of cypa [ ] . the cypa interaction site in ns a-d is composed of nearly residues that correspond to the most conserved region of the domain, with proline residues being strictly conserved among all hcv genotypes. whereas ns a-d is mainly disordered, some of its nmr resonances, corresponding to residues in the cypa binding site, display unexpected h and n nmr chemical shifts for an intrinsically disordered domain. thus we have further characterized this region by nmr spectroscopy. a short structural motif in the disordered ns a-d has been identified and we solved its nmr structure. in a cellular assay, we showed that this structural motif, a minimal pro -trp turn, is essential for hcv rna replication. we demonstrated that this pro-trp (pw) turn is required for proper interaction with the host cypa and influenced its enzymatic ppiase activity on residue p of ns a-d . this work provides a molecular basis for further understanding of the function of the intrinsically disordered domain of hcv ns a protein. in addition, our work highlights how very small structural motifs present in intrinsically disordered proteins can exert a specific function. [ ] [ ] . this -residue peptide also shows toxicity towards mammalian cells but at higher concentrations, suggesting its possible usefulness as a treatment for trypanosomiasis. 
here we present the peptide's relative cytotoxicity for bloodstream and procyclic forms of t. brucei and for mammalian cells, the fate of the peptide in t. brucei using fluorescently-labelled bt- , and its three dimensional structure using nmr spectroscopy. minimum inhibitory assays confirmed the peptide's selective toxicity towards both bloodstream and procyclic forms of t. brucei, demonstrating its potential to serve as a starting point for a trypanocidal drug. fluorescence spectrophotometric experiments, carried out using fluorescein labelled bt- , show that the peptide is released from the external surface of the parasite into the suspending medium under de-energized conditions but retained in energized cells. heteronuclear and homonuclear biomolecular nmr experiments (tocsy, noesy, h- c-hsqc, h- n-hsqc, etc) followed by structural calculations (chemical-shift based as well as simulated annealing techniques) in the free state indicate that this peptide is mostly unstructured in aqueous solution, suggesting that there is a major conformational change upon binding to t. brucei that is required for uptake. we suggest that the evolutionary pressure that selected for the intrinsically disordered structure of this peptide was the advantage it conferred upon the host to bind to many different surface structures throughout the microbiological world.
physikalische biologie, heinrich heine university, structural biochemistry (ics- ), research centre jülich, chemistry and biotechnology, swedish university of agricultural sciences (slu)
the misfolding and amyloid formation of proteins featuring intrinsically disordered regions is a pathological hallmark of several neurodegenerative diseases, including alzheimer's disease and parkinson's disease. engineered binding proteins targeting amyloidogenic proteins aid in the elucidation of the aggregation mechanism and suggest therapeutic strategies.
we have constructed phage display libraries enriched in binders to amyloidogenic intrinsically disordered proteins, using zab , a protein with high affinity for the amyloid-beta peptide, as a scaffold. binding proteins selected from these libraries are termed beta-wrapins (beta-wrap proteins). the beta-wrapins as and hi exhibit nanomolar affinity for monomeric alpha-synuclein or islet amyloid polypeptide, respectively. as and hi potently inhibit in vitro amyloid formation and toxicity at substoichiometric concentration ratios, indicating that they interfere with the nucleation and/or elongation of amyloid fibrils. the nmr structures of the beta-wrapin:target complexes reveal beta-hairpin motifs in alpha-synuclein and islet amyloid polypeptide which are stabilized by coupled folding and binding. in the case of alpha-synuclein, the beta-hairpin is formed in the sequence region - which contains the beta-strand segments b and b of amyloid fibril models and most disease-related mutations. we show by disulfide engineering, biophysical techniques, and cell viability assays that intramolecular tertiary interactions between the b and b segments of alpha-synuclein interfere with its aggregation, and moreover inhibit aggregation of amyloid-beta peptide and islet amyloid polypeptide. our results reveal a common preference of different amyloidogenic proteins for formation of beta-hairpin motifs and demonstrate a critical role of hairpin conformers in the control of amyloid formation.
interaction profiling through proteomic peptide phage display
cecilia blikstad, moon-hyeong seo, norman davey, roland arnold, sachdev s sidhu, philip m kim, ylva ivarsson
department of chemistry -bmc, donnelly centre
a considerable part of the human proteome is intrinsically disordered. the disordered regions are enriched in short motifs serving as docking sites for peptide binding domains. domain-motif interactions are crucial for the wiring of signaling pathways.
these interactions are typically transient and difficult to capture through most conventional high-throughput methods. we therefore developed a novel approach for the large-scale profiling of domain-motif interactions called proteomic peptide phage display (prop-pd) ( ). in prop-pd we combine bioinformatics, oligonucleotide arrays, peptide phage display and next-generation sequencing. this allows the interrogation of domain-motif interactions on a proteome-wide scale and de novo motif discovery. in our pilot experiment we generated two distinct phage libraries, one displaying all human c-terminal sequences and one displaying c-termini of known virus proteins. we used the prop-pd libraries to identify interactions of human postsynaptic density /discs large/zonula occludens- (pdz) domains. we successfully identified novel pdz domain interactions of potential relevance to cellular signaling pathways and validated a subset of interactions with a high success rate. recently, we created a prop-pd library that displays peptides representing the disordered regions of the human proteome. we validate our disorderome library against a range of peptide binding domains, which provides novel insights into their binding preferences and suggests interactions of potential biological relevance, as will be presented here. prop-pd can be used to uncover protein-protein interactions of potential biological relevance in high-throughput experiments and provides information that is complementary to other methods. prop-pd is scalable and can be adapted to any target proteome of interest.
phosducin is a kda phosphoprotein that regulates visual signal transduction by interacting with the gtβγ subunit of the retinal g-protein transducin. the function of pdc is regulated by phosphorylation at ser and ser in a process that involves the binding of phosphorylated pdc to the regulatory - - protein, but the molecular mechanism of the regulation by - - protein is still unknown.
pdc was also suggested to be involved in transcriptional control, the regulation of transmission at the photoreceptor-to-on-bipolar cell synapse, and the regulation of sympathetic activity and blood pressure [ ] [ ] [ ] . here, the solution structure of pdc and its interaction with the - - protein were investigated using small angle x-ray scattering, circular dichroism, quenching of tryptophan fluorescence, analytical ultracentrifugation, hydrogen-deuterium exchange coupled to mass spectrometry and nuclear magnetic resonance. we show that the - - protein interacts with and sterically occludes both the n- and c-terminal gtβγ binding interfaces of phosphorylated pdc, thus providing a mechanistic explanation for the - - dependent inhibition of pdc function. the - - protein dimer interacts with pdc using surfaces both inside and outside its central channel. the n-terminal domain of pdc, where both phosphorylation sites and the - - binding motifs are located, is intrinsically disordered and likely remains highly flexible when bound to - - , indicating the fuzzy-like character of this complex. in addition, it has been speculated that the - - protein binding decreases the rate of pdc dephosphorylation after a light stimulus through its interaction with phosphorylated ser and ser , thus lengthening the time that pdc remains phosphorylated after a light exposure. pdc is dephosphorylated in vivo by protein phosphatases known to cause neurodegenerative disease in a polyglutamine-length dependent manner. despite intense study, the molecular basis of polyq toxicity in hd or any of the other diseases has only partially been elucidated and potential routes to therapeutic intervention are sparse. the use of genetically tractable model organisms to identify the cellular pathologies caused by mutant huntingtin expression is essential to our understanding of the disease pathology in humans.
in eukaryotes, many of the protein folding homeostasis pathways are highly conserved and yeast cells expressing a glutamine-expanded fragment of huntingtin exon exhibit a polyq length-dependent toxicity that recapitulates many of the basic protein folding defects associated with polyq diseases in neurons. taking an unbiased approach, we screened an overexpression library of the entire yeast genome for suppressors and enhancers of polyq toxicity and identified seven proteins with prion-like, q-rich domains that are strong suppressors in yeast. intriguingly, the q-rich domains of these proteins, and several other q-rich domains, suppress toxicity when expressed in isolation. these suppressors are also efficacious in mammalian cells and, strikingly, one suppressor was independently shown to alleviate polyq-expanded ataxin- toxicity in a drosophila model. in yeast, the suppressors co-aggregated with an otherwise highly toxic glutamine expanded huntingtin exon protein (htt q), resulting in a non-toxic aggregate and eliminating populations of diffusible oligomeric species. using a transcriptional sensor for protein coaggregation, we determined that yeast and human proteins that normally co-aggregated with htt q did not co-aggregate with these hetero-aggregates. thus, these q-rich domains may suppress htt q toxicity by two complementary mechanisms: trapping potentially toxic oligomers in larger aggregates and by limiting the interactome of the larger htt q aggregates. structuring disorder: the case of the intrinsically disordered unique domain of c-src mariano maffei about two thirds of eukaryotic proteins contain large intrinsically disordered regions. they represent a change of paradigm from "structure-function" to "information-function" (uversky, ; babu et al., ). structured proteins are information rich, but the current challenge is to discover how information is stored in disordered protein. 
regulation of c-src activity, the first discovered oncoprotein, by its intrinsically disordered n-terminal region has been recently demonstrated (perez et al., ). functional studies have revealed that mutations in the ulbr cause strong phenotypes when introduced in fulllength c-src and expressed in xenopus laevis oocytes (perez et al., ) or in human sw colorectal cancer cells (unpublished). however, the connection with the classical regulatory mechanisms is still missing. c-src domain structure consists of four "src-homology" domains: sh , sh , sh and sh , arranged in this order from the n-terminus to the c-terminus, with the intrinsically disordered "unique" domain separating the sh and sh domains. classically, the sh and sh domains are involved in regulation and the sh domain is the membrane anchoring site. we will present our recent results showing that the unique domain is part of a long loop closed by the interaction of the sh and sh domains (maffei et al., ). the conformational freedom of this disordered region is further restricted through direct contacts between the rt-loop of the sh domain and, primarily, residues located within the recently discovered unique lipid binding region (ulbr). the interaction between the unique and sh domains is allosterically modulated by a poly-proline ligand binding to the canonical binding site of the sh domain (maffei et al., ) . these results demonstrate a direct connection between classical c-src regulation involving the sh domain and the new regulation mechanisms involving the intrinsically disordered regions and provide new evidence of the functional importance and the underlying mechanism behind regulation of signalling pathways by intrinsically disordered domains. in mammalian cells, the golgi reassembly and stacking proteins (grasp and grasp ) are involved in the stacking of golgi apparatus cisternae and in the formation of the golgi ribbon. 
since grasps have been identified in many organisms, other roles for grasps have already been pointed out, such as chaperoning and transport of other proteins, involvement in cell apoptosis, cell migration, unconventional secretion, and in mitosis. in saccharomyces cerevisiae, it is observed that only % of the golgi cisternae are in stacks and do not form ribbon structures. this budding yeast contains a single grasp, called grh , that is analogous to grasp . the structural differences of the golgi apparatus and the functional repertoire of grasps suggest a structural dynamic of these proteins. here, we used a combination of biophysical/biochemical methods to investigate the behavior of grh . bioinformatics and circular dichroism (cd) analyses of grh indicated a high percentage of either flexible regions or extended loops. the partially unfolded grh structure in solution folded into more ordered structures upon temperature increase, dehydration onto a surface and exposure to nonaqueous solvents, as also reported by cd. hydration of the dehydrated folded protein is a reversible process that is accompanied by unfolding. furthermore, grh showed slow migration in sds-page, high susceptibility to proteases and low cooperativity of the chemical-induced unfolding process. fluorescence of trp residues along with cd data showed that grh preserves a considerable amount of residual secondary structure, and the unfolding transition monitored by trp presented higher cooperativity. another cooperative transition was also reported by the extrinsic hydrophobic fluorescence probe ans upon chemical denaturation. this set of experiments indicates that grh behaves as a protein containing intrinsically disordered regions (idrs), characterized by unstructured regions of high polypeptide mobility experiencing many conformations. these findings suggest that an idp-like behavior may be the solution found by nature to account for the functional need of grh for interactions with several different partners in the cell.
conformational changes governing dengue virus capsid protein function and its inhibition by pep andr e f.

dengue virus (denv) infection affects millions of people and is becoming a major global disease for which there is no specific treatment available. the interaction of denv capsid (c) protein with host lipid droplets (lds) is essential for viral replication. pep - , a peptide designed based on a denv c intrinsically disordered conserved region, inhibits this crucial interaction. combining bioinformatics and biophysics we determined pep - structure and ability to bind different phospholipids, in the context of denv c function. pep - becomes α-helical upon binding to anionic phospholipids. structure prediction of denv c n-terminal intrinsically disordered region reveals orientations that alternatively shield or expose denv c hydrophobic pocket, supporting a novel autoinhibitory role for this region. these findings pave the way for similar studies to understand disordered proteins and improved peptidomimetic drug development strategies against flaviviruses. topics: intrinsically disordered proteins, protein-lipid interactions

pf- developing mechanistic insight into modulators of tau aggregation eri nakatani-webster , hannah baughman , shaylin higgins , abhinav nath

the pathological self-association of microtubule-associated protein tau is implicated in a range of neurodegenerative disorders collectively called tauopathies, perhaps the most prominent of which are alzheimer's disease (ad) and chronic traumatic encephalopathy (cte). tau aggregation in vitro shares many features in common with fibril formation by other amyloid-forming proteins: a nucleation-dependent polymerization reaction progressing via oligomeric intermediates into β-sheet-rich fibrillar aggregates, characterized by distinctive sigmoidal kinetics. over the years, many investigators have advanced our understanding of how these time-courses might best be characterized and interpreted.
in particular, elegant analytical and numerical approaches have been developed that supersede the empirical sigmoidal equations typically used to fit fibril formation traces. these modern approaches have enabled more rigorous insight into the mechanism of amyloid formation, and into how small molecules, protein chaperones, and other binding partners can modulate the process. an understanding of a modulator's effects on amyloid formation mechanism is necessary in order for us to predict and engineer its effects on amyloid pathology in a biological context. a given modulator may affect rates of primary or secondary nucleation, elongation, or fibril fragmentation to different extents. each of these perturbations, individually or in combination, can alter the kinetics of aggregation, the final state of the amyloid fibrils, and the sampled ensemble of oligomeric intermediates. unfortunately, fitting of mechanistic models to amyloid formation kinetics is an example of an "ill-posed problem", in that dramatically different combinations of elementary parameters can nevertheless generate very similar sigmoidal kinetic traces. this has typically necessitated global analysis of amyloid kinetic traces collected over a broad range of protein concentrations, a substantial expenditure of time, effort and material that must then be repeated in the presence of a modulator in order to gain insight into its effects. we propose an alternative approach: to fit amyloid formation traces to a large distribution of parameter sets, and determine how various aggregation modulators affect the distribution of parameters. this so-called "parameter distribution analysis" enables the inference of mechanistic effects from measurements at a single protein concentration.
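as a minimal sketch of the idea of keeping a whole distribution of parameter sets, and not the authors' implementation, the python snippet below uses the two-step finke-watzky model (nucleation rate constant `k1`, autocatalytic growth rate constant `k2`) as a stand-in mechanism. it retains every candidate parameter pair whose simulated trace lies within a tolerance of an observed trace. the model choice, the function names, the log-uniform sampling and the rmsd acceptance rule are all assumptions made for illustration.

```python
import math
import random


def fw_fraction(t, k1, k2, a0=1.0):
    """Fraction of protein aggregated at time t in the Finke-Watzky model.

    A -> B (rate constant k1), A + B -> 2B (rate constant k2); a0 is the
    initial monomer concentration. Stand-in model chosen for illustration.
    """
    exponent = (k1 + k2 * a0) * t
    if exponent > 700.0:  # avoid overflow in exp; the trace has plateaued
        return 1.0
    monomer = (k1 / k2 + a0) / (1.0 + k1 / (k2 * a0) * math.exp(exponent))
    return 1.0 - monomer / a0


def rmsd(trace_a, trace_b):
    """Root-mean-square difference between two equally sampled traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(trace_a, trace_b)) / len(trace_a))


def accepted_parameter_sets(observed, times, candidates, a0=1.0, tol=0.05):
    """Return the candidate (k1, k2) pairs whose trace matches `observed`."""
    kept = []
    for k1, k2 in candidates:
        simulated = [fw_fraction(t, k1, k2, a0) for t in times]
        if rmsd(simulated, observed) <= tol:
            kept.append((k1, k2))
    return kept


def log_uniform_candidates(n, low=1e-6, high=10.0, seed=0):
    """Draw n (k1, k2) pairs, each log-uniform between low and high."""
    rng = random.Random(seed)
    span = math.log10(high / low)
    return [(low * 10 ** (rng.random() * span), low * 10 ** (rng.random() * span))
            for _ in range(n)]


times = [float(t) for t in range(0, 41)]
observed = [fw_fraction(t, 1e-4, 0.5) for t in times]  # synthetic "data"
candidates = log_uniform_candidates(20000)
kept = accepted_parameter_sets(observed, times, candidates)
print(len(kept), "of", len(candidates), "parameter sets reproduce the trace")
```

under these assumptions, repeating the acceptance step on a trace recorded with a modulator and comparing the two retained sets would indicate which rate constants shift; the toy model here has only two parameters, whereas realistic mechanisms have more.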
parameter distribution analysis based on numerical modeling has been made tractable by advances in computer hardware and software, and can be easily extended to include additional mechanisms or phases relevant to a protein or modulator of interest. here, we illustrate how parameter distribution analysis, complemented by fluorescence correlation spectroscopy (fcs), electron microscopy (em) and other biochemical techniques, can shed light on fundamental aspects of tau amyloidogenesis. we examine the disparate effects that natural products, pharmacotherapies and protein chaperones can have on the mechanism of aggregation, and also discuss the effects of heparin (widely used as an inducer of tau aggregation). these insights demonstrate the value of parameter distribution analysis as applied to amyloid formation and other ill-posed biochemical problems. new insights into amyloidogenesis of tau protein induced by enantiomers of polyglutamic acid amyloidogenesis of tau protein leads to the formation of amyloid fibrils (ordered fibrillar protein aggregates) which are accumulated in neurons of central nervous system during the course of neurodegenerative diseases called tauopathies. studying tau (a typical intrinsically disordered protein) amyloidogenesis has been challenging for many reasons. positive charge on the tau molecule must be compensated (e.g. in the presence of polyanions) in order to initiate the process. heparin (glycosaminoglycan) has been the most intensively studied charge-compensating agent in this context. on the other hand induction of tau aggregation by polyglutamic acid is poorly characterized. mechanisms responsible for the propagation of tau conformations has become an interesting research objective. prion-like features of tau amyloid can be studied in vitro also in the seed-induced regime of aggregation. tau amyloid seeds can act as nuclei for amyloidogenesis. such seeds can be obtained by fragmentation of amyloid fibrils by means of sonication. 
given that amyloidogenesis can proceed through various assembly pathways resulting in distinct amyloid 'strains' (self-propagating structural variants of amyloid) we have used poly-l-glutamic acid (plga) and poly-d-glutamic acid (pdga) to direct tau onto different amyloidogenic pathways. we have hypothesized that the chirality of the inducers could lead to fibril polymorphism. in our studies, we have used a recombinant human n r tau isoform. we have used transmission electron microscopy (tem), sedimentation and kinetic measurements. firstly, we have characterized unseeded plga-/pdga-induced tau aggregation and found that the corresponding kinetics were significantly different. secondly, we have used sonicated fibrils to characterize the kinetics of seeded processes. both plga- and pdga-induced amyloid seeds were able to efficiently seed tau aggregation in the presence of plga, whereas in the presence of pdga the aggregation was much less effective. surprisingly, we found that pdga-induced amyloid seeds were able to catalyze fibrillogenesis of tau more clearly in the presence of soluble plga than in the presence of pdga, the primary inducer. we could not induce aggregation of tau in the absence of polyglutamic acids, which indicates that positive charge on tau molecules must be unconditionally compensated in order to promote amyloidogenesis. thirdly, using tem we have characterized different morphologies of tau amyloid fibrils generated in unseeded and seeded processes. finally, to further characterize properties of the fibrils we have performed sedimentation experiments. fibrils induced by plga, pdga and heparin revealed different sedimentation properties. heparin-induced fibrils underwent sedimentation more readily than pdga-induced fibrils, whereas plga-induced fibrils remained in the supernatant. these results indicate distinct physicochemical properties of these fibrils.
we believe that our findings will contribute to the current understanding of the molecular dynamics of tau amyloidogenesis.

self-organizing structures of alpha-synuclein and its aggregates by a coarse-grained monte carlo simulation ras pandey , peter mirau , barry farmer

alpha-synuclein (asn) consisting of residues, an intrinsically disordered protein, is linked to such neurodegenerative diseases as parkinson's disease (pd) and alzheimer disease via toxic clumping into amyloid fibrils. we investigate the structure and dynamics of an asn chain as a function of temperature by a coarse-grained approach where a residue is represented by a node. the basic idea is borrowed from the 'united atom' approach in polymer chain modeling, which has been used extensively and whose benefits and pitfalls have been explored for decades. such coarse-grained methods have also been used in protein chain modeling in recent years (e.g. aip advances , ( )). although the atomic-scale structural resolution is sacrificed, its specificity is captured via a unique set of knowledge-based residue-residue interactions (e.g. classic miyazawa-jernigan matrix, macromolecules , ( )). a number of local and global physical quantities are analyzed such as contact map, neighborhood and mobility profiles, mean square displacement of protein, its radius of gyration and the structure factor. based on the mobility profile, we are able to identify three distinct segments of asn along its contour, i.e. a sluggish n-terminal segment ( - ) and a c-terminal segment ( - , least mobile), separated by the central region ( - ), the non-amyloid component (nac), with higher mobility. the contact profile shows that the probability of intrachain residue aggregation (clumping) is higher in the n-terminal region than in the c-terminal region, with least aggregation in the nac region.
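for readers unfamiliar with the global quantities listed above, the radius of gyration of a coarse-grained chain is the root-mean-square distance of its nodes from their centroid. the short python sketch below computes it for a list of node coordinates; the function name, the equal-mass assumption and the example coordinates are illustrative and not taken from the abstract.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the nodes from their centroid.

    `coords` is a sequence of (x, y, z) positions, one per residue node.
    All nodes are given equal mass (an assumption of this sketch).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one node")
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    squared = sum(sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(squared / n)


# hypothetical four-node straight chain with unit spacing
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(radius_of_gyration(chain))
```

tracking this value over the sampled conformations at each temperature would give the temperature dependence of the chain size discussed in the abstract.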
we find that the radius of gyration (rg) decays monotonically with the temperature, consistent with the finding of allison et al. (jacs, , ( ) ). from the detailed analysis of the structure factor we are able to predict the variation of the spatial mass distribution with the temperature as the residues in the asn chain organize and disperse by evaluating its effective dimension d. we find the protein conforms to a globular structure (d ) at low temperatures and to a random coil (d ) at high temperatures, which is consistent with the estimates of uversky et al. (j. biol. chem. , ( )). in addition, we provide the estimates of d ( d ) for the intermediate structures as the protein chain makes a transition from globular to random coil. questions under investigation include: what are the effects of mutations (e.g. b- and g-synuclein), how does the structure of an isolated asn chain change in the presence of many interacting protein chains, and how do they organize over multiple length scales? attempts will be made to address some of these issues as the data become available.

tear down the wall: dismantling the biofilm scaffold of e. coli cesyen cedeno , nani van gerven , wim jonckheere , imke van den broek , han remaut , peter tompa

csga is the major subunit of the so-called curli fiber system. this is an amyloid structure formed on the outer membrane of e. coli that acts as a scaffold for the biochemical machinery/matrix in the extracellular milieu (biofilms). extracellular matrices of this nature are robust platforms that aid bacterial colonization; in this context csga becomes a key target for breaking the architecture within bacterial biofilms. chaperones are molecular machines able to stabilize misfolding-prone proteins or even retrieve proteins trapped in non-physiological states. here we show how erd acts as a molecular chaperone inhibiting the formation of csga amyloid fibers in vitro.
this work illustrates an alternative approach towards biofilm treatment at a molecular level.

coupled folding and binding of transcription factors sarah shammas , alexandra travis , jane clarke

intrinsic protein disorder is ubiquitous in transcription, particularly within transcription factors, which frequently fold into structures upon binding to partner molecules (dna or protein). the coupled folding and binding reactions that take place between individual transcription factors and the key hub co-activator proteins are crucial in determining the expression profile of the cell, and hence its phenotype. these interactions have been well studied by structural and equilibrium methods. here we present mechanistic insights into the process, gained through complementary kinetics experiments, for the binding of five separate transcription factors to a single prototypical co-activator (cbp kix). the transcription factors investigated belong to cellular (cmyb, mll, creb, e a) and viral (htlv- blz) classes. these reactions are remarkably fast; after removing the effect of long-range electrostatic rate enhancement the association rate constant is still approximately x m- s- , which is just above the typically quoted upper limit for diffusion-limited reactions between pairs of proteins ( - m- s- ), and is also the highest such value we have found reported. this, combined with the apparent insensitivity of the association rate to residual structure within the unbound state, indicates that binding precedes folding (induced fit mechanism). interactions between kix and its transcription factors are additionally modulated by allostery between its two binding sites. we investigate the basis for this, finding it to be mediated by changes in protein flexibility.
alternative hit finding strategies for intrinsically disordered proteins, exemplified by forkhead box transcription factors harm jan (arjan) snijder , maria saline , tomas jacso , frank janssen , mattias rohman , tyrrell norris astrazeneca r&d, discovery sciences, se- ,pepparedsleden

forkhead box o (foxo) proteins are emerging as key transcription factors in insulin and glucose metabolism, in the regulation of immune responses, and in balancing cell proliferation, apoptosis and senescence. foxo proteins are predicted to be intrinsically disordered proteins (idps); idps are largely unstructured and often function as hubs mediating multiple interactions. idps are considered to be largely evasive from classical small molecule interference and lead-generation approaches, as they lack defined binding pockets. the available methods for addressing these targets have been lagging behind and need to be developed to assess the tractability of this target class. here we have evaluated the tractability of fragment screening on various domains of a forkhead box o member. we could confirm the intrinsically disordered character of foxo and used nmr screening to identify fragments that interact with foxo. one of these fragments was subsequently confirmed as a direct foxo binder in d hsqc-nmr spectroscopy and this fragment showed an effect in a foxo reporter gene assay. these results demonstrate that fragment screening may be a valuable approach for intrinsically disordered proteins, although challenges remain in expanding these fragments into more potent hits in the absence of detailed structural data.

the characterization of amyloid-beta peptide (abeta) oligomer samples is critical to advance in the field of alzheimer's disease (ad).
here we report a critical evaluation of two methods used for this purpose, namely sodium dodecyl sulfate polyacrylamide gel electrophoresis (sds-page), extensively used in the field, and electrospray ionization ion mobility coupled to mass spectrometry (esi-im-ms), an emerging technique with great potential for oligomer characterization. to evaluate their performance, we first obtained pure cross-linked abeta and abeta oligomers of specific order. analysis of these samples by sds-page revealed that sds affects the oligomerization state of abeta oligomers, thus providing flawed information on their order and distribution. in contrast, esi-im-ms provided accurate information, while also reported on the chemical modifications and on the structure of the oligomers. our findings have important implications as they challenge scientific paradigms in the ad field built upon the sds-page characterization of abeta oligomer samples. coarse-grained simulation of protein association: application to rate prediction and implication for association mechanisms yinghao wu , the kinetics of protein binding is of paramount importance for understanding cellular functions. for instance, the binding kinetics between membrane receptors and their ligands control the speed of signal transduction after cells are exposed to stimulation. the experimentally measured association rates of protein binding span ten orders of magnitude, a range that was divided into two regimes. it was proposed that a fast association regime is limited by protein diffusion, while the other side of the spectrum is controlled by conformational changes. consequently, all previous simulation methods neglected conformational changes when calculating the association rate of a diffusion-limited regime. however, the most updated theory of protein binding suggests that a protein remains in a pre-existing equilibrium of unbound conformations. binding shifts the equilibrium toward its bound state. 
this highlights the importance of conformational factors for regulating protein binding. enlightened by this conformational selection model, we hypothesize that the conformational flexibility of protein structures regulates association more widely than previously anticipated. we develop a new coarse-grained model to simulate the process of protein association via the kinetic monte carlo (kmc) algorithm. each residue in this model is represented by its ca atom and a side-chain functional site. a simple physically based potential is used to guide the relative diffusion of two interacting proteins. given the size of the simulation box and the length of the simulation, the association rate constant can be derived by counting the frequency of dimerization among a large number of simulation trajectories. we further designed a prediction strategy that accounts for both the conformational and energetic factors of binding. our method is able to predict rates of protein association that are highly correlated with experimentally measured values. due to the coarse-grained feature, our model was further applied to several special cases of protein association. in one example, we studied the binding kinetics of proteins with flexible linkers. the interaction between thrombin and its functional inhibitor, rhodniin, was used as a testing system. we captured the conformational changes of flexible linkers from the all-atom molecular dynamic simulations. we found that the association with full-length flexible rhodniin was faster than its two individual domains and that their dissociation was more difficult, supporting a "flycasting" mechanism in which partial structures of an intrinsic disordered protein (idp) dock to the target first, while the remaining segments undergo conformational searches and sequentially coalesce around the target. in another example, we studied the binding kinetics of membrane receptors from cellular interfaces. 
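the rate estimate described above, counting dimerization events over many trajectories for a given box size and simulation length, can be illustrated with a short python sketch. it assumes one copy of each protein per box and single-exponential (pseudo-first-order) association, so that the probability of association within time t is 1 - exp(-k_on c t) with c the molar concentration of one molecule in the box. this is a common estimator and is offered as an assumption; the abstract does not state the exact formula used, and the function name and example numbers are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def association_rate_constant(n_associated, n_trajectories, box_volume_nm3, sim_time_s):
    """Estimate k_on (M^-1 s^-1) from the fraction of trajectories that dimerize.

    Assumes one copy of each protein per box and single-exponential
    association: P(t) = 1 - exp(-k_on * c * t), where c = 1 / (N_A * V)
    is the concentration of a single molecule in the box.
    """
    if not 0 < n_associated < n_trajectories:
        raise ValueError("need 0 < n_associated < n_trajectories")
    fraction = n_associated / n_trajectories
    volume_l = box_volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    concentration = 1.0 / (AVOGADRO * volume_l)  # mol/L
    return -math.log(1.0 - fraction) / (concentration * sim_time_s)


# hypothetical run: 100 of 10000 trajectories associate within 1 microsecond
# in a cubic box with 10 nm edges (1000 nm^3)
k_on = association_rate_constant(100, 10000, box_volume_nm3=1000.0, sim_time_s=1e-6)
print(f"k_on ~ {k_on:.2e} M^-1 s^-1")
```

for small association fractions the estimate reduces to the fraction multiplied by N_A V / t, which makes the dependence on box size and simulation length explicit.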
the interaction between membrane proteins cd and cd , cell adhesion molecules known to mediate the activation of t cells and natural killer cells, was used as a testing system. the diffusive properties of these proteins on lipid bilayer were captured from all-atom molecular dynamic simulations. we showed that both d and d association rates could be simulated quantitatively with our method. the calculated values were close to the experimental measurements. we also provided detailed analysis of how molecular diffusions and membrane fluctuations affected d association. pf- (un)structure-function relationships on the ureg enzyme in the nickel-dependent urease system barbara zambelli , francesco musiani , stefano ciurli urease is an essential enzyme for many pathogens and soil microorganisms. its activity relies on the presence of nickel in the active site ( ) . the incorporation of this metal ion into the enzyme requires the formation of a supra-molecular chaperone involving four accessory proteins, named ured, uref, ureg and uree. uree is a metallo-chaperone involved in nickel binding and delivery into the enzyme active site. ureg is a gtpase essential for providing energy to the process of nickel site assembly. uref and ured form a complex that regulates the gtpase activity of ureg. the present work focuses on ureg, which exists in solution as an ensemble of inter-converting conformations ( ) . this observation made this protein the firstly discovered natural enzyme with an intrinsically disordered behavior, possibly allowing it to interact with different protein partners, such as uree ( , ) and uref ( ) and cofactors, such as metal ions ( ), in the urease activation network. ureg folding was studied perturbing protein conformation with temperature and denaturants, and investigating its folding response using circular dichroism, nmr and fluorescence ( ). 
a combination of light scattering, calorimetry, mass spectrometry, and nmr spectroscopy shed light on the effect of metal ion binding on the conformational equilibrium of the ureg ensemble ( ) . the results suggest that metal binding and solution conditions modulate the protein-protein interactions and enzymatic activity of ureg.

nuclear inclusion protein a-protease (nia-pro) is a protease involved in processing of the pepper vein banding virus (pvbv) encoded polyprotein to generate various intermediates and mature proteins at different stages of the viral life-cycle. nia-pro has two domains: the n-terminal viral protein genome linked (vpg) domain and the c-terminal protease domain (pro). vpg belongs to the group of proteins that are intrinsically disordered, but attain stable structures upon interaction with other globular proteins. such protein-protein interactions have a regulatory role on the function of the interacting partners. previously, the influence of the vpg domain on the activity of pro was studied and it was shown that there was a substantial increase in the protease activity upon interaction with vpg (both in cis and in trans). in the present investigation, several deletion mutants of vpg and nia were constructed with a view to delineating the domain of vpg involved in interaction with pro. it was observed that deletion of residues from the n-terminus of vpg resulted in a decrease in the activity of pro in cis and in trans, probably because of the abrogation of interaction between the two domains. interaction studies using spr (surface plasmon resonance) and elisa confirmed that the n-terminal residues of vpg are important for interaction with pro. the n-terminal residues of vpg are a part of the disordered region of vpg and their deletion resulted in a change in the secondary structure of vpg and its oligomeric state.
the ser and trp residues of pro domain were shown to be important both for the interaction of the two domains and for the activity of protease by mutational analysis earlier. these residues were identified to be a part of wc loop (w -c ) which relay the conformational changes to the active site catalytic triad (his , asp and cys ) leading to activation. however, mutations of these residues did not completely abolish the protease activity as well as the interaction with vpg. therefore, in the present study h and h which are observed to interact with trp and c (via non-covalent interactions) were mutated to alanine and the h a and h a mutants showed a drastic reduction in the activity of protease. molecular dynamics simulations of the wild type pro and the mutants revealed that trp -his -his -cys interaction pathway of the wild type pro was disrupted in the mutants and additional residues were involved in the interaction pathway, such alterations in the network of interactions could be responsible for the loss of activity. however, a change in the oligomeric status of these mutants was also observed as compared to the wild-type pro, suggesting that these residues are important for both the structural and functional integrity of pro and its interaction with vpg. thus, these results provide a molecular insight into the vpg-pro interactions and the modulation of their structure and function upon mutation of residues that are part of the interaction interface. transthyretin (ttr) is one of many proteins that are capable of forming amyloid fibrils in vivo. this protein is associated with two distinct amyloidosis: familial cardiac amyloidosis (fca) that causes a restrictive cardiomyopathy and familial amyloid polyneuropathy (fap) that affect peripheral nerves, they are hereditary and caused by mutations in the ttr gene. the non mutated protein can also aggregate in cardiac tissue in advanced age patients. 
the diagnosis was established at university hospital since due to a collaboration between our group and the center of amyloidosis antônio rodrigues de mello (ceparm). the only mutation found in brazil was v m in patients diagnosed in france. our group discovered a mutation not previously described in brazil and a novel, not yet described mutation, a d. the diagnosed patients are registered in the transthyretin amyloidosis outcomes survey (thaos). the novel mutation a d causes a severe restrictive cardiomyopathy that is certainly related to the higher aggregation propensity observed for this mutant compared to other amyloidogenic mutants of ttr. structural predictions using a bioinformatics tool called foldx showed that the insertion of the mutation causes an electrostatic clash that facilitates the dissociation and aggregation of the protein. this mutant was purified heterologously and biophysical studies revealed that this protein is a dimer and not a tetramer, as is common for ttr. the crystallographic structure indicates that this mutant is structurally identical to wild type. the thermodynamic stability of a d is lower than that of wild type ttr. the aggregation profile showed that this protein aggregates to a greater extent and with faster kinetics than observed for other amyloidogenic mutants of ttr, forming fibers within two hours of aggregation. heterotetramers of a d and wt are able to aggregate in the same fiber structure. the analysis of the interface interactions of this mutant using pdbsum showed modifications in the profile of hydrogen bonds and non-bonded contacts. in addition, the oligomers of a d are toxic to primary cultures of cardiomyocytes from murine heart. the amyloidogenic profile displayed by this new mutant can be directly correlated with the aggressiveness observed in the disease developed by the identified patient.
furthermore the recent consolidation of ttr diagnosis in our university hospital led to the identification of the rare a d variant in a brazilian patient, suggesting that other new, uncharacterized mutants could be identified in the coming years. multiple cellular proteins interact with ledgf/p through a conserved unstructured consensus motif [ ] . the ledgf/p -mll -menin complex was structurally characterized, but only partially [ ] . using nmr spectroscopy, we identified and mapped a novel mll -ledgf/ p interface. colony forming assays in mll -af leukemic cells expressing mll interactiondefective ledgf/p mutants revealed that this additional interface is essential for leukemic transformation. interestingly, the newly defined interface overlaps with the binding site of known ledgf/p interactor, the hiv integrase [ ] . while the pathophysiological interactions of ledgf/p are intensively studied, its physiological role remains unclear. since ledgf/p contributes to hiv integration and leukemic transformation and has become a new therapeutic target for drug development, it is crucial to study its physiological interactions. in addition to hiv in and mll -menin, the ledgf/p integrase binding domain (ibd) also interacts with several other proteins [ , ] . our recent data (manuscript accepted in nat. commun.) revealed structural details of ledgf/p interactions with physiological binding partners. the interaction with the ledgf/p ibd is maintained by an intrinsically disordered ibd-binding motif (ibm) common to all known cellular partners. based on the knowledge of this motif, we identified and validated iws as a novel ledgf/p interaction partner. naturally occurring single mutants, i t, f i, w r and d h of lysozyme in human, have been known to form abnormal protein aggregates (amyloid fibrils) and to accumulate in several organs, including liver, spleen and kidney, resulting in familial systemic amyloidosis. 
these human pathogenic lysozyme variants are thought to cause subtle conformational changes relative to the wild type. here we examined the effects of the aberrant mutant lysozymes i t, f i, w r and d h, each of which carries a single point mutation, on a cultured human cell line, hek , in which the genes were individually integrated and overexpressed. western blot analyses showed smaller amounts of these variant proteins in the medium than of the wild type, whereas they were abundant in the cell pellets, indicating that the mutant lysozymes were scarcely secreted into the medium and were instead retained in the cells. immunocytochemistry revealed that these proteins resided in restricted regions that were stained by an endoplasmic reticulum (er) marker. moreover, overexpression of the mutant lysozymes was accompanied by marked increases in xbp s and grp /bip, which are downstream agents of the ire _ signaling pathway of the unfolded protein response (upr) to er stress. rnai against expression of the mutant lysozymes greatly suppressed the increases in these agents. next, we addressed the interaction between amyloidogenic lysozyme and grp /bip: the former was recovered by immunoprecipitation with the latter, and both proteins colocalized in the er. lysozyme is composed of an a-domain rich in helices and a b-domain rich in sheet. the two helices a and a in the n-terminal region are arranged in parallel, face to face, with the hydrophobic amino acids f, l , l , l , l and l positioned there at equal intervals. at the back of this docking region there is a core region of amyloid fibril formation, in which the side chain of i is exposed on the protrusion. these hydrophobic amino acids are probably crucial for lysozyme folding.
although mutated lysozymes undergo grp /bip-assisted folding in such an environment, dissociation of grp from lysozyme is likely inhibited when folding fails, so that both proteins remain bound to each other and stay in the er. a portion of the aberrant lysozymes seems to remain bound to grp /bip during folding and to become insoluble through aggregation, thus accumulating in the er, accompanied by er stress. lysozyme amyloidosis might be caused by long-term accumulation of the abnormal protein in the endoplasmic reticulum.

structural characterization of toxic oligomers that are kinetically trapped during alpha-synuclein fibril formation

the accumulation of abnormally aggregated proteins within the body is a common feature of several medical disorders, such as alzheimer's disease, parkinson's disease and diabetes mellitus type . while the specific protein that forms the major component of such deposits varies from one disease to another, the formation of the pathological aggregates seems to occur via a common process of misfolding and self-assembly of a normally soluble polypeptide chain into a series of oligomeric intermediates and, ultimately, into insoluble amyloid fibrils that accumulate within specific organs and tissues. increasing evidence indicates that certain oligomeric protein species generated during the self-assembly of specific proteins into ordered fibrillar aggregates can be highly cytotoxic and are likely to be key players in the initiation and spreading of neurodegenerative diseases. however, little detailed structural information is currently available for these oligomeric species because of their often transient nature and, more importantly, because of their variability in size and structure. we report here the isolation and detailed characterization of an ensemble of stable toxic oligomers of alpha-synuclein, the protein whose deposition is the hallmark of parkinson's disease.
by defining and minimizing the degree of heterogeneity of these isolated alpha-synuclein oligomers which have accumulated during the process of amyloid formation, we have identified distinct subgroups of oligomers and determined their structural properties and three-dimensional molecular architectures. this characterization has been achieved by the application of a set of complementary biophysical techniques, including a variety of spectroscopic techniques along with analytical ultracentrifugation, atomic force microscopy, and electron microscopy. although these oligomers exist in a range of sizes, with different extents and nature of beta-sheet content and exposed hydrophobicity, all the oligomeric subgroups possess hollow cylindrical architectures with marked similarities to amyloid fibrils. this suggests that these types of oligomers are kinetically trapped during protein self-assembly and that the accumulation of at least some forms of amyloid oligomers is likely to be a consequence of very slow rates of rearrangement of their beta-sheet structures. our findings reveal the inherent multiplicity of pathways of protein misfolding and the key role the beta-sheet geometry acquired in the early stages of the self-assembly process plays in dictating the rates of structural conversions, and thus the kinetic stabilities and pathological nature of different amyloid oligomers. the results of this study provide the basis for a more complete understanding of the nature of the self-assembly of polypeptides into beta-sheet rich amyloid aggregates, and potentially contributes to efforts to identify specific targets for drug discovery. fish otoliths and mammalian otoconia, biominerals composed of calcium carbonate and organic matrix, are involved in the functioning of the inner ear, the sensory organ that plays an important role in hearing and balance [ ] . 
however, their developmental origins, growth, and the role of the matrix, especially the protein component, are still poorly understood. it has been shown that proteins involved in the formation of biominerals are usually very acidic. they often belong to the group of intrinsically disordered proteins (idps), a class of proteins devoid of a rigid tertiary structure [ , ] . the shape and polymorph selection of the calcium carbonate otolith in danio rerio is controlled by the starmaker (stm) protein [ ] . recently, a gene encoding the starmaker-like (stm-l) protein, a putative homologue of stm, was identified in oryzias latipes. it has been suggested that stm-l has a function similar to that of stm, although there is no sequence similarity between stm and stm-l [ ] . several methods, such as size exclusion chromatography, cd spectroscopy and analytical ultracentrifugation, demonstrated that stm-l is a coil-like idp with a tendency to form locally ordered structures [ ] . because stm-l was suggested to play a crucial role in calcium carbonate mineralization, it is possible that calcium ions influence its conformation, as was previously shown for stm [ ] . however, other ions may also be involved in this process. the aim of this study was to investigate the effect of mono- and divalent metal ions on the conformation of stm-l. we used single-molecule förster resonance energy transfer (smfret) and fluorescence correlation spectroscopy (fcs), which showed that calcium ions compact the protein most efficiently, followed by magnesium and the monovalent ions. the difference between the effects of monovalent and divalent ions on the protein dimensions is likely to result from the different properties of the ions, such as charge density and radius.
cd experiments have shown that a high excess of calcium ions caused the formation of ordered secondary structure in stm-l, which may be crucial for the formation of calcium carbonate crystals when the ratio of building ions to protein is high.

it has been demonstrated that dmp is proteolytically processed into fragments, including k n-terminal region and k c-terminal region. like many proteins known to be engaged in biomineralization, dmp and its fragments belong to the group of intrinsically disordered proteins (idps). it has been suggested that dmp and its fragments can take part in otoconia mineralization, as the protein is present in mouse otoconia, but the role of dmp and its fragments in the mineralization of calcium carbonate has not been examined until now. to determine the influence of the dmp fragments on otoconia development, k dmp protein was expressed in a bacterial expression system, purified and used in an in vitro biomineralization test of calcium carbonate. in particular, immobilized metal affinity chromatography (imac) was applied as the first step of the purification procedure. because of the high content of acidic amino acids, ion exchange chromatography with a mono q column was used as the next step.

the development of insects is regulated by the combined action of ecdysteroids and juvenile hormones (jh). pulses of -hydroxyecdysone ( e) initiate each step of metamorphosis, while jh modulates its action and prevents precocious differentiation. the biological and molecular mechanism of e action is well described. in contrast, the mode of jh action is still poorly understood. in wilson and fabian [ ] reported that drosophila melanogaster mutants lacking met are resistant to toxic doses of jh and its analogue methoprene. it has been shown that met binds jh under physiological conditions. therefore met is believed to be a putative jh receptor. met may also be involved in cross-talk between the two hormonal signalling pathways, involving e and jh.
the detailed structure of met is still unknown. therefore our main aim is to characterize the structural properties of met. in silico analysis performed on full-length met suggested that the n-terminal part of met contains three conserved domains characteristic of bhlh-pas transcription factors, whereas the c-terminal part is most probably unstructured.

capitalizing on self- and cross-amyloid interactions, we designed highly effective, peptide-based inhibitors of amyloid self-assembly of abeta and iapp. owing to their favourable properties, the designed peptides are promising leads for targeting protein aggregation in ad, t d or both diseases, while the inhibitor design strategy should be applicable to other amyloidogenic polypeptides and proteins as well.

apoptosis, the process of programmed cell death, must be carefully regulated in multi-cellular organisms to ensure proper tissue homeostasis, embryonic development and immune system activity. the bcl- family of proteins regulates the activation of apoptosis through the mitochondrial pathway. dynamic interactions between pro- and anti-apoptotic members of this family keep each other in check until the proper time to commit to apoptosis. the point of no return for this commitment is the permeabilization of the outer mitochondrial membrane (omm). translocation of the pro-apoptotic member, bax, from the cytosol to the mitochondria is the molecular signature of this event. information on the molecular interactions and conformational changes associated with this event has been difficult to obtain because of the challenges of taking subtle measurements in the complex environment of live cells. to circumvent these challenges, we developed a novel method to reliably detect förster resonance energy transfer (fret) between pairs of fluorophores in order to identify intra-molecular conformational changes and inter-molecular contacts in bax as this translocation occurs in live cells.
in the cytosol, our fret measurements indicated that the c-terminal helix is exposed instead of tucked away in the core of the protein. this coincided with measurements using fluorescence correlation spectroscopy (fcs) that showed that cytosolic bax diffuses much slower than expected, suggesting possible complex formation or transient membrane interaction. we propose that this exposed helix allows for this contact to occur. cross-linking the c-terminal helix (a ) to helix a reduced the instances of these interactions while at the same time yielded fret measurements that are consistent with the a helix tucked into the core of the protein. after translocation, our fret measurements showed that bax molecules form homo-oligomers in the mitochondria through two distinct interfaces involving the bh domain (helix a ) and the c-terminal helix. these findings provide insight into the molecular architecture that may involve possible contacts with other bcl- proteins to permeabilize the omm, which would also be necessary for the regulation of apoptosis. abstract spatial resolution is especially advantageous for bacterial cells because of their small sizes. in the past few years the spatial organization and dynamics of a variety of bacterial cellular structures and protein macromachineries have been revealed with unprecedented details. as the field matures, it is now time to focus on the functional aspect of the observed spatial organizations and dynamics. are they essential in carrying out a specific cellular function? do they play a regulatory role in controlling the on and off of a certain cellular process? in this work i will present a few examples from our laboratory that examine the spatial and functional organization of macromolecules involved in bacterial cell division. transcription factors (tf) exert their function by interacting with other proteins and binding to dna. 
the nucleus is a compartmentalized space, and the spatial organization of tfs and their partners represents another level of gene expression regulation. we used the glucocorticoid receptor (gr) as a model of the mechanism of action of tfs. gr is a ligand-activated tf with a relevant role in physiology and a great variety of effects. it can be recruited to specific response elements on dna or interact with other tfs. the activity of gr is also modulated by different co-regulators, e.g. tif /grip . gr and tif do not distribute homogeneously within the nucleus but accumulate in distinctive clusters. the functional role of this particular intranuclear organization remains unknown. we used advanced fluorescence microscopy techniques to study the dynamics of gr and tif in the nucleus of living cells with high spatial and temporal resolution. gr and tif fused to fluorescent tags were transiently expressed in newborn hamster kidney (bhk) cells and visualized with a confocal microscope. fluorescence correlation spectroscopy (fcs) experiments were carried out to measure the intranuclear mobility of both proteins. the method is based on the analysis of fluorescence intensity fluctuations caused by the movement of fluorescent molecules into and out of the confocal volume. the data could be fitted with a model that considers free diffusion of tif and gr in the nucleus and their binding to fixed targets. we also studied the dynamics of different gr mutants in the presence of different ligands, and our results suggest that the binding depends on dna. both the gr and the tif autocorrelation curves reveal an increase in the bound population upon gr activation by its agonist dexamethasone (dex). a cross-correlation analysis showed that, as expected, the dex stimulus increases the population of gr-tif complexes. without hormone, gr shows a homogeneous distribution and tif forms large clusters in the nucleus.
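the fcs analysis described above rests on the normalized autocorrelation of the fluorescence intensity trace, G(τ) = ⟨δF(t) δF(t+τ)⟩ / ⟨F⟩², where δF is the deviation of the intensity from its mean. a minimal python sketch of that estimator follows. the function name `autocorrelation`, the evenly sampled input list and the plain (non multi-tau) averaging are assumptions made for illustration and are not taken from the abstract; fitting the diffusion-and-binding model is not attempted here.

```python
def autocorrelation(intensity, max_lag):
    """Normalized fluctuation autocorrelation G(lag) for lag = 1..max_lag.

    intensity: evenly sampled fluorescence counts (hypothetical input format).
    Returns a list whose element lag-1 is <dF(t) dF(t+lag)> / <F>^2.
    """
    n = len(intensity)
    mean = sum(intensity) / n
    # fluctuations around the mean intensity
    delta = [f - mean for f in intensity]
    g = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (t, t+lag) pairs available in the trace
        acc = sum(delta[t] * delta[t + lag] for t in range(pairs))
        g.append(acc / pairs / mean ** 2)
    return g
```

in practice the decay time of G(τ) reports on how long molecules stay in the observation volume, which is why slower or bound populations shift the curve to longer lags.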
upon dex binding, gr accumulates in the nucleus and is rapidly recruited to tif foci, and there is a substantial redistribution of both proteins, which co-localize in the same pattern of small intranuclear clusters. the dynamics of gr and tif molecules at these clusters were studied by performing orbital-scanning measurements, tracking the cluster positions in silico and analyzing the intensity fluctuations of the clusters over time. a positive cross-correlation between both channels indicates that dex-bound gr and tif interact at these foci and dissociate from them forming tif -gr hetero-complexes. in conclusion, advanced fluorescence microscopy methods allowed us to obtain a dynamical map of gr distribution and function in the nucleus of living mammalian cells.

assembly of membrane pores as a mechanism for amyloid cytotoxicity by the bacterial prionoid repa-wh

cristina fernández , rafael núñez-ramirez , mercedes jimenez , germán rivas , rafael giraldo

amyloid fibril formation is associated with human neurodegenerative diseases. prefibrillar oligomers formed during the fibril assembly process, rather than mature fibrils, are known to be central to disease and may be responsible for cell damage. a commonly proposed mechanism for the toxicity of small oligomers is their interaction with the lipid bilayer of cell membranes, leading to loss of membrane integrity [ ] . recent studies from our laboratory have shown that repa-wh , a winged-helix domain from a bacterial plasmid replication protein, can assemble into amyloid fibrils in vitro. when expressed in escherichia coli, repa-wh functions as a cytotoxic protein that shares features with the mammalian amyloid proteinopathies. these features have proved repa-wh to be a suitable synthetic model system to study protein amyloidosis [ , , ] .
in this work, using the repa-wh bacterial model system, we have studied the interaction between the protein and model membranes (large and giant unilamellar lipid vesicles, luvs and guvs respectively). repa-wh associates with and aggregates on membranes composed of anionic phospholipids. protein association with guvs did not result in lysis of the vesicles, suggesting the assembly of discrete protein pores as the mechanism of repa-wh membrane damage. to investigate the formation of pores, we analyzed by electron microscopy the aggregation of repa-wh in the presence of a pre-formed e. coli lipid monolayer. the em images show the presence of pore-like particles on the monolayer. the formation of amyloid pores explains the permeabilization effect of repa-wh in vesicle models and is in agreement with observations for human amyloidogenic proteins. the approaches presented here provide deeper insight into amyloid cytotoxicity towards membranes and will make it possible to assay inhibitors and effectors of amyloidosis under controlled conditions.

b -adrenergic receptor (b ar) is a member of the g protein-coupled receptors, which represent the single largest family of cell surface receptors involved in signal transduction. b ar recognizes a variety of ligands and communicates with cytoplasmic g-proteins by transmitting signals through the cellular membrane. thus, investigation of communication pathways in b ar may give important insights for understanding its allosteric mechanisms and for identifying new target sites for more specific and efficient drug molecules to be used in the treatment of pulmonary and cardiovascular disease. in this study, various conformations from ms molecular dynamics (md) simulations and available crystal structures of human b ar were investigated to reveal alternative signaling pathways between its extra- and intracellular regions.
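the pathway analysis that this abstract goes on to describe treats the protein as a residue graph and searches it for the k shortest paths between two sites. a minimal python sketch of that idea follows. it is not the authors' implementation: the edge cost of 1 / (number of atom-atom contacts), the helper names `contact_graph` and `k_shortest_paths`, and the simple best-first enumeration of loop-free paths (used here instead of the cited k-shortest path algorithm) are all assumptions made for illustration.

```python
import heapq


def contact_graph(contacts):
    """Build an undirected weighted graph from residue-pair contact counts.

    contacts: dict mapping (residue_a, residue_b) -> number of atom-atom
    contacts within the cut-off (hypothetical input format).
    Assumed edge cost: 1 / contacts, so strongly coupled pairs are 'closer'.
    """
    graph = {}
    for (a, b), n in contacts.items():
        if n > 0:
            graph.setdefault(a, {})[b] = 1.0 / n
            graph.setdefault(b, {})[a] = 1.0 / n
    return graph


def k_shortest_paths(graph, source, target, k):
    """Return up to k loop-free paths as (cost, [nodes]), cheapest first.

    Partial paths are expanded in order of increasing cost; with positive
    edge costs, complete paths therefore reach the target in sorted order.
    Worst-case cost is exponential, so this suits small graphs only.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no revisited residues)
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return found
```

for a network of realistic size, a dedicated algorithm such as yen's would be the more usual choice; the enumeration above is only meant to make the notion of ranked alternative paths concrete.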
specifically, shortest communication paths connecting key residues (more than Å apart) at the orthosteric ligand binding site (d , s , t , f , n ) to either l or s located near the g-protein binding site were investigated. the conformers from previous md simulations [ ] include the intracellular loop (icl ), which especially affects the transmembrane collective dynamics but is lacking in x-ray structures. the protein was described as a graph composed of nodes linked by edges. nodes were placed at the alpha-carbon atoms and the edges were calculated based on the number of atom-atom interactions within a cut-off distance . Å for each residue pair. twenty shortest pathways were revealed using k-shortest path algorithm [ ] on the coarse-grained network. our results indicated that distinct signaling paths progressed most frequently on tm but alternative paths were also present, which passed partially through tm , tm , tm or tm depending on the conformation. among the critical residues that transmitted the signal between distant sites, f and n were detected, whose functional roles were reported in previous experimental studies. pathway shifting was observed depending on the open-to-closed transition of icl during md simulations. the sulfonylurea receptor (sur ) is an atp binding cassette (abc) protein that forms the regulatory subunit in katp channels found in the pancreas and the brain. mgatp binding and hydrolysis at the two cytosolic nucleotide binding domains (nbd and nbd ) in sur control gating of the katp channel pore. , proper regulation of katp channel gating by sur is critical. over mutations that lead to diabetes, hyperinsulinism and developmental delay have been identified in different domains of sur , including the nbds. therefore, molecular-level understanding of the structure and function of the nbds is essential for designing improved treatments for sur-related diseases. 
here we present biophysical and biochemical studies aimed at understanding the effect of disease-causing mutations on the conformation and nucleotide binding of sur nbd . specifically, we are investigating sur nbd mutations that cause neonatal diabetes (r w and h t) or congenital hyperinsulinism (c d, g v, r g, r d and k t). our nuclear magnetic resonance (nmr) data shows that the hyperinsulinism mutation k t causes chemical shift changes throughout the spectrum of nbd , implying overall changes in protein conformation that may affect mgatp binding and inter-domain interactions with other domains in the sur protein. size-exclusion data show that the other hyperinsulinism mutations (c d, g v, r g, r d) produce mostly aggregated protein, likely as a result of misfolding of nbd . misfolding of nbd may be the underlying cause of reduced katp trafficking seen with these mutations and hence decreased katp channel gating observed in hyperinsulinism. in contrast to the k t mutations, the congenital diabetes-causing mutations (r w and h t) cause few nbd nmr spectral changes. however, the congenital diabetes mutation r w decreases the affinity of nbd for mgatp, which is unexpected for congenital diabetes mutations. our fluorescence, circular dichroism and microscale thermophoresis data corroborate the results that we have obtained by nmr spectroscopy. our data provide molecular-level details on the effects of disease causing mutations in human sur . egfr increased stability: rmsf of the ca atoms during the md simulations suggest that glycosylation is associated with dampened motions, suggesting that the glycans stabilize the structure. subdomain iii is the most stabilized while subdomain i is stabilized largely in the proximity of the ligand. both dimer interfaces including the dimerization arm from domain ii and the tip of domain iv fluctuate less upon glycosylation. 
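the rmsf used above is, for each atom, the square root of the time-averaged squared deviation from that atom's mean position over the trajectory. a minimal python sketch follows, assuming the frames have already been superimposed on a common reference so that overall rotation and translation do not contribute. the function name `rmsf` and the list-of-frames input format are hypothetical and not taken from the abstract.

```python
import math


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per atom (e.g. the c-alpha atoms), already aligned to a reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # mean position of atom i over all frames
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # mean squared deviation of atom i from its mean position
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```

comparing such per-residue profiles between the glycosylated and non-glycosylated systems is one way to express the "dampened motions" reported here as numbers.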
hydrogen bonding; persistent protein-glycan interactions: in the disaccharide-containing system, we observed three highly occupied hydrogen bonds between the glycans and domains iii and iv of egfr. the hydrogen bonds of domain iii involve the residue asp , in which a sidechain oxygen interacts with oxygen atoms of the n-acetylglucosamine linked to asn . in domain iv a hydrogen bond is seen between the cys backbone amide and the oxygen atom of the n-acetylglucosamine linked to asn . in the oligosaccharide-containing system, hydrogen bonds were observed between the glycan attached to asn and domain ii. these hydrogen bonds form between the gln sidechain oxygen atom and the cys backbone oxygen atom and the mannose linked to asn . the reduction in the mobility of these amino acids suggests that hydrogen bonds impart stability both to the sugars and to the interacting egfr.

insects possess a complement-like immune response utilizing thioester-containing proteins, or teps. the only arthropod tep of known structure is anopheles gambiae tep , which is a key component in the natural immunity of this mosquito to malaria parasites (genus plasmodium). unlike vertebrate complement factors, agtep does not contain an anaphylatoxin domain, which in those factors acts to regulate a massive conformational change accompanying activation of the protein. agtep must therefore rely on an alternative mechanism for allosteric regulation of thioester activation. in place of a small internal domain, a large, heterodimeric complex of two leucine-rich repeat (lrr) proteins, lrim and apl c, has been shown to specifically bind and stabilize the active conformation of agtep . i will present my group's most recent work in this area. we have shown that different alleles of tep , which are known to influence the vectorial capacity of wild mosquitoes, differ significantly in their susceptibility to thioester hydrolysis.
allelic variation is centered on residues at the protein-protein interface within tep that contains the thioester bond. the lrim /apl c heterodimer is shown to form an extended and flexible ensemble in solution. two genes closely related to apl c, apl a and apl b, can also form a complex with lrim , and the apl b lrr domain can form a homodimer. we propose that a flexible and heterogeneous ensemble of lrim /apl dimers interacts with the active conformation of tep , thereby producing an array of immune complexes to protect mosquitoes from a diverse set of pathogens.

human flap endonuclease- (hfen ) is an essential metallo-nuclease involved in okazaki fragment maturation and long-patch base excision repair. during these processes, bifurcated nucleic acid intermediates with ssdna '-flaps are generated by polymerase strand displacement synthesis and then cleaved one nucleotide into the downstream duplex by fen to create a nicked dna that is a suitable substrate for ligase. until recently, how hfen achieves tremendous catalytic power (rate enhancements > exp ) and exquisite selectivity for the scissile phosphate had been poorly understood ( ) . in , the grasby and tainer labs solved the structures of hfen in complex with product and substrate. this study revealed that scissile phosphate selectivity is largely due to the substrate dna undergoing a novel di-nucleotide unpairing (dnu), which places the scissile phosphate diester in contact with the requisite divalent metal ions. in addition, by comparing the structures of hfen alone ( ) and in complex with substrate and product dnas ( ), grasby and tainer proposed a model whereby protein conformational changes occur upon substrate binding, resulting in placement of key basic residues that position and/or electrophilically catalyse hydrolysis of the scissile phosphate diester. further work using a cd-based assay showed that metals are absolutely required for dnu, whereas the key basic residues in the active site are not.
surprisingly, perturbations to the protein structure that are much more distant from the fen active site (i.e., helical cap) prevent dna unpairing, implying that the fen protein actively participates in the unpairing process ( , ); however, how it does so remains a mystery. the maximal multiple turnover rate of hfen reaction is rate-limited by enzyme product release, whereas hfen kinetics under substrate-limiting conditions ([e]<[s] torr), whereas the apparent o -affinities of these metalloporphyrins, which are incorporated in apo-myoglobin, apo-hb, serum albumin, etc., increase substantially to p < - torr, though their coordination structures are apparently unchanged [ ] . such substantial increases in the apparent ligand-affinities of metalloporphyrin-containing proteins are accomplished by the protein matrix preventing or interfering with the dissociation of the ligand, since the interior of globin is nearly fully packed by protein matrix. in hb, the dissociation process of the ligand proceeds through the "caged" state [ ] [ ] [ ] , which can be produced by cryogenic photolysis of the ligated states at . k and in which the metal-ligand bond is broken and the un-bonded ligand is trapped near the bonding site within the globin moiety. this "caged" state has spectral features distinct from those of either the deoxy or the ligated states of the respective hemoproteins. the apparent ligand-affinities of hb are regulated by heterotropic effectors without detectable changes in either the static quaternary/tertiary structures of the globin moiety or the coordination/electronic structures of the metalloporphyrin moiety, and thus without changes in the ligand-affinity of the metalloporphyrins themselves [ ] [ ] [ ] .
the reduction of the apparent ligand-affinities of hb may be caused by increases in the migration rate of ligands through globin matrix from the "caged" state to solvent, resulting from the effector-linked, enhanced high-frequency thermal fluctuations which increase the transparency of the globin matrix toward small diatomic ligands [ ] [ ] [ ] . conclusion: the ligand-affinity of hb is regulated through protein dynamics by heterotropic effectors, rather than static quaternary/tertiary structural changes. thus, the "caged" state of hb acts as a critical transition state in regulation of the affinity for small diatomic ligands in hb [ ] . the role of metal ions in the regulation of life processes is extremely important. they act as signal transducers, protein configuration stabilizers, enzymatic cofactors, oxygen transport supporters and many others. for example, subtle perturbations in calcium homeostasis may lead to mental disabilities and are linked to diseases such as autism spectrum disorders (asd). in this study we focus on complex protein systems, mainly those present in the brain. we search for dimers mediated by the presence of metal ions, and determine the impact of the presence or absence of the latter on the structure and energetic properties of the complex in the protein-protein interface. we investigate ions' influence on the interface stability using classic molecular dynamics methods (md), including steered md. moreover, we apply a novel suite of enhanced md-based methods recently developed by our team (rydzewski & nowak) to explore ion diffusion pathways in protein fragments of the synapses. finally, we describe specific inter-protein ion binding motifs with the most important interactions, collating them with various structures deposited in the protein data bank [ ] . the binding of integrins to collagen plays a critical role in numerous cellular adhesion processes including platelet activation and aggregation, a key process in clot formation. 
collagen is an unusually shaped ligand, and its mechanism of recognition and role in selectivity and affinity are unique, and at this stage not well understood. the i-domain of the integrin protein binds to collagen specifically at multiple sites with variable affinities, however the molecular mechanism of integrin i-domain (ai) regulation remains unknown. using nmr, along with isothermal titration calorimetry, mutagenesis, and binding assays we are developing a novel integrated picture of the full recognition process of the integrin a i binding to collagen. the adhesion of the a b integrin receptors to collagen is cation-dependent with collagen binding a mg(ii) ion that is located at the top of the extracellular integrin a i-domain (a i). our results show evidence for a regulatory effect of the mg(ii) ion on a i affinity, by inducing allosteric ms-ms motions of residues distant from the binding site. we propose a novel model of a i recognition to collagen, comprising a two-step mechanism: a conformational selection step, induced by mg(ii) coordination, and an induced-fit step caused by collagen binding. hydrogen-deuterium exchange experiments show that the induced-fit step is facilitated by the reduced local stability of the c-terminus. we propose that the conformational selection step is the key factor that allows discrimination between high and low affinity collagen sequences. cytochromes p (cyp) are heme containing enzymes involved in the metabolism of endobiotics and xenobiotics, such as drugs or pollutants. [ ] in humans, cyps are attached to the biological membranes of endoplasmic reticulum or mitochondria by n-terminal transmembrane anchor and they are partially immersed by their catalytic domain to different level. [ ] generally, the composition of lipid membrane may significantly affect behavior of protein embedded in respective membrane e.g. 
the cholesterol in the membrane alters membrane properties such as thickness, stiffness and ordering of the membrane. furthermore, an increasing amount of cholesterol in the membrane may also alter interactions with membrane proteins and affect solute partitioning between membrane and water molecules. [ ] cholesterol is also known to noncompetitively inhibit the most typical drug-metabolizing cyp, cyp a , [ ] however the mechanism was unknown. for this reason, we prepared a set of simulations of cyp a embedded in dopc lipid bilayers with various cholesterol concentrations ( , , , and % wt; figure ) and ns long md simulations were carried out. md simulations showed the formation of a funnel-like shape of the lipids close to the catalytic domain of cyp. in addition, the cholesterol molecules have a tendency to accumulate in the vicinity of the membrane-attached f/g loop. the catalytic domain sank deeper into the membrane with cholesterol, and the number of amino acids in contact with the membrane was larger than in the pure dopc bilayer. in contrast, the presence of a higher amount of cholesterol affected the pattern of channel opening, effectively blocking access to the active site from the membrane, which in turn may affect the substrate preferences and catalytic efficiency. [ ] finally, we study the effect of different lipid types on membrane-attached cyp a .

anti-( -hydroxy- -nitrophenyl)acetyl (np) antibodies are one of the most widely analyzed types of antibodies, especially with respect to affinity maturation [ ] [ ] [ ] . affinity maturation is a process in which b cells produce antibodies with increased affinity for the antigen during the course of an immune response, and is like "evolution" in terms of increasing antigen-binding affinity. during the course of affinity maturation, the structural dynamics of antibodies, which are closely correlated with the binding function, can change. 
to analyze the structural dynamics at atomic resolution and the single-molecule level, we tried to express and purify single-chain fv (scfv) antibodies against np. using scfv antibodies, we can also analyze the effects of key residues on affinity maturation via site-directed mutagenesis. as the first step, we have succeeded in generating a sufficient quantity and good quality of scfv of affinity-mature anti-np antibody, c , with a linker composed of four repeats of gggs. the scfv protein was expressed in the insoluble fraction of e. coli, and solubilized using m urea, followed by refolding by step-wise dialysis to decrease the urea concentration. the final step of purification using an antigen column indicated that approximately % of the solubilized protein was correctly refolded and possessed antigen-binding ability. the analytical ultracentrifugation (auc) analysis showed that the purified c scfv exists in the monomeric state with little oligomeric contamination. the secondary structure and thermal stability of c scfv were analyzed using circular dichroism (cd). the far-uv cd spectra of c scfv indicated typical b-sheet-rich structures. upon antigen binding, the far-uv cd spectrum remained unchanged, but the thermal stability increased by approximately oc. the antigen-binding function of c scfv was analyzed using a surface plasmon resonance (spr) biosensor, biacore. the binding affinity and kinetics of c scfv for np conjugated to bovine serum albumin immobilized on the sensor chip were similar to those of intact c . taken together, the results of auc, cd, and spr indicated that c scfv could be refolded successfully and would possess its functional structure. next, to analyze the structural dynamics of c scfv in the absence or presence of antigen, experiments involving diffracted x-ray tracking (dxt) were performed [ ] . 
c scfv with an n-terminal his-tag was immobilized on substrate surfaces using tag chemistry, and au-nanocrystals were labeled on the surface of scfv as tracers. the motions of c scfv were analyzed in two rotational directions representing tilting (u) and twisting (v). mean square displacement (msd) analysis from more than trajectories showed that the slope for c scfv without antigen, especially in the u direction, was greater than that for c scfv with antigen, suggesting that the motion of scfv was suppressed on antigen binding.

the antibiotic resistance enzyme aph( '')-ia confers antimicrobial resistance to aminoglycoside antibiotics in staphylococci and enterococci. this kinase phosphorylates aminoglycosides such as gentamicin and kanamycin, chemically inactivating the compounds. we have determined multiple structures of the enzyme in complex with nucleoside and aminoglycoside substrates and cofactor magnesium. introduction of aminoglycoside to crystals of aph( '')-ia induces gross conformational changes in crystallo, illustrating several important stages of the catalytic cycle of the enzyme. an interaction between nucleoside triphosphate and an amino acid residue on a conserved loop has also been identified that appears to govern conformational selectivity and modulate the enzyme activity when no substrate is present. comparisons between multiple protein molecules both within and between crystal structures allow us to infer functional states of the enzyme as it carries out catalysis. these structures collectively highlight an enzymatic flexibility that not only allows the binding of diverse aminoglycosides, but also appears to allow a transition from a stabilized, inactive enzymatic state to a catalytically active enzyme with an active site geometry identical to distantly-related eukaryotic protein kinases. 
mechanistic insight gained from these studies begins to demystify a widespread staphylococcal resistance factor, and provides a starting point for the development of anti-infectives toward this important antimicrobial resistance machine.

ryan godwin , william gmeiner , freddie salsbury
wake forest university -department of physics, wake forest university health sciences -department of cancer biology

the zinc-finger of the nf-κb essential modulator (nemo) is a ubiquitin binding domain, and an important regulator of various physiological processes including immune/inflammatory responses, apoptosis, and oncogenesis. the nominally functioning residue monomer ( jvx) is represented by a bba motif, with a cchc active site coordinating the zinc ion. here, we investigate the effects of a single point mutation that has been linked to the disease states associated with ectodermal dysplasia. the single mutation of the last binding cysteine (residue ) to a phenylalanine ( jvy) distorts the available conformation and dynamics of the protein, as shown via microsecond, gpu-accelerated molecular dynamics simulations. we examine these two proteins in various states of zinc-binding and coordinating cysteine protonation. in addition to destabilization of the alpha-helix induced by the cysteine to phenylalanine mutation, prominent conformations show the b-sheets turned perpendicular to the alpha-helix, providing a possible mechanism for the induced disease state.

, catalytic ( - aa) and c-terminal ( - aa)) were expressed in e. coli. several truncated in variants containing amino acids - , - , - and - were also prepared. a full-size ku with a gst-tag on its n-terminus was purified from e. coli. all the experiments performed showed that neither the n-terminal nor the c-terminal domain of hiv- in is essential for its binding with ku , despite a weak binding capacity retained by the c-terminal domain. 
the catalytic core ( - aa) as well as the mutant lacking c-terminal domain ( - ) both demonstrated affinity to ku comparable to the affinity of the full-size in, whereas its truncated variant ( - aa) bound to ku protein only weakly. we also expressed a c-terminal ha-tagged full-length in and its - variant in hek t cells together with a wt ku - flag and showed that both in variants are stabilized by co-expression with ku by approx. twofold. we hypothesize that the binding surface within in lies in the region from to a.a. that is a long a-helix. we have shown that a homologous integrase from prototype foamy virus that lacks this structural element does not bind to ku . it is worth noting that ku does not affect the interaction of in with its major cellular partner -ledgf/p as well as its interaction with the dna substrate. this work was supported by an rfbr grant - - and by an rscf grant - - . the nadph-dependent cytochrome p oxidoreductase (cypor) is large amino-acid long microsomal multidomain enzyme responsible for electron donation to its redox partner cytochrome p (cyp) involved in drug metabolism. electron transfer (et) chain is mediated by two riboflavin-based cofactors -flavin mononucleotide (fmn) and flavin adenine dinucleotide (fad) within their respective domains and nicotinamide adenine dinucleotide phosphate (nadph). during this electron transfer cypor undergoes several structural changes in open and closed state of both domains in different degree of contact. in spite of the fact that cyp-cypor complexes play a key role in drug metabolism, the atomistic mechanism of structural rearrangements during complex electron transfers is still lacking. here, we present the results of our study on structural changes during cypor multidomain complex movement between individual electron transfers using classical molecular dynamics (md) and metadynamics (mtd) simulations with cofactors of nadph, fad and fmn in resting state. 
homology model of human cypor in both forms (opened and closed) were embedded into pure dioleoylphosphatidylcholine (dopc) bilayer. after system equilibration (figure ), structural changes of protein, anchor and cofactor movement were studied. we were able to select possible cypor-membrane orientation which would allow interaction with cytochrome p . in addition, spontaneous closing of open cypor was observed. however structural changes between crystal structures and structures obtain from md simulations lead us to the use of metadynamics in order to speed up the process. fmn and fad cofactor remained in close van der waals contact during the -ns long simulation stabilized by p stack interaction of fad with trp , whereas continual movement of nadph continually weakens its p stack interaction with fad. after ns of classical md additional metadynamics simulations were performed in order to investigate internal motion of cofactors during electron transfer. atoms c n (nadph) and n (fad) which are responsible for et were able to move closer to the distance of Å after adding biasing potential. this distance is more than sufficient for electron transfer to occur. after switching back to classical md cofactors got into resting positions ( Å) again. our results show that cypor undergo several structural changes and internal motions of cofactors in order to transfer electrons to its redox partner -cyp. research & utilization div., jasri/spring- , grad. school frontier sci., univ. tokyo, grad. sch. sci., univ. hyogo, japan, national institute of advanced industrial science and technology, japan, pentameric ligand-gated ion channels (plgics) are a major family of membrane receptors that open to allow ions to pass through the membrane upon binding of specific ligands. plgics are made up of five identical (homopentamers) or homologous (heteropentamers) subunits surrounding a central pore. 
structural information about their multiple allosteric states, carrying either an open or a closed channel, has become available through recent x-ray crystallography studies. however, dynamic information is needed to understand their mechanism of gating, notably the long-range allosteric coupling between the agonist binding site and the ion channel gate. here we used the diffracted x-ray tracking (dxt) method ( ) to detect the motion of the extracellular and transmembrane domains of two plgics: the nicotinic acetylcholine receptor (nachr) and a proton-gated bacterial ion channel from gloeobacter called glic. dxt is a powerful technique in biological science for detecting atomic-scale dynamic motion of allosteric proteins at the single-molecule level and at a time resolution of tens of microseconds. the dynamics of a single protein can be monitored through the trajectory of a laue spot from a nanocrystal which is attached to the target protein immobilized on the substrate surface ( , ). dxt detects two kinds of rotational motions of the nanocrystal, tilting and twisting, defined relative to the x-ray incident beam axis. dxt analysis with . ms/f time resolution showed that the tilting motion of the transmembrane domain of glic and both tilting and twisting motions of the extracellular domain of glic and nachr were enhanced upon application of agonists (lowering the ph for glic, and binding of acetylcholine for nachr). the detailed dynamic information, including the effect of gold nanocrystal size on these motions, is discussed.

proteins possess unique structure-encoded dynamics that underlie their biological functions. here, we provide experimental evidence for an evolutionary mechanism driven solely by long-range dynamic motions without significant backbone adjustments, catalytic group rearrangements, or changes in subunit assembly. 
crystallographic structures were determined for several ancestral gfp-like proteins that were reconstructed based on posterior sequence predictions, using members of the stony coral suborder faviina as a model system. the ancestral proteins belong to the kaede-type class of gfps, a group of proteins that undergoes irreversible green-to-red photoconversion and is therefore frequently employed in superresolution microscopy. surprisingly, we find that the structures of reconstructed common green ancestors and evolved green-to-red photoconvertible proteins are very similar. therefore, we analyzed their chain flexibility using molecular dynamics and perturbation response scanning. we find that the minimal number of residue replacements both necessary and sufficient to support light-induced color conversion provide for increased fold stiffness at a region remote from the active site. at the same time, the allosterically coupled mutational sites appear to increase active site conformational mobility via epistasis. these data suggest that during evolution, the locations of fold-anchoring and breathing regions have been reversed by allosteric means. therefore, we conclude that the green-to-red photoconvertible phenotype has arisen from a common green ancestor by migration of a knob-like anchoring region away from the active site diagonally across the beta-barrel fold. based on titration experiments, we estimate that at ph , . % of the protein population harbors neutral side chains for his and glu , residues that form an internal salt bridge near the chromophore. we propose that this reverse-protonated subpopulation constitutes the catalytically competent state. in the electronically excited state, light-induced chromophore twisting may be enhanced, activating internal acid-base chemistry that facilitates backbone cleavage to enlarge the chromophore. 
in this way, a softer active site appears to be coupled to a mechanism involving concerted carbon acid deprotonation and beta-elimination. dynamics-driven hinge migration may represent a more general platform for the evolution of novel enzyme activities by tuning motions in the active site.

the binding of an agonist to a gpcr causes a conformational change in the receptor that leads to its activated functional state. rhodopsin, the membrane receptor responsible for photoreception in the vertebrate retina, is a prototypical gpcr and has been extensively used in structural, biochemical and biophysical studies of this class of receptors. different small molecules have been described as capable of binding to rhodopsin. in addition, mutations in rhodopsin have been associated with retinal diseases, and efforts have been made to find potential ligands that can offset the effect of these mutations. cyanidins, a group of flavonoids within the larger family of polyphenols, have been reported to stimulate chromophore regeneration of rhodopsin by means of the formation of regeneration intermediates. the aim of the current study was to evaluate the effect of the flavonoid quercetin on the conformational properties of both native bovine rhodopsin and heterologously expressed recombinant rhodopsin. rhodopsin was purified from bovine retinas by immunoaffinity chromatography, and photobleaching, thermal stability, metarhodopsin ii decay and chromophore regeneration assays were carried out in the absence or in the presence of mm quercetin. for recombinant rhodopsin, a plasmid encoding wild-type opsin was transfected into mammalian cos- cells, in the absence or in the presence of mm quercetin, harvested, regenerated with -cis-retinal, or -cis-retinal, and subsequently purified in dodecyl maltoside solution. 
no differences in photobleaching behavior, upon illumination, could be detected in the purified quercetin-containing samples compared to those in the absence of this flavonoid. in the case of rhodopsin, and the recombinant wild-type protein regenerated with -cis-retinal, quercetin did not significantly alter the thermal stability and rate of regeneration of the purified proteins under our experimental conditions. however, a two-fold increase in the thermal stability and a % increase in chromophore regeneration were observed for the recombinant wild-type protein regenerated with -cis-retinal in the presence of quercetin. in contrast, the presence of quercetin did not alter the electrophoretic and basic spectroscopic properties of rhodopsin, or those of the recombinant wild-type protein, suggesting no important structural alterations as a result of quercetin binding to the receptor. the positive effect of quercetin on the stability and chromophore regeneration of rhodopsin could potentially be used to counteract the effect of naturally-occurring misfolding mutations in rhodopsin. thus, quercetin could help stabilize rhodopsin mutants associated with retinal diseases such as retinitis pigmentosa. furthermore, docking of the ligand, carried out on the crystallographic structure of rhodopsin (entry gzm), reveals several favorable sites for quercetin binding. one of these would be compatible with -cis-retinal, suggesting a complementary binding to the receptor of this isomer which would not be compatible with -cis-retinal binding.

identification of prospective allosteric sites of p by computational methods

protein function is intrinsically associated with structural flexibility, so that understanding the functional properties of proteins requires going beyond the static picture produced by x-ray diffraction studies. structural flexibility can also be interpreted as a dynamic exchange between different conformational states with low energy barriers at room temperature. 
allosterism is a mechanism to regulate protein function associated with the plasticity exhibited by proteins. allosteric sites can be considered transient cavities that can be occupied by a small molecule with the subsequent modulation of the protein plasticity. occupation of these sites may modify the affinity of the protein for its native substrate; the modulation is positive when the affinity increases and negative when the affinity decreases. allosterism can be used for the design of non-competitive ligands as new therapeutic agents. this mechanism of activity modulation is particularly interesting in the search for selective compounds against targets that use a common substrate for activation, as in the case of kinases. proteins can be viewed in solution as an ensemble of diverse energetically accessible conformations. binding of an allosteric ligand produces a redistribution of the population of the diverse conformational states, which in the end modulates the affinity for the native substrate. allosteric sites can be characterized using computational methods by ensemble docking. it consists of characterizing a set of structures that represent the accessible conformations of a protein, which can then be used to perform virtual screening. in the present work we have studied prospective allosteric sites of p using computational methods. the protein is a member of the mitogen-activated protein kinases (mapks), a highly regulated group of enzymes that control a variety of physiological processes, including mitosis, gene expression, apoptosis, metabolism and movement, among others. the conformational profile of p was assessed using a μs trajectory of accelerated molecular dynamics as sampling technique in explicit solvent. we used as starting structure the apo form of p in its inactive conformation (entry p ). the conformational features of the protein were assessed through the analysis of the variance of the most flexible regions of the protein using principal component analysis. 
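As a rough illustration of the principal component analysis step described above, the following minimal sketch computes the two dominant modes of a set of coordinate frames and projects each frame onto them. It is not the authors' pipeline: the function names (`center`, `covariance`, `top_mode`, `principal_components`) are hypothetical, the frames are assumed to be already superposed and flattened into coordinate vectors, and power iteration with deflation is used only to keep the example dependency-free.

```python
import math
import random


def center(frames):
    """Subtract the per-coordinate mean so every column has zero mean."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    return [[f[j] - mean[j] for j in range(dim)] for f in frames]


def covariance(centered):
    """Sample covariance matrix of the centered coordinates."""
    n, dim = len(centered), len(centered[0])
    return [[sum(f[i] * f[j] for f in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def matvec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def top_mode(cov, iters=200, seed=0):
    """Power iteration: largest eigenvalue and a unit eigenvector of cov."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left along the current vector
            return 0.0, v
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return eigval, v


def principal_components(frames, k=2):
    """Return [(variance, mode), ...] and the projection of each frame."""
    centered = center(frames)
    cov = covariance(centered)
    dim = len(cov)
    modes = []
    for _ in range(k):
        val, vec = top_mode(cov)
        modes.append((val, vec))
        # Deflation: remove the mode just found before searching for the next.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
    projections = [[sum(f[j] * vec[j] for j in range(dim)) for _, vec in modes]
                   for f in centered]
    return modes, projections


# Toy "trajectory" (made-up numbers): large motion along x, small jitter along y.
toy_frames = [[float(t), 0.1 * (-1) ** t, 0.0] for t in range(10)]
toy_modes, toy_projections = principal_components(toy_frames, k=2)
```

In this toy case the first mode is almost parallel to the x axis and carries nearly all of the variance; for a real trajectory the two-dimensional projections would be the input to a subsequent clustering step.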
the snapshots of the trajectory were projected onto the two principal components. subsequent cluster analysis permitted us to select a few structures for further studies. specifically, prospective binding sites were identified using a hydrophobic probe as implemented in the sitemap program. the results show previously described regulatory sites and some new prospective ones.

hydrogen/deuterium exchange-mass spectrometry provides clues on the mechanism of action of min e

maria t. villar , kyung-tae park , joe lutkenhaus , antonio artigues

cell division in most bacteria is initiated by the formation of the z ring, an essential cytoskeletal element that serves as a scaffold for the cytokinesis machinery, at the mid body of the cell. in e. coli the spatial location of the z ring is regulated by the min protein system, composed of three major proteins: minc, mind and mine. the dynamic interaction between these proteins results in the formation of an oscillating protein gradient between the poles of the cell. this oscillation determines the position of the formation of the z ring. many aspects of this simple mechanism are beginning to be understood. in particular, the conformational changes associated with the interaction of the three min proteins with one another and with the cell membrane are of special interest. hydrogen/deuterium exchange mass spectrometry (hdx ms) is a sensitive technique for the detection of changes in protein conformation and dynamics. the main advantages of this methodology are the ability to study native proteins in solution, the requirement for low protein concentrations, the potential to discriminate multiple coexisting conformations, and the lack of an upper limit to the size of protein to be analyzed. here we use hdx ms to analyze the dynamics of the wild type mine and of its inactive double mutant d a d a. 
our results show significant differences in the rates of exchange and in the total amount of deuterium exchanged at the end of the reaction between these two forms of mine. the wild type protein exchanges most of the amide hydrogens during the first few seconds after initiation of the exchange reaction. on the other hand, the mutant protein exchanges only % of the total amide hydrogen atoms during the first seconds after initiation of the exchange, and the remaining % of amide hydrogen atoms are exchanged more slowly during the next few minutes of the reaction. our data are consistent with the existence of a highly flexible structure for the wild type protein and the coexistence of at least two rigid conformations for the double mutant that are undergoing a cooperative transition. interestingly, the central b-sheet forming the interface between the two subunits is protected against exchange in both proteins. these results provide insights into the conformational changes that mine undergoes during its interaction with mind.

biased signalling and heteromization of the dopamine d receptor in schizophrenia and parkinson's disease

pablo herrera nieto , james dalton , jesús giraldo
universidad autónoma de barcelona

as a significant component of dopamine signalling in the brain, the dopamine d receptor (d r), a member of the class a gpcr family, is an important target in the treatment of neurological conditions such as schizophrenia and parkinson's disease. d r shows a variety of signalling pathways through g proteins, including adenylyl cyclase inhibition, gβγ potentiation of adenylyl cyclase , and erk kinase activation, in addition to b-arrestin recruitment. 
these pathways are differentially activated by some agonists and it has been suggested that d r ligands with gai/o antagonist and b-arrestin agonist activity may have anti-psychotic behavioural activity with reduced extra-pyramidal side effects. d r has also been found to form homodimers or higher-order hetero-oligomers with other gpcrs, which may modulate d r conformation and activity, thus constituting an additional form of allosteric receptor regulation. based on these findings, we have computationally modelled the full-length structure of d r, including its long intracellular loop (icl ) that is residues in length and absent in all homologous gpcr crystal structures. using state-of-the-art tools, such as rosetta for ab initio protein folding and acemd for micro-second molecular dynamics (md) simulations we have successfully de novo folded icl , which primarily consists of extensions to transmembrane helices (tmh) and and an intervening disordered histidine/proline-rich region, which is highly flexible. the latter is observed to interact with other receptor intracellular loops (icl and icl ) and appears to restrict access to the g-protein binding-site. in addition, we have docked a structurally diverse collection of ligands (biased agonists, antagonists and allosteric modulators) into our d r model and observed characteristic binding patterns suggestive of different biased signalling mechanisms. finally, through protein-protein docking with rosettadock, we have generated a complete heterodimer model of d r with the adenosine a a receptor (aa ar), where a mutual interface is formed between their respective tmhs and , as well as an association between the c-terminus of aa ar and icl of d r. this may be a particularly relevant biological complex in the treatment of parkinson's disease where antagonists of aa ar have been shown to ameliorate disease effects, potentially through direct interaction with d r. 
bis-ans as a tool to monitor conformational changes upon assembly of binary and ternary complexes of eif e, e-bp inhibitory protein, and the mrna 'cap

specific recognition of the mrna ' terminal cap structure by the eukaryotic initiation factor eif e is the first and rate-limiting step in cap-dependent translation. small e-binding proteins, e-bp , e-bp , and e-bp , inhibit translation initiation by competing with the eif g initiation factor for the same binding site, and by blocking the assembly of the translation machinery [ ] . our recent studies revealed intricate cooperativity between the cap and e-bp binding sites of eif e [ ] . here, we applied a fluorescent dye, , '-dianilino- , '-binaphthyl- , '-disulfonate (bis-ans) to investigate conformational changes upon assembly of binary and ternary complexes composed of human eif e, e-bp , and the mrna 'cap analogue, m gtp. the fluorescence quantum yield of bis-ans increases significantly upon binding to hydrophobic sites of proteins, making the probe a convenient tool to determine the accessibility of hydrophobic surfaces, and to monitor structural reorganisation of macromolecules [ ] . we characterised the interaction of bis-ans with eif e and e-bp by fluorescence titration. the association process takes up to several hours until saturation of the fluorescence signal is achieved, reflecting high flexibility of the protein structures. the association constants kas of eif e/bis-ans complexes are very high for a non-specific interaction. the kas values for eif e/bis-ans and eif e/ e-bp /bis-ans are similar ( m ), whereas the presence of m gtp results in ca. -fold weaker binding of the probe to eif e. the affinity of bis-ans for e-bp is -fold lower than that for eif e. we found no effect of either m gtp or e-bp on the fluorescence of bis-ans in complex with eif e, thus indicating a lack of conformational changes around the probe on eif e/m gtp or eif e/ e-bp complex formation. 
it also testifies that bis-ans does not bind to the cap-binding site, despite the hydrophobic nature of this eif e region. on the contrary, addition of m gtp to the eif e/ e-bp /bis-ans complex causes an increase of the probe fluorescence, which indicates differences in the structural reorganisation in the binary, m gtp/eif e, compared with the ternary, m gtp/eif e/ e-bp , complexes, and confirms the spatial cooperation between the cap and e-bp binding sites. we also observed an increase of fluorescence for bis-ans bound to e-bp in the presence of eif e, pointing out that e-bp partially folds upon association with eif e. in summary, our results provide a deeper insight into the structural aspects of the molecular interaction at early stages of the cap-dependent translation. acknowledgements: this work was supported by the bst /bf project from university of warsaw background: beta -glycoprotein (b gpi) is a protein abundantly present in human plasma and highly conserved in all mammals. b gpi has been identified as the major antigen in the antiphospholipid syndrome (aps), a severe thrombotic autoimmune disease. despite its importance in the pathogenesis of aps, the physiological role of b gpi is still elusive. in a previous work we have demonstrated that b gpi significantly prolongs the clotting time in fibrin generation assays, and inhibits aggregation of gel-filtered platelets (ic . um), either isolated or in whole blood, by inhibiting cleavage of par on intact platelets (ic . um) and in solution. importantly, b gpi does not alter the ability of thrombin (fiia) to generate the anticoagulant protein c, with or without thrombomodulin added. hence, we concluded that b gpi inhibits the key procoagulant properties of fiia, without affecting its unique anticoagulant function. 
we also proposed that b gpi, together with other more efficient anticoagulant pathways such as thrombomodulin-fiia -protein c and antithrombin iii-fiia, may function as a mild anticoagulant in vivo, especially in those compartments where the efficacy of thrombomodulin is limited, as in the large vessels, or is even absent, as in the brain vasculature. aims: because the three-dimensional structure of the b gpi-thrombin complex is not available, this work aims to identify the peptide regions on both thrombin and b gpi that are involved in complex formation. results: data obtained by fluorescence and surface plasmon resonance (spr) indicated that b gpi interacts with fiia with physiological affinity (kd nm). kd values calculated by reversing the interacting systems are very similar to each other (kd nm), suggesting that b gpi in the mobile phase has a conformation which is competent for binding to immobilized fiia. the affinity of fiia for immobilized b gpi is markedly decreased by increased ionic strength (i.e. kd increases by -fold going from . m to . m), suggesting that electrostatic interactions play a key role in fiia -b gpi recognition. filling/inactivation or perturbation of the fiia active site does not alter the affinity of fiia for immobilized b gpi, confirming that the active site is not involved in the interaction. mapping of thrombin binding sites with specific exosite-directed ligands (i.e. hirugen, gpibalpha, hd aptamer) and thrombin analogues having the exosites variably compromised (i.e. prothrombin, prethrombin- , alpha-thrombin) reveals that the positively charged exosite-ii of fiia plays a key role in b gpi binding. from the docking model of the b gpi-thrombin complex, we identified a highly negatively charged segment - in domain v of b gpi interacting with positively charged patches in thrombin exosite ii.
the synthetic peptide b gpi( - ) was able to bind to fiia with an affinity (kd nm) comparable to that of full-length b gpi, as deduced from fluorescence or spr measurements, and to compete in spr measurements with the binding of full-length b gpi to thrombin. hence, combining experimental and theoretical data, we obtained a reliable model of the b gpi-thrombin complex.
metalloproteases are one of the most diverse types of proteases, presenting a wide range of folds and catalytic metal ions. in the merops ma clan, most of the known metalloproteases are grouped based on the consensus hexxh sequence motif, a single catalytic zinc ion and a common fold architecture [ ] . despite these common features, members from distinct families present distinct domain composition and topology. given our interest in developing new tailor-made metalloproteases for bioengineering applications, an in-depth understanding of the factors governing their function is required. protein internal dynamics includes the space of functionally-relevant structural changes occurring during an enzymatic reaction, and there is an increasing understanding of how it relates to protein sequence and structure evolution. therefore, we have recently assessed how the structural heterogeneity of metalloproteases relates to the similarity of their dynamical profiles [ ] . first, the dynamical profile of the clan ma type protein thermolysin, derived from the anisotropic network model, was evaluated and compared with those obtained from principal component (pc) analysis of a set of crystallographic structures and essential dynamics (ed) analysis of a ns molecular dynamics simulation trajectory [ ] . a close correspondence was obtained between normal modes (nm) derived from the coarse-grained model and experimentally-observed conformational changes (rmsip between nm -nm and pc of .
), corresponding to functionally-relevant hinge bending motions that were shown to be encoded in the internal dynamics of the protein (cumulative overlap of ed -ed and pc of . ). next, a dynamics-based comparison employing a related coarse-grained model (b-gaussian elastic network model) was made for a representative set of ma clan members [ ] , allowing for a quantitative description of their structural and dynamical variability. although members are structurally similar ( % of pairs with dalilite z-score > . ), they nonetheless present distinct dynamical profiles ( % of pairs with aladyn p-value > . ), with no identified correlation between structural and dynamical similarity. for cases where high dynamical similarity was observed, the respective modes corresponded to hinge-bending motions encompassing regions close to the active site. further inspection of the produced alignments indicates that for ma clan metalloproteases, conservation of internal dynamics has a functional basis, namely the need to maintain proper intermolecular interactions between the protein and its substrate. previously unnoticed dynamical similarity between clan members botulinum neurotoxin type a, leishmanolysin and carboxypeptidase pfu was also found. together, these results suggest that distinct selective pressure mechanisms acted on metalloprotease structure and dynamics through the course of evolution. this work shows how new insights into metalloprotease function and evolution can be obtained with comparison schemes that incorporate additional information on protein dynamics.
glucokinase from the antarctic psychrotroph pseudoalteromonas sp. as- (psgk) has a higher specific activity at low temperatures and a higher thermal stability than its mesophilic counterpart from e. coli (ecgk). in order to elucidate the structural basis for cold-adaptation and thermal stabilization of psgk, we have determined the crystal structure of psgk at . Å and compared it with the ecgk structure.
psgk is a homodimer of the subunit of amino acid residues. each subunit consists of two domains, a small a/b domain (residues - and - ) and a large a b domain (residues - ). the active site is located in a cleft formed between the two domains. the amino acid sequence identity between psgk and ecgk was %, but their three-dimensional structures are very similar to each other, with conserved catalytic residues and substrate-binding residues. analysis of the main-chain temperature factors revealed that the small domain regions and the hinge region connecting the two domains of psgk showed higher temperature factors, with a lower number of intramolecular hydrogen bonds and ionic interactions, than the corresponding regions of ecgk. however, the large domain regions of psgk showed lower temperature factors, with a higher number of intramolecular hydrogen bonds, than ecgk. furthermore, the atomic temperature factors of catalytic asp on the small domain were higher, but those of glucose-binding glu , his , and glu on the large domain were lower, than in ecgk. these results suggest that the highly flexible hinge region and the catalytic residue on the small domain of psgk may contribute to its cold-adaptation, namely higher activity at low temperatures, whereas a more rigid structure of the large domain of psgk stabilizes its overall structure more strongly than in ecgk.
non-waste technologies in synthetic chemistry are becoming increasingly popular. such processes are often carried out using different enzymes. dehydrogenases represent a large group of enzymes that are widely used in the synthesis of chiral compounds and other useful molecules. such enzymes need nadh or nadph as a cofactor and, because of the high cost of reduced coenzymes, a cofactor regeneration system is an obligatory part of such processes. it was shown that formate dehydrogenase (fdh, ec . . . .) is one of the best enzymes for nad(p)h regeneration.
fdh catalyses the oxidation of formate to carbon dioxide coupled with the reduction of nad(p) to nad(p)h. the main advantages of fdh are the irreversibility of the catalyzed reaction, the low price of the formate ion and a wide ph optimum of activity. our laboratory has the largest collection of formate dehydrogenases from different sources. many fdh genes from bacteria, yeasts and plants were cloned and the enzymes were expressed in active and soluble forms. mutant formate dehydrogenases from the bacterium pseudomonas sp. show the highest thermal stability as well as activity in comparison with other reported formate dehydrogenases. now we have focused on eukaryotic genes. the recombinant enzymes from soya glycine max (soyfdh), arabidopsis thaliana (athfdh), the moss physcomitrella patens (ppafdh) and the yeast ogataea parapolymorpha (opafdh) were obtained by genetic engineering methods. soyfdh was found to have the best michaelis constants among all known fdhs, but it is less thermally stable than other fdhs. new mutant forms of soyfdh with excellent catalytic characteristics and high thermal stability were obtained by protein engineering. the other enzymes (athfdh, ppafdh and opafdh) are comparable in their stability with the majority of bacterial enzymes (but not with psefdh), so all the newly obtained fdhs can be successfully used for cofactor regeneration.
marmara university, wellesley college,
antibiotics are essential therapeutic drugs widely used in the treatment of bacterial infections. unfortunately, misuse of these drugs has resulted in the development of bacterial defense mechanisms. b-lactamase synthesis is one of these mechanisms and renders b-lactam antibiotics ineffective. understanding the dynamic behavior of this enzyme is an important step in controlling its activity. in a former study, the importance of highly conserved w in modulating the hinge type h motion was reported.
in the light of this information, mutant tem- b-lactamase enzymes with w a, w f and w y substitutions were constructed. wild-type and mutant tem- b-lactamases purified with ni affinity chromatography were subjected to enzyme assay using centa as the substrate. with the w f and w y mutations, the remaining activity was approximately % of the initial activity. however, with the w a mutation, activity was totally lost. structural studies of the w a mutant with cd and fluorescence spectroscopy indicated that there was no major change in the overall structure. however, this mutation disrupted the interactions of w , which resulted in an increase in the flexibility of this region of the protein. this project was supported by tübitak project no m .
light-switchable zn binding proteins to study the role of intracellular zn signaling stijn aper , maarten merkx
zn plays an important catalytic and structural role in many fundamental cellular processes and its homeostasis is tightly controlled. recently, free zn has also been suggested to act as an intracellular signaling molecule. to gain a better understanding of the signaling role of zn we are developing light-switchable zn binding proteins to perturb the intracellular zn concentration using light. these protein switches consist of two light-responsive vivid domains and the zn binding domains atox and wd , linked together with flexible peptide linkers. in the dark, zn is tightly bound in between the two zn binding proteins. light-induced dimerization of the vivid proteins disrupts this interaction and thus results in zn release. the fluorescent proteins cerulean and citrine were attached to the vivid domains to allow the different conformational states of the protein switch to be monitored using fret. zn titrations revealed a -fold decrease in zn affinity going from the dark- to the light-state for the initial design, which was further improved to -fold by optimizing the linkers between the protein domains.
in addition, the zn affinities of both states were tuned to be optimal for intracellular applications. switching between the high affinity dark-state and the low affinity light-state was found to be reversible for at least two light-dark cycles. following the in vitro characterization, we are currently assessing the performance of this genetically encoded 'caged' zn in mammalian cells.
proteins as supramolecular building blocks: engineering nanoscale structures school of biological sciences, university of auckland, school of biological sciences, victoria university
proteins hold great promise in forming complex nanoscale structures which could be used in the development of new nanomaterials, devices, biosensors, electronics and pharmaceuticals. the potential to produce nanomaterials from proteins is well supported by the numerous examples of self-assembling proteins found in nature. we are exploring self-assembling proteins for use as supramolecular building blocks, or tectons, specifically the n-terminal domain of a dna binding protein (nterm-lsr ) and a typical -cys peroxiredoxin (hsprx ). non-native forms of these proteins have been designed to undergo self-assembly into supramolecular structures in a controllable manner. self-assembly of nterm-lsr is initiated via proteolytic cleavage, thereby allowing us to generate supramolecular assemblies in response to a specific trigger. we will show that the degree of oligomerisation can be controlled by variations in environmental conditions such as ph and protein concentration. furthermore, via protein engineering, we have introduced a new "switch" for oligomerisation via enteropeptidase cleavage. the new construct of nterm-lsr can be activated and assembled in a controlled fashion and provides some ability to alter the ratio of higher ordered structures formed. hsprx has been shown to oligomerise into dimers, toroids, stacks and tubes in response to specific triggers such as ph and redox state.
in this work we have utilised the histidine tag to further control the assembly of this versatile protein tecton. we will show that minute variations in ph can induce oligomerisation of hsprx toroids into stacks and tubes. furthermore, by utilising the histidine tag as a ligand we can bind divalent metals to these supramolecular structures. this not only drives the formation of higher ordered oligomers but also provides a facile route which may facilitate the functionalisation of these protein nanoscale structures after they have been assembled.
danielle basore , , rajesh naz , scott michael , sharon isern , benjamin wright , katie saporita , donna crone , christopher bystroff , , biological sciences, rensselaer polytechnic institute, cbis, rensselaer polytechnic institute, chemical and biological engineering, rensselaer polytechnic institute, computer science, rensselaer polytechnic institute, obstetrics and gynecology, west virginia university,
unintended pregnancy is a worldwide public health concern, with million pregnancies being classed as unintended in . the magnitude of this number clearly indicates an unmet need in terms of contraception. methods that are currently available are effective, but exhibit many problems. side effects, ease of use, cost, and availability are all concerns. we propose a contraceptive vaccine that would be safe, effective, long-lasting, cheap, and reversible. our vaccine would prevent pregnancy by targeting sperm with antibodies raised in the woman's body. several approaches have been taken to developing a contraceptive vaccine in recent years. the most successful so far has been using human chorionic gonadotropin (hcg), a hormone produced during pregnancy, as an antigen. the hcg vaccine progressed to phase clinical trials, but only displayed an % efficacy, which is insufficient for a contraceptive. our lab uses a structure-based approach to the design of an anti-sperm antigenic protein.
we believe this will raise a more vigorous immune response that will produce a longer lasting titer. the catsper complex is a heterotetrameric calcium channel found in the tail region of sperm. each subunit of the complex contains an exposed loop known as the p-loop. the p-loop is unique on the surface of sperm because it is not glycosylated, allowing antibodies to potentially recognize and bind it. ylp is a twelve residue peptide that mimics the glycans in the glycocalyx of sperm. ylp is a member of the flitrx library, and in mice, produced protective titers that were reversible both voluntarily and involuntarily. our designs will introduce these two potential antigens into a loop of the l protein of human papilloma virus. l spontaneously assembles into virus like particles, and will aid in the production of a robust immune response.
protein carriers for passage of the blood-brain barrier sinisa bjelic
medical solutions that help protein therapeutics accumulate in the brain are crucial for future treatment of neurological disorders. biodrugs have a tremendous potential to treat disorders of the nervous system, but their efficiency has been severely restricted. to reach the brain all drugs must traverse the blood-brain barrier (bbb), a permeable wall that separates blood from the brain, whose main function is to protect the nervous system from environmental influences of bacteria and toxins. unfortunately the bbb is also the culprit that effectively blocks access to therapeutics required for treatment of neurological diseases. a way to boost exposure of therapeutics across the bbb is to piggyback onto the transferrin receptor, a multidomain protein anchored in the membrane, which is involved in the physiological facilitation of iron uptake.
here i present research that aims at successfully developing potent protein carriers for transferrin receptor-mediated passage of the bbb by using computational protein design in combination with yeast display methodology for hit validation and optimization. the long-term goal is to couple therapeutics, for example drugs against alzheimer's, to the designed carriers to increase the brain uptake and cure neurological disorders.
medium-throughput multistep purification of coagulation factor viia jais r. bjelke , gorm andersen , henrik Østergaard , laust b. johnsen , anette a. pedersen , tina h. glue
there is a need for medium-to-high throughput purification of low-titre recombinant protein variants for screening to identify the final biopharmaceutical lead. such proteins include coagulation factors to be used for treatment of haemophilia and other bleeding disorders. at novo nordisk we have established a platform for production of recombinant coagulation factor viia variants, which span a spectrum from single-point mutations to large domain insertions. the variants were produced using transiently transfected hek f, hkb or choebnalt (qmcf technology) suspension cells. harvest cultivations were typically in the range of . - to l. a -step continuous, multistep purification method was implemented on äktaxpress systems (ge healthcare). the interlinked process steps include capture using an immunoaffinity column, polish, concentration and buffer exchange using an anion-exchange column and proteolytic activation of the zymogen variant forms using a coagulation factor xa-immobilized column. buffers were designed such that elution from the capture column was aligned with binding conditions on the polish column to avoid a desalting step in-between. the following and final enzymatic activation was optimized with regard to flow rate to ensure full conversion while minimizing unwanted secondary cleavages in factor viia.
the final products were fractionated in sharp chromatographic peaks ready for characterization. hplc and sds-page analyses showed a solid quality of the produced variants and more than variants have been produced in sub mg scale using the outlined method.
biomimetic sequestration of co : reprogramming the b domain of protein g through a combined computational and experimental approach esra bozkurt , ruud hovius , thereza a. soares , ursula rothlisberger ecole polytechnique fédérale de lausanne, federal university of pernambuco
protein engineering is a powerful tool to generate highly specific enzymes for biomimetic production of chemicals. among many applications, the development of enzymes to accelerate carbon dioxide fixation is a possible route to limit co emission. in this project, we are inspired by the ancient enzyme carbonic anhydrase, which efficiently catalyzes the reversible hydration of carbon dioxide in the presence of a zinc ion active site. to create an efficient biocatalyst, the engineered gb domain containing a his cys zn (ii) binding site was used as a starting point. in subsequent work, b domains comprising his wat zn (ii) binding sites have been rationally designed to produce carbonic anhydrase mimics. the re-engineering was accomplished through a series of mutations to orient the zinc bound reactive species to form a hydrogen bond network in the active site while retaining the native secondary structure. we performed classical molecular dynamics (md), quantum mechanics/molecular mechanics (qm/mm) simulations and metadynamics, with the aim of exploring potential catalytic roles of the reengineered b domains and elaborating the reaction mechanism. briefly, we introduced novel zn (ii) binding sites into the thermostable b domain. in parallel, experiments are underway. wild-type protein was expressed and purified. structural and mutagenesis studies are ongoing.
the results emphasize the power of theoretical work to enable the mimicking of nature's enzymes for desired catalytic functions.
the roles of entropy and packing efficiency in determining protein-peptide interaction affinities diego caballero , , corey o'hern , , , , lynne regan , , physics, yale university, integrated graduate program in physical and engineering biology, yale university, mechanical engineering and materials science, yale university, applied physics, yale university, molecular biophysics and biochemistry, yale university, chemistry, yale university
despite many recent improvements in computational methods for protein design, we still lack a quantitative and predictive understanding of the driving forces that control protein stability. for example, we do not know the relative magnitudes of the side-chain entropy, van der waals contact interactions, and other enthalpic contributions to the free energy of folded proteins. in addition, we cannot reliably predict the effects of point mutations on enzyme specificity or sequence tolerance in ligand binding sites. the tetratricopeptide repeat (tpr) motif is a common and versatile protein system that has been used as a model to study protein-protein interactions. for example, recent studies have experimentally measured the binding affinity and specificity for different tpr binding pockets and peptide ligands and generated a ranking of the protein-peptide pairs with the highest affinity. to gain a fundamental understanding of the interplay between atomic close packing and fluctuations of side-chain conformations in protein-peptide binding pairs, we performed all-atom langevin dynamics simulations of key residues near the binding interface of tpr proteins and their cognate peptides. the langevin dynamics simulations enabled us to calculate the entropy and potential energy of side chain conformations in the presence of backbone fluctuations for each protein-peptide pair.
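as a rough illustration of how a conformational entropy can be estimated from sampled side-chain conformations, the sketch below bins a sampled dihedral angle and evaluates the gibbs entropy s = -k_b Σ p ln p over the occupied bins. this is a minimal sketch under stated assumptions and not the procedure used in the study: the function name, the 10-degree bin width and the treatment of a single dihedral angle in isolation are all hypothetical choices made for the example.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def dihedral_entropy(angles_deg, bin_width=10.0):
    """Gibbs entropy -k_B * sum(p * ln p) of a binned dihedral-angle sample.

    angles_deg: sampled dihedral angles in degrees (any periodic image).
    bin_width:  histogram bin width in degrees (an assumed, tunable choice).
    Returns the entropy in kcal/(mol*K).
    """
    if not angles_deg:
        raise ValueError("need at least one sampled angle")
    n_bins = int(round(360.0 / bin_width))
    # Wrap every angle into [0, 360) and map it to a bin index.
    counts = Counter(
        int((a % 360.0) // bin_width) % n_bins for a in angles_deg
    )
    total = len(angles_deg)
    return -K_B * sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )


if __name__ == "__main__":
    # A side chain locked in one rotamer versus one hopping between two.
    locked = [60.0] * 100
    hopping = [60.0, 180.0] * 50
    print(dihedral_entropy(locked), dihedral_entropy(hopping))
```

the estimate depends on the bin width and on the amount of sampling, so only differences between entropies computed with identical settings are meaningful.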
we compare rankings of the stability and affinity of mutant tpr-peptide structures to those obtained from experimental studies. this research has enhanced our ability to rationally manipulate protein-peptide interfaces. advances from this research will enable the design of tpr modules that specifically recognize biologically important proteins.
monitoring protein-protein interactions using tripartite split-gfp complementation assays
protein-fragment complementation assay (or pca) is a powerful strategy for visualizing protein-protein interactions in living cells. previously described split-gfp based sensors suffer from the poor solubility of individual pca fragments in addition to background signal originating from their spontaneous self-assembly ( ). we developed a new genetically encoded reporter called "tripartite split-gfp" for visualizing protein-protein interactions in vitro and in living cells. the assay is based on tripartite association between two twenty-amino-acid-long split-gfp tags, gfp and gfp , fused to interacting protein partners, and the complementary gfp - detector. when proteins interact, gfp and gfp self-associate with gfp - to reconstitute a functional gfp ( ). using coiled-coils and frb/fkbp model systems we characterized the sensor in vitro and in escherichia coli. we extended our studies to mammalian cells and examined the fk- inhibition of the rapamycin-induced association of frb/fkbp . the small size of these tags and their minimal effect on fusion protein behavior and solubility should enable new experiments for monitoring protein-protein association by fluorescence and for screening modulators of complex formation in cell-based assays.
aldehyde dehydrogenases (aldhs) catalyze the oxidation of aldehydes to their corresponding acids using nad(p) as coenzyme.
these enzymes are responsible for the detoxification of lipid peroxidation products, which have been implicated in the etiology and pathogenesis of different diseases involving increased oxidative stress. recent data from our group showed that aldh a is resistant to inactivation by lipid peroxidation products, even at concentrations - times higher than those required to inactivate aldh a and aldh . the amino acid sequence of the aldehyde-binding site of the three enzymes was analyzed, and it was found that the enzymes susceptible to the effect of lipid peroxidation products (aldh a and aldh ) have cys residues flanking the reactive cys (position ). based on this criterion, and considering that these aldehydes react preferentially with cysteine, a mutant of aldh was generated by changing the cys residues adjacent to cys . the mutant aldh -cys thr-cys val was resistant to inactivation by acrolein and -hne, even at concentrations -fold higher than those required to inactivate aldh . however, the mutant presented values of km , and -fold higher for acrolein, propionaldehyde and acetaldehyde, respectively, compared to the wild type enzyme, but showed a catalytic efficiency similar to the parent enzyme. these data revealed that cys residues near the reactive cys in aldh are important in the inactivation process induced by lipid aldehydes, but also participate in determining the substrate specificity of this enzyme.
small molecule-assisted shutoff: a widely applicable method for tunable and reversible control of protein production h.
kay chung , conor jacobs , yunwen huo , jin yang , stefanie krumm , richard plemper , , roger tsien , michael lin department of biology, stanford university, department of pediatrics, stanford university, department of pharmacology, university of california san diego, department of pediatrics, emory university, institute for biomedical sciences, georgia state university, department of chemistry and biochemistry, university of california san diego, howard hughes medical institute, university of california san diego,
the ability to quickly control the production of specific proteins would be useful in biomedical research and biotechnology. we describe small molecule-assisted shutoff (smash), a technique in which proteins are fused to a self-excising degron and thereby expressed in a minimally modified form by default. degron removal is performed by a cis-encoded hepatitis c virus (hcv) protease, so that applying clinically available hcv protease inhibitors causes degron retention on subsequently synthesized protein copies and suppresses further protein production. we find that smash allows reversible and dose-dependent shutoff of various proteins with high dynamic range in multiple cell types, including yeast. we also successfully use smash to confer drug responsiveness onto an rna virus for which no licensed drug inhibitors exist. as smash does not require permanent fusion of a large domain, it should be useful when control over protein production with minimal structural modification is desired. furthermore, as smash only uses a single tag and does not rely on modulating protein-protein interactions, it should be easy to generalize to multiple biological contexts.
top, a protein of interest is fused to the smash tag via a hcv ns protease recognition site. after protein folding, the smash tag is removed by its internal ns protease activity, and is degraded due to an internal degron activity.
bottom, addition of protease inhibitor induces the rapid degradation of subsequently synthesized copies of the tagged protein, effectively shutting off further protein production.
vaccine development has emerged, epitope-focused immunogens, but in the past these have failed to deliver the expected outcome. here, we employed a new computational design methodology (rosetta fold from loops or ffl) to design epitope-focused immunogens. ffl was devised to insert structurally defined functional sites into protein scaffolds. throughout the ffl stages the structure of the scaffold is folded and its sequence designed to stabilize the desired functional conformation of the inserted site. we used ffl to design epitope-focused immunogens for the respiratory syncytial virus (rsv), for which, despite intense research, an approved vaccine is still lacking. we designed three-helix bundles harboring an rsv epitope that was previously co-crystallized with the neutralizing antibody motavizumab. the designs were thermodynamically stable (tm > ˚c) and showed extremely high affinities for motavizumab (kd pm). structural characterization through x-ray crystallography of antibody-bound and unbound scaffolds showed good agreement with the computational models in the overall structure (rmsd - . Å) and exquisite mimicry of the epitope region (rmsd - . Å), when compared to the peptide-epitope in complex with motavizumab. the designed immunogens were used to immunize non-human primates (nhp), and approximately % of the cohort developed rsv neutralizing activity, in some instances with high potency. to evaluate the therapeutic relevance of the elicited neutralization activity, we compared the nhp neutralization titers to those of human sera after natural rsv infection, which generally yields protective levels of antibodies. the neutralization potency of the best nhp responders was comparable to that of the human sera.
to better understand the features of the antibodies elicited, we isolated several rhesus monoclonal antibodies (rhmabs) from the animal that exhibited the most potent neutralization. two of the rhmabs bound to the immunogen with very high affinity (kd pm) and were potent rsv neutralizers. interestingly, these rhmabs were approximately fold more potent than the fda-approved prophylactic antibody palivizumab. our results provide the first proof-of-principle for epitope-focused vaccine design, and demonstrate the power of the ffl computational methodology. we anticipate that ffl will be useful for a variety of other challenges in the computational design of functional proteins.
figure . schematic of nucleotide binding, exchange and hydrolysis in tubulin, and its coupling to mt assembly. exchange of gdp (orange) for gtp (magenta) at the e-site in b-tubulin (blue) happens in the unpolymerized dimer (left). the active, gtp-bound tubulin dimer adds to a growing mt (right). interaction of the incoming a-tubulin (green) with the e-site nucleotide at the plus end of a mt (with b-tubulin exposed) results in gtp hydrolysis. the mt cartoon (bottom right) shows an oversimplified representation of a gtp cap as it first grows by tubulin addition and then shrinks by polymerization-coupled gtp hydrolysis (here b-tubulin that is bound to gtp is shown in red and that bound to gdp is shown in blue). cryo-em density map (emdb- ) and atomic model (pdb: jak) for an eb -decorated mt bound to gtpgs. a-tubulin, b-tubulin and eb are colored green, blue, and orange, respectively.
designed repeat proteins as templates for photoactive molecules and fluorescent nanoclusters sara h. mejias , , antonio aires , , javier lópez-andarias , pierre couleaud , , begoña sot , , carmen atienza , nazario martín , , aitziber l.
cortajarena
imdea nanoscience, c/faraday, , ciudad universitaria de cantoblanco; cnb-csic-imdea nanociencia associated unit "unidad de nanobiotecnología"; departamento de química orgánica i, facultad de química, universidad complutense
self-assembly of biological molecules into defined functional structures has tremendous potential in nanopatterning and in the design of novel bionanomaterials and functional devices. molecular self-assembly is a process by which complex three-dimensional structures with specified functions are constructed from simple molecular building blocks. we first present the study and characterization of the assembly properties of modular repeat proteins, in particular designed consensus tetratricopeptide repeats (ctprs), and their application as building blocks to generate functional nanostructures and biomaterials. ctpr proteins can be assembled into self-standing thin films and into thin nanometer fibers in solution. in this work, we show the use of the designed consensus repeat proteins as scaffolds to template: (1) photoactive organic molecules, and (2) fluorescent nanoclusters. (1) we explore the potential of ctpr proteins to arrange donor-acceptor pairs for electro-active materials. in particular, porphyrin rings are arranged by ctprs at a defined distance and orientation that favors a face-to-face orientation, which should lead to an improvement in the optoelectronic properties. our results confirm that ctpr proteins can be used as scaffolds for ordering organic chromophores while preserving their structure. the unique self-assembly properties of ctpr scaffolds have been exploited to generate ordered conductive films of the protein-porphyrin conjugates. these results open the door to the fabrication of hybrid protein-based solid devices. (2) we show results on the ability of ctpr to encapsulate and stabilize fluorescent gold nanoclusters.
we investigated the influence of the protein sequence in the final properties of the nanoclusters. the structural and functional integrity of the protein template is critical for future applications of the protein-cluster complexes. therefore synthetic protocols that retain the protein structure and function have been developed. as a proof of concept, a ctpr module with specific binding capabilities has been successfully used to stabilize nano clusters. biohybrid photoelectrochemical cells have been developed by functionalizing the hematite photoanode with the light-harvesting cyanobacterial protein c-phycocyanin (pc) yielding a substantial enhancement of the photocurrent density. photoelectrochemical cells combining light-harvesting proteins and inorganic semiconductors have potential for the use in artificial photosynthesis. in this work we present processing routes for the functionalization of hematite photoanodes with pc, including in situ co-polymerization of pc with enzymatically-produced melanin and using a recombinantly produced pc . moreover, recombinant forms of the light-harvesting protein c-phycocyanin from synechocystis sp. pcc were engineered to carry a peptide with affinity for hematite. similarly, a bacterial laccase was engineered to acquire affinity for hematite. results obtained from the different approaches to hematite functionalization and the advantages offered by protein engineering will be presented. minimizing a suitable free energy expression is arguably the most common approach in (ab initio) protein structure prediction. the achieved accuracy depends crucially on the quality of the free energy expression in use. here, we present corrections to existing free energy expressions which arise from the thermal motion of the protein. we (i) devise a term accounting for the vibrational entropy of the protein, and (ii) correct existing potentials for 'thermal smoothing'. 
(i) vibrational entropy is almost always neglected in free energy expressions because it is difficult to account for. this practice, however, may lead to incorrect output because distinct conformations of a protein can contain very different amounts of vibrational entropy, as we show explicitly for the chicken villin headpiece [ ] . to account for vibrational entropy, we suggest a knowledge-based approach in which typical fluctuation and correlation patterns are extracted from known proteins and then applied to new targets. (ii) at ambient conditions, time-averaged potentials of proteins are considerably smoothed by thermal motion, and the strength of this effect varies strongly between atoms. distinguishing these inhomogeneities by introducing new atom species according to their local environment can therefore increase the precision of time-averaged potentials [ ] .
extraction of general principles from the continually growing protein data bank (pdb) has been a significant driving force in our understanding of protein structure. atomistic or residue-level statistical potentials, secondary-structural propensities, and geometric preferences for hydrogen bonding are among the classical insights that arose from observations in the pdb. given the magnitude of structural data available today, it is likely that many quantitative generalizations remain to be made. here we hypothesize that the pdb contains valuable quantitative information on the level of local tertiary structural motifs (terms), with term statistics reflecting fundamental relationships between sequence and structure. we define a term to be the structural fragment that captures the local secondary and tertiary environments of a given residue, and put our hypothesis through a series of rigorous tests.
first, we show that by breaking a protein structure into its constituent terms, and querying the pdb to characterize the natural ensemble around each, we can estimate the compatibility of the structure with a given amino-acid sequence through a metric we term "structure score." considering submissions from recent critical assessment of structure prediction (casp) experiments, we find a strong correlation (r . ) between structure score and model accuracy, with poorly predicted regions readily identifiable. this performance exceeds that of leading atomistic statistical energy functions. next, we show that by considering the terms of a structure that are affected by a given mutation, and mining the pdb to characterize sequence statistics associated with each, we are able to predict mutational free energies on par with or better than far more sophisticated atomistic energy functions. finally, we ask whether term statistics are sufficient to enable the design of proteins de-novo. we demonstrate that given a native backbone conformation, term considerations alone with no input from molecular mechanics correctly predict roughly the same fraction of amino acids from the corresponding native sequence as state-ofthe-art computational protein design methods. knowledge-based energy functions have already put pdb statistics to good use by parsing structural environments into geometric descriptors, generally assuming their conditional independence. our results suggest that it may now be possible to instead consider local structural environments in their entirety, asking questions about them directly. if this is the case, then the pdb is an even larger treasure trove of information than it has been generally known to be, and methods of mining it for term-based statistics should present opportunities for advances in structure prediction and protein design. comprehensive understanding of a protein fold is intertwined with successful design. 
recent advances in designing de novo structures have shown that proteins can be designed for a few globular and helical folds. however, designing all-b structures and barrels remains challenging because loops and intricate long range interactions that are important in these topologies are difficult to control. for designing novel catalysts, the (a/b) -barrel (or tim-barrel) fold is one of the most important examples, for it is the most common topology for enzymes. for almost year, attempts in designing de novo tim barrel structures have all resulted in poorly folded proteins. here we describe the successful design of a -fold symmetrical (a/b) barrel directly from geometrical and chemical principles. designed variants with a wide range of stabilities from being molten globules to cooperatively folded proteins were experimentally characterized, and the results revealed the importance of sidechain-backbone hydrogen bonding for defining the characteristic a/b-barrel. the residue tim barrel structure is among the smallest tim-barrels and has a fully-reversible melting temperature of c. the x-ray crystal structure shows atomic-level agreement with the design model. despite this structural similarity, psi-blast searches do not identify sequence similarities to known tim-barrel proteins. more sensitive profile-profile searches suggest that the design is sufficiently distant from other native tim-barrel superfamilies to be in a superfamily of its own, further implying that nature has only sampled a subset of the sequence space available to the tim-barrel fold. the ability to de novo design tim-barrels opens new possibilities for custom-made enzymes. university of texas southwestern medical center, biofrontiers institute, university of colorado creation of new molecular sensors and actuators based on fluorescent proteins relies on methods for identifying complex photophysical phenotypes and subsequently performing separations on cell populations. 
we developed a microfluidic flow cytometry approach tailored to interrogating the performance of genetically-encoded fluorophores and present the results of studies employing this technology. the system screens cell-based libraries on the basis of multiple photophysical parameters relevant to imaging, including brightness, photostability, and excited-state lifetime (i.e. a proxy for fluorescence quantum yield) at a rate of up to cells/sec. in a first generation of experiments, molecular dynamics-guided design was used to create a library of mcherry mutants that was screened with this system, resulting in the identification of a variant with a higher stability b-barrel and improved photostability but with a decreased brightness due to reduction in the fluorescence quantum yield. to avoid inadvertent decreases in this important performance criterion, subsequent rounds of selection were performed on the basis of both photostability and excited-state lifetime as sorting criteria. in these second generation selections, mutations were designed to target pathways of oxygen access through the bottom of the bbarrel in addition to a position that directly interacts with the chromophore. furthermore, subsequent rounds of screening were used to improve folding and maturation. the multiparameter sort identified multiple clones with up to -fold improved photostability and up to double the excited-state lifetime of the parent mcherry fluorescent protein. the best mutant we identified produces one order of magnitude more photons before photobleaching compared to mcherry, at excitation conditions characteristic of confocal fluorescence microscopy. our results demonstrate the utility of combining moleculardynamics-guided library design with technology for photophysics-based selections. 
we anticipate that the new fluorescent proteins obtained in this work will find use in low-copy-number and long-duration live cell imaging applications in cell lines created by genomic editing techniques.
targeted protein degradation achieved through a combination of degrons from yeast and mammalian ornithine decarboxylase
rushikesh joshi, ratna prabha c.
the maharaja sayajirao university of baroda
targeting over-accumulated proteins in the cell for degradation using specific degrons is an emerging research area. the degradation of the vast majority of cellular proteins is targeted by the ubiquitin-proteasome pathway. in the case of ubiquitin-independent protein degradation, however, the odc/az system is more effective in achieving targeted protein degradation than other types of degradation. ornithine decarboxylase (odc) is a key regulatory enzyme in the biosynthesis of polyamines. the protein has two domains, namely an n-terminal α/β barrel domain and a c-terminal β-sheet domain. degradation of odc is mediated by a polyamine-inducible protein, antizyme (az). antizyme interacts with the n-terminal region of odc, which results in degradation of odc by proteasomes. in mammalian odc the c-terminus has an unstructured tail of residues, which pulls odc into the proteasome for degradation. it was reported earlier by coffino's group that the unstructured tail acts as a degron in chimeric fusion with gfp. in yeast, the same function is achieved by n-terminal residues. the present study focuses on accomplishing targeted protein degradation in saccharomyces cerevisiae by adding these two degradation signals, or degrons, of yeast odc and mammalian odc as tags to a reporter protein. we selected two degrons, namely the n-terminal α/β barrel domain of yeast odc and the c-terminal residues of mouse odc, and grafted them to the n- and c-termini of the reporter protein yegfp.
degradation of yegfp and of the yegfp fusion with degrons of odc (degron-yegfp) was monitored by western blot using anti-gfp antibody and by fluorescence spectroscopy. initially, the amount of degron-yegfp fusion protein was very low compared to control yegfp, indicating that the chimeric protein underwent rapid degradation in the cells. after inhibition of the proteasome, an increase in the level of degron-yegfp was observed, confirming that the degrons cause rapid degradation of the reporter protein through the proteasome. earlier, we also tagged ubiquitin from yeast with the last residues of modc and observed enhanced degradation of ubiquitin in saccharomyces cerevisiae. therefore, the degrons of odc, both alone and in combination, are capable of decreasing the stability of a reporter protein in the cells. however, the combination of degrons is more effective than either of them in isolation.
enzymes fold into unique three-dimensional structures, which underlie their remarkable catalytic properties. the requirement that they be stably folded is a likely factor that contributes to their relatively large size (> , dalton). however, much shorter peptides can achieve well-defined conformations through the formation of amyloid fibrils. to test whether short amyloid-forming peptides might in fact be capable of enzyme-like catalysis, we designed a series of -residue peptides that act as zn-dependent esterases. zn helps stabilize the fibril formation, while also acting as a cofactor to catalyze acyl ester hydrolysis. the fibril activity is on par with that of the most active zinc-protein complex reported to date. such remarkable efficiency is due to the small size of the active unit (likely a dimer of -residue peptides), while the protein is at least -fold larger in molecular weight. the observed catalytic activity is not limited to ester hydrolysis. we have designed copper-binding peptides that are capable of oxygen activation.
these results indicate that prion-like fibrils are able to not only catalyze their own formation -they also can catalyze chemical reactions. thus, they might have served as intermediates in the evolution of modern-day metalloenzymes. these results also have implications for the design of self-assembling nanostructured catalysts including ones containing a variety of biological and nonbiological metal ions. rational design of the cold active subtilisin-like serine protease vpr with improved catalytic properties and thermal stability abstract proteinase vpr, from a psychrophilic vibrio species and its thermophilic structural homologue, aqualysin i (aqui) from thermus aquaticus, we set out to design a mutant of vpr which would be more thermostable, but would retain the high catalytic activity of the wild type enzyme. our starting protein template was a previously stabilized mutant containing two inserted proline residues close to the nterminus of vpr (n p/i p). this vpr_n p/i p mutant was shown to have a significantly increased thermal stability but displayed a concomitant tenfold loss of catalytic efficiency. from our previous studies we selected two mutations, one which increased catalytic activity (q k) of the enzyme significantly and another which stabilized the protein against thermal denaturation (n d). the n d mutation had been shown to introduce a salt bridge into the structure of the cold adapted proteinase, yielding higher stability but without negative effects on activity. the q k exchange had been shown to double the turnover number (kcat) to that of the wild type enzyme. insertions of these selected mutations into the vpr_n p/i p mutant were according to predictions; the q k increased the kcat tenfold, and the n d mutation increased the thermal stability. in the combination mutant, vpr_n p/i p/n d/q k, thermal stability was increased by c and c, in terms of tm and t %, respectively. 
furthermore, the catalytic activity of the mutant was somewhat higher than that of the wild type enzyme. critical peptide stretches may not serve as faithful experimental mimics for protein amyloidogenesis bishwajit kundu , dushyant garg certain amino acid stretches are considered critical to trigger the amyloidogenesis in a protein. these peptide stretches are often synthetically produced to serve as experimental mimics for studying amyloidogenesis of the parent protein. here we provide evidence that such simple extrapolation may be misleading. we studied the amyloidogenesis of full length bovine carbonic anhydrase ii (bcaii) and compared it with those formed by its critical amyloidogenic peptide stretch - (pepb). under similar solution conditions and initial monomeric concentrations, we found that while amyloid formation by bcaii followed aggregation kinetics dominated by surface-catalyzed secondary nucleation, pepb followed classical nucleation-dependent pathway. the afm images showed that bcaii forms short, thick and branched fibrils, whereas pepb formed thin, long and unbranched fibrils. atr-ftir revealed parallel arrangement of cross b sheet in bcaii amyloids, while pepb arranged into antiparallel b sheets. amyloids formed by bcaii were unable to seed the fibrillation of pepb and vice versa. even the intermediates formed during lag phase revealed contrasting ftir, far uv cd signature, hydrophobicity and morphology. we propose that for any polypeptide, the sequences flanking a critical region are equally effective in modulating the initial nucleation events, generating prefibrillar and finally fibrillar species with contrasting characteristic. the results have been discussed in light of amyloid polymorphism and its importance in the design of therapeutic strategies targeting such toxic regions. 
aksana labokha , ralph minter all approved biological drugs target extracellular proteins and not the majority of the expressed human genome, which resides within intracellular compartments. included in the latter category are many important, disease-relevant targets which cannot be easily addressed by small molecule approaches, such as the oncology targets c-myc and k-ras. although bacteria and viruses have evolved strategies to deliver biological material to the cell cytoplasm and nucleus, our ability to engineer recombinant proteins to replicate this is somewhat limited by (i) our nascent understanding of protein uptake and trafficking pathways and (ii) the ability to easily quantify cell delivery to the cytoplasm and cellular organelles. the aim of my project is to address these challenges by developing an effective assay for cytoplasmic uptake and then using it to measure the delivery efficiency of recombinant proteins which mimic natural delivery strategies e.g. cell penetrating peptides fusion, exotoxin mimics, and supercharged proteins (proteins with high surface charge which can enter cells). i also intend to explore the influence of the rab superfamily, which are the master regulators of protein trafficking, to influence and control both the kinetics and final subcellular destination of exogenous proteins. protein engineering: what's next? with the growing industrial need for engineering enzymes for the deconstruction and transformation of plant biomass in biorefineries, there is a want for the development of new approaches for designing special purpose biocatalysts. techniques, such as directed evolution, which mimic the natural selection process by evolving proteins towards the improvement of a given property, have unquestionably demonstrated their value and are routinely used in large industrial companies. 
nevertheless, the brute force employed in these methods, could significantly gain from an all-atom description of the underlying catalytic mechanisms, to center the efforts on more limited areas of the protein. in the last years, we have developed computational tools, which combine the electronic structure description of qm/mm methods with the potential to model long time scale processes of pele, to study the details of a variety of reactions. examples, which will be discussed, include rationalizing the selective oxyfunctionalization of steroids using fungal enzymes and the study of the effect of point mutations on the oxidation efficiency of laccases. these methods have shown their potential not only at the descriptive level but, more importantly, through their high predictive capability that opens many opportunities for their use in biotechnology. in this talk, we will show how recent advances in in silico approaches are setting new grounds for future computer guided directed evolution. several orthogonal bioreactions take place simultaneously within membrane bound organelles in eukaryotes and proteinaceous microcompartments in bacteria. these subcellular structures contain sets of enzymes co-involved in metabolic pathways. towards the goal of creating artificial protein microreactors, we seek to develop an artificial organelle that emulates the metabolic activity of the carbon fixating organelle of autotrophic bacteria, the carboxysome. here, we show that the two key carboxysomal enzymes, ribulose- , -bisphosphate carboxylase/oxygenase (rubisco) and carbonic anhydrase (ca), can be efficiently co-encapsulated using our previously reported encapsulation system which is based on a bacterial capsid formed from the protein lumazine synthase (aals- ). our preliminary results suggest that the enzymes can act in tandem and that the co-encapsulation of ca with rubisco in the capsid is necessary for enhanced rubisco activity in vitro. 
we attribute this observation to the high local concentrations of the rubisco substrate, co , produced by ca within the capsid. we are developing a theoretical model of a minimal carboxysome using the kinetic rate constants of our rubisco and ca variants and aals- as the shell to complement these experiments. next, we will incorporate our minimal carboxysome within an expression host such as e.coli, opening up the possibility of further optimization through directed evolution. in the past targeting and engineering of chemokines has led to several interesting drug candidates. [ ] amongst them, met-rantes, a met-ccl with high g protein-coupled receptor (gpcr) affinity but no subsequent signal transduction, as well as mutants addressing the interaction with the so-called glycosaminoglycans (gags) seem to be the most promising candidates. both, gag knockout as well as gag affinity matured chemokine isoforms have been considered as anti-inflammatory drug candidates, out of which an il- mutant with modifications reached clinical phase where it was profiled for acute neutrophil-related exacerbation in copd. [ ] cxcl (ip- ) is a proinflammatory chemokine released by various cells following stimulation by interferon g (ifn-g) . it is therefore considered as a late chemokine being responsible for the attraction of different lymphocytes. [ ] any therapeutic indication is consequently related to chronic and multiple applications. we have therefore engineered cxcl very conservatively at positions to ultimately generate dominant-negative mutants with a mildly improved gagbinding affinity and an entire knock off gpcr activity. the first steps of our engineering approach were in silico modelling of the mutants and the establishment of a suitable upstream-and downstreamprocessing protocol. 
next we generated a fluorescently engineered cxcl variant for our fluorescence-based affinity studies, which was subjected to biocomparability investigations relative to the native, non-fluorescent protein. compared to the wild type, the fluorescently engineered mutant exhibited similar biological, chemotactic and gag-binding properties. next we started to produce sufficient amounts of the members of our nascent mutant library, which were tested with respect to their biophysical behavior as well as their knocked-out chemotactic potency on cells. these experiments included gel electrophoresis and western blot analysis to determine identity and purity; circular dichroism (cd) and chaotrope-induced unfolding to approximate structure; isothermal fluorescence titration (ift), surface plasmon resonance (spr) and isothermal titration calorimetry (itc) to quantify gag-binding affinity; and boyden chamber experiments to determine the chemotactic activity. our results show that we are able to tune the gag binding strength along with the gpcr activity of human cxcl , which could lead to therapeutic applications in the future.
nanodiscs are composed of a nanometer-sized phospholipid bilayer encircled by two α-helical, amphipathic membrane scaffold proteins (msps). these particles provide a unique detergent-free lipid bilayer model enabling biochemical and biophysical characterization of membrane proteins in a physiologically relevant medium. previously, the largest diameter reported of a nanodisc assembled using msps was about - nm. here we present a method to create large nanodiscs (up to nm in diameter) assembled with covalently circularized msps (cmsp). we can observe the homogeneity in nanodisc diameter as a narrow distribution using negative-stain em. using our method, we have created nm nanodiscs and used them to study poliovirus ( nm diameter) entry and rna translocation.
a nm nanodisc is sufficiently large to accommodate multiple copies of the cd receptor (also known as the poliovirus receptor), and has enough surface area to act as a surrogate membrane for the rna translocation complex during viral uncoating. the nm nanodiscs functionalized with the his-tagged ectodomain of the poliovirus receptor, cd , were generated by adding lipids derivatized with an nta nickel-chelating head group to the lipid mixture during nanodisc assembly. cd receptor was added to the already assembled nanodiscs and incubated for minutes at room temperature. the receptor-decorated nanodisc complex was purified by size exclusion chromatography. the purified complex was then incubated with poliovirus for minute at c, and then heated to c for minutes to initiate receptor-mediated viral uncoating. virus binding to the nanodisc-cd complex and subsequent insertion of viral components into and across the membrane were confirmed by negative-stain electron microscopy (figure c). to obtain a high-resolution structure for the rna translocation complex we conducted single-particle cryo-em studies using a polara f microscope. compared with liposomes, generating a reconstruction of samples containing nanodiscs is less complicated, since the nanodiscs are more homogeneous in size and allow for thinner ice. also, the viral rna can be visualized more easily. the method for making large nanodiscs as well as the negative stain and cryo-em data will be presented and discussed.
parametric design of alpha-helical barrels and pore-like assemblies with very high thermodynamic stabilities
computational design of novel protein structures and enzymes with new functions is a promising tool to create superior biological materials with tailor-made properties, new pharmaceuticals, complex fine chemicals or renewable fuels. it also challenges our understanding of protein folding, protein evolution, molecular recognition and catalysis.
here we present a procedure for designing proteins with backbones produced by varying the parameters in the crick coiled-coil generating equations [ ] . combinatorial design calculations using the software suite rosetta identify low energy sequences for alternative helix supercoil arrangements. after that, loop modeling is applied to connect the designs with lowest energy. the extent to which the designed sequences encode the designed structures is evaluated using large-scale structure prediction calculations, as well as symmetric and asymmetric protein-protein docking calculations. subsequently, synthetic genes are generated for sequences that converge strongly on the designed structure for experimental characterization. we applied this approach to monomeric three and four helical bundle structures as well as a pentameric five-helix bundle structure using idealized coiled-coil geometries [ ] . recently we expanded this approach to higher complexity backbones, which resulted in the de-novo design of monomeric, antiparallel six-helix bundles with untwisted, left-and right-handed geometries. circular dichroism (cd), size-exclusion coupled multi-angle light scattering measurements (sec-mals), negative stain electron micrographs (em) and small angle x-ray scattering (saxs) of these designs suggest that they indeed form the designed structures. in addition, we used rosetta protein-protein interface design functionality to computationally design oligomers out of our previously published three and four helix bundle structures to generate self-assembling pore-like structures with the potential use as channels or transporters. again, experimental validation of these designs by cd, sec-mals, em and saxs show that the designs are correct. we are currently undertaking further structural investigation of all these designs by x-ray crystallography. 
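the crick coiled-coil parameterization referred to above can be illustrated with a minimal sketch. it is not the authors' rosetta protocol: it only places c-alpha positions for a single helix wound around a superhelical axis, using one commonly quoted form of the crick equations. the function name, the argument names and the example values (superhelical radius 5.0 Å, helical radius 2.26 Å, rise 1.51 Å per residue, a heptad-like 3.5 residues per turn) are assumptions chosen for illustration, and sign conventions for handedness vary between references.

```python
import math

def crick_ca_trace(n_res, r0, r1, w0, w1, rise=1.51, phi0=0.0, phi1=0.0):
    """Return C-alpha coordinates (x, y, z) for one helix of a coiled coil.

    r0, r1 : superhelical and helical radii (angstrom)
    w0, w1 : superhelical and helical angular steps per residue (radians);
             a negative w0 gives a left-handed supercoil in this convention
    rise   : rise per residue along the helix axis (angstrom), assumed value
    phi0, phi1 : starting phases of the supercoil and of the helix (radians)
    """
    # Pitch angle follows from the requirement that each residue advances
    # `rise` along the (tilted) helix axis: sin(alpha) = r0 * w0 / rise.
    alpha = math.asin(r0 * w0 / rise)
    coords = []
    for t in range(n_res):
        a0 = w0 * t + phi0   # position around the superhelical axis
        a1 = w1 * t + phi1   # position around the local helix axis
        x = (r0 * math.cos(a0)
             + r1 * math.cos(a0) * math.cos(a1)
             - r1 * math.cos(alpha) * math.sin(a0) * math.sin(a1))
        y = (r0 * math.sin(a0)
             + r1 * math.sin(a0) * math.cos(a1)
             + r1 * math.cos(alpha) * math.cos(a0) * math.sin(a1))
        # rise * cos(alpha) equals w0 * r0 / tan(alpha) and stays finite
        # when w0 is zero (a straight, untwisted helix).
        z = rise * math.cos(alpha) * t - r1 * math.sin(alpha) * math.sin(a1)
        coords.append((x, y, z))
    return coords

# Illustrative left-handed example: four heptads of a single helix.
trace = crick_ca_trace(28, r0=5.0, r1=2.26,
                       w0=math.radians(-2.85),
                       w1=math.radians(360.0 / 3.5))
```

varying `r0`, `w0` and `phi1` over a grid, and repeating the trace with different `phi0` offsets for each chain, is the kind of parameter sweep that yields alternative supercoil arrangements; sequence design and loop modeling would then be separate steps outside this sketch.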
the designs described above can act as templates for protein or small molecule binding, holding a catalytic machinery or for scaffolding enzymes in reaction cascades. some of these applications are currently under investigation, including a self-sufficient redox system employing two copper-centers, binding of heme-moieties as a prosthetic group and tailoring the pore-like geometries to be used in nanopore sequencing. university of washington, university of california, san francisco, repeat proteins are an example of how evolution proceeds by building on existing structures and functions, but also a source of modular protein scaffolds for molecular recognition and biomaterials. however, it is unclear whether the limited number of folds and families that we know today is the result of the intrinsic limitations of polypeptide chains or the consequence of the path followed by evolution. we explored this hypothesis by computational design of repeat proteins based on modular units formed by two alpha helices and two loops of variable lengths, without relying on information from available repeat protein families. the automated sampling of the conformational space resulted in a large number of architectures from which de novo designs were selected for experimental characterization. % of the proteins were stable up to c and monodisperse and designs were structurally validated by small angle x-ray scattering. crystal structures were solved for of them, with root mean square deviation from the models between . Å and . Å. the designs differ from known proteins both at the sequence and structure levels and cover a broader range of geometries than observed in naturally occurring repeat protein families, indicating that existing architectures represent only a small fraction of what can be achieved. 
our results show that it is possible to expand the range of repeat protein architectures beyond the naturally occurring families, and that computational design can provide new scaffolds and enable the design of proteins tailored for specific applications. the serpin family of proteins consists of over members, all with a highly conserved native structure that is metastable ( ). serpins use this metastability to control the activity of proteases via a specific inhibitory process. the serpin binds to its target protease through specific residues within the reactive centre loop; the protease cleaves the loop, which results in a large conformational change that leaves the protease distorted and catalytically inactive whilst the serpin becomes much more stable ( , , ). the metastable nature of aat is therefore required to facilitate the rapid and gross conformational changes required for its inhibitory function ( , ). several disease-causing mutants of aat have been identified, the most common of them being the z-variant ( ). the z-variant has an increased propensity to polymerize in the endoplasmic reticulum of hepatocytes, leading to cell death and liver damage ( ). during the past fifteen years, many groups have unsuccessfully screened a number of serpins and a vast range of solution conditions to identify a combination of serpin and conditions that will enable the folding reaction of a serpin to be characterized. we have now taken an alternative approach and designed a synthetic "model" serpin that folds reversibly to its native state. in order to do this, we used a consensus design approach, analysing a sequence alignment of serpin sequences and determining the prevalent amino acid residue at each position. we termed this serpin conserpin (consensus serpin). here we present the structural, biophysical and functional characterisation of conserpin.
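the consensus step described above (taking the prevalent residue at each alignment position) can be sketched as follows. this is a simplified illustration, not the authors' pipeline: the function name, the gap character, the alphabetical tie-break and the toy alignment are all assumptions, and a real consensus design would typically also handle sequence weighting and insertions.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent residue in each column of an alignment.

    alignment: list of equal-length aligned sequences (hypothetical input).
    Gaps are ignored when counting; ties are broken alphabetically so the
    result is deterministic.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain equal-length sequences")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            # Column made only of gaps: keep a gap in the consensus.
            consensus.append(gap)
            continue
        top = max(counts.values())
        consensus.append(min(res for res, n in counts.items() if n == top))
    return "".join(consensus)


# Toy alignment of three short sequences (not real serpin data).
toy_alignment = ["MKV-L", "MRVAL", "MKIAL"]
print(consensus_sequence(toy_alignment))  # MKVAL
```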
combined crystallographic and folding studies reveal the characteristics of conserpin that likely dictate its unique stability and folding behaviour, whilst retaining activity as a serine protease inhibitor. the development of enhanced protein binding scaffolds is a key for engineering protein inhibitors and biosensors with advanced characteristics. utilizing the structural variability and designability of repeat proteins offers a means for designing protein binders where the overall shape is customized to optimally match a target molecule. we developed a computational protocol for the design of repeat proteins with a predefined geometry. by combining sequence optimization of existing repeats and de novo design of capping structures, we designed leucine-rich repeat (lrr) proteins where the building blocks assemble into a novel structure. the suggested design procedure was validated by engineering an artificial donut-like ring structure, which is constructed from ten self-compatible repeats. characterization of several designed constructs further suggests that buried cysteines play a central role for stability and folding cooperativity in certain lrr proteins. this effect could provide a means for selectively stabilizing or destabilizing specific parts of an lrr-based protein binder. the computational procedure may now be employed to develop repeat proteins with various geometrical shapes for applications where greater control of the interface geometry is desired. engineering apobec g enzymes for altered specificity and processivity louis scott , muhammad razif , aleksandra filipovska , , oliver rackham , harry perkins institute of medical research, school of chemistry and biochemistry, the university of western australia apobec g (a g) is a host-encoded protein involved in the defense against hiv- and other retroviral infections. a g is a cytidine deaminase with a ' to ' processive nature, causing targeted c to t mutations along a dna strand. 
the catalytic and processive activity of a g leads to the hypermutation of nascent retroviral cdna, resulting in premature termination codons and dysfunctional proteins. ultimately, the action of a g inhibits viral replication. the ability of a g to jump and slide along a dna strand, deaminating at targeted sequences, makes it an interesting candidate for protein engineering. a g enzymes engineered for increased activity, altered specificity, and altered processivity are attractive options for expanding the dna-modifying enzyme toolbox. mutation of catalytic residues, residues thought to affect its processive nature and those thought to be involved in target recognition can create novel a g enzymes. using structure-guided selection, residues in key functional sites that are amenable to mutation will be chosen. individuals from the resulting libraries of mutants will be selected by directed evolution for desired characteristics. the resulting a g enzymes will be examined for the relationship between their structure and function. such engineered a g enzymes could be targeted to catalyse the reversion of deleterious genetic mutations. furthermore, engineered a g enzymes could be used in mutational studies that call for targeted deamination along a dna strand, or mutational studies that call for nonspecific and high-throughput dna deamination. engineering porous protein crystals as scaffolds for programmed assembly thaddaus huber , luke hartje , christopher snow a key motivation for nano-biotechnology efforts is the creation of designer materials in which the assembly acts to organize functional domains in three dimensions. crystalline materials are ideal from the validation perspective because x-ray diffraction can elucidate the atomic structure.
relatively little work has focused on engineering protein crystals as scaffolds for nanotechnology, due to the technical challenges of coaxing typical proteins into crystallizing, and the likelihood of disrupting the crystallization process if changes are made to the monomers. we have circumvented these limitations by installing guest protein domains within engineered porous crystals ( nm pore diameter) that have been rendered robust using covalent crosslinks. the retention of the scaffold structure despite changes to the solution conditions and macromolecule uptake can be validated through x-ray diffraction. we have engineered scaffold crystals for the non-covalent and covalent capture of guest macromolecules. by controlling the reversible loading and release, we can prepare "integrated" crystals with spatially segregated guest loading patterns. as assessed using confocal microscopy, such host-guest crystals are highly stable. ultimately, the resulting crystals may serve as a robust alternative to dna assemblies for the programmed placement of macromolecules within materials. engineering ultrasensitive protein probes of voltage dynamics for imaging neural activity in vivo francois st-pierre , , michael pan , , helen yang , xiaozhe ding , , ying yang , , thomas clandinin , michael lin , department of bioengineering, stanford university, department of pediatrics, stanford university, nervous systems encode information as spatiotemporal patterns of membrane voltage transients, so accurate measurement of electrical activity has been of long-standing interest. recent engineering efforts have improved our ability to monitor membrane voltage dynamics using genetically encoded voltage indicators. in comparison with electrophysiological approaches, such protein-based indicators can monitor many genetically defined neurons simultaneously; they can also more easily measure voltage changes from subcellular compartments such as axons and dendrites. 
compared with genetically encoded calcium indicators, voltage sensors enable a more direct, accurate, and rapid readout of membrane potential changes. however, several challenges remain for in vivo voltage imaging with genetically encoded indicators. in particular, current voltage sensors are characterized by insufficient sensitivity, kinetics, and/or brightness to be true optical replacements for electrodes in vivo. as a first step towards addressing these challenges, we sought to develop new voltage indicators that further improve upon the performance of the fast voltage sensor accelerated sensor of action potentials (asap ). in asap , voltage-induced conformational changes in a natural voltage-sensing domain perturb the fluorescence emission of a covalently linked green fluorescent protein (gfp). using a structure-based approach to guide mutagenesis, we discovered several amino acids that tune the kinetics and voltage sensitivity of asap . these residues are located not only in the voltage-sensing domain, but also in the fluorescent protein and in the linkers bridging the sensing domain and gfp. our most improved variant, asap , exhibits improved sensitivity to voltage transients such as neuronal action potentials and subthreshold depolarizations. we sought to characterize the ability of these new voltage sensors to monitor neural activity in vivo using laser-scanning two-photon microscopy, a technique that allows imaging with lower autofluorescence and deeper tissue penetration. we report that asap sensors were able to report stimulus-evoked voltage responses in axonal termini of the fly visual interneuron l . asap sensors enabled voltage imaging with dramatically improved temporal resolution compared to three recently reported calcium and voltage sensors. overall, our study reports novel voltage indicators with improved performance and highlights how specific amino acids can tune the performance of a protein-based fluorescent sensor.
we anticipate that these results will pave the way for further engineering of voltage sensing proteins, and that our new sensor asap will facilitate current and future efforts to understand how neural circuits represent and transform information. assembly of armadillo repeat proteins from complementary fragments erich michel , randall watson , martin christen , fabian bumback , andreas plückthun , oliver zerbe demonstrated that complementary fragments of a designed consensus armadillo repeat protein (armrp) recognize each other [ ]. the two fragments ym : ma, in which y, m and a denote the n-cap, internal repeats and the c-cap, respectively, form a : complex with a nanomolar dissociation constant, which is essentially identical to the crystal structure of the continuous ym a protein. we further demonstrate that structurally intact armadillo repeat protein complexes can be reconstituted from fragments obtained at various split sites, essentially after every repeat but also within repeats. the fragments display variable affinities towards each other, depending on the split site. the low affinity of some complementary pairs can be dramatically increased upon addition of peptide ligands. while a number of proteins are known that can be reconstituted from fragments, we believe that the fact that armadillo repeat proteins can be reconstituted from various complementary fragments is novel and opens interesting new perspectives and applications in biochemistry. a reliable method for generating optically controllable proteins would enable researchers to interrogate protein functions with high spatiotemporal specificity. we recently engineered a tetrameric fluorescent protein, dronpa n, that undergoes light-induced monomerization, then developed a general architecture for light-inducible proteins based on this light-induced transition. we created proteins whose active sites were blocked by fused dronpa n domains in the dark, but would become unblocked by light.
here we present two further extensions to this concept that together enabled the generalization of this method to additional classes of proteins. first, we engineered a photodissociable dimeric dronpa (pddronpa) with tunable affinity, faster photoswitching speed, and a decreased level of protein aggregation, enabling better performance of fusion proteins. second, we introduce the concept of caging a protein active site by insertion of dronpa domains into loops rather than strictly at the protein termini. we use the pddronpa system to impose optical control on kinases and the cas endonuclease. the resulting light-inducible mek kinase, raf kinase, and cas endonuclease showed high caging efficiency of protein activities in the dark, and robust protein activation upon illumination. we believe that our efforts to further improve and generalize this method will bring the power and benefits of light control to a broad community of biologists. exploring the evolution of folds and its application for the design of functional hybrid proteins saacnicteh toledo patiño , birte höcker the structural diversity of proteins may appear endless; nevertheless, even large protein complexes can be decomposed into protein domains and smaller sub-domain sized fragments. only recently, we could identify such fragments employing sequence-based comparisons of different folds, such as the tim-barrel and the flavodoxin-like fold (farias-rico et al., ). as an extension of this work, we compared all α/β proteins and identified several fragments shared by different folds, illustrating how nature may have achieved structural and functional diversity from a reduced set of building blocks. inspired by this combinatorial concept, we searched for homologous fragments bearing active sites to engineer a functional fold-chimera.
we extracted the vitamin-b binding part from methylmalonyl coa mutase, which belongs to the flavodoxin-like fold (fl), and used it to replace the corresponding fragment in uroporphyrinogen iii synthase, which belongs to the hemd-like fold (hdl). the new hybrid resulted in a stable and well-folded protein whose structure was determined by x-ray crystallography. moreover, cobalamin-binding function was successfully transferred to the new protein from the fl parent, which shows the advantage of using this approach for the design of new functional proteins. in addition, profile alignments revealed sequence and structural evidence that suggested an evolutionary path for hdl from fl by gene duplication. to test this hypothesis, we expressed a modified c-terminal half of uroporphyrinogen iii synthase and solved its structure by nmr spectroscopy, thereby confirming the predicted fl architecture. altogether, our approach facilitates the detection of common ancestry among different folds, contributing to our understanding of protein development. furthermore, our results show how new complex proteins can be designed using fragments of existing proteins that serve as building blocks in a lego-like manner. we believe that combining fragments containing existing properties will provide a successful method for the design of novel functionalities in the future. [ ]. the active site cysteine plays a key role in the reaction mechanism and we investigated this residue in more detail by exchanging this moiety with selenocysteine (sec) and homocysteine (hcy). the sortase mutants were generated by semisynthesis using expressed protein ligation (epl). the resulting cys-, sec- and hcy-sortase enzymes were characterized and showed a moderate - -fold reduction of activity for sec-sortase. the activity of hcy-sortase was barely detectable with less than % of wildtype activity. the alkylation efficiency of the active site nucleophiles correlated with the expected pka values of sec, cys and hcy.
analysis of the ph dependency of the transpeptidation reactions showed that the activity optimum of sec-sortase was shifted towards more acidic conditions. these investigations provide further insights into the reaction mechanism of sortase a, and the semisynthetic enzymes may provide new tools for further biochemical studies. propanediol oxidoreductase from escherichia coli (fuco) uses nadh/nad as cofactors to catalyze the conversion of s-lactaldehyde to s- , -propanediol and vice versa. fuco is an attractive enzyme in the search for possible biocatalysts producing α-hydroxy aldehydes, which are important for the synthesis of natural products and synthetic drugs. enzymes catalyzing these types of reactions are unique in catalytic power and stereoselectivity. the use of fuco in the synthetic industry is limited by its restricted substrate scope, which makes fuco inactive with larger phenyl-substituted alcohols. we used re-engineering and directed evolution to enable fuco to catalyze the regio- and enantioselective oxidation of aryl-substituted vicinal diols, such as phenylpropanediols, into α-hydroxy aldehyde products. we mutated amino acids considered to restrict entry into the active site, modeled the mutants that were most active with the substrates phenylacetaldehyde and s- -phenyl- , -propanediol, and performed docking studies with them. as expected, our experimental and in silico results show that the mutations enlarge the active site cavity and enable the mutant enzymes to accommodate the new substrates. we also found specific amino acids in the active site that need to be conserved to allow the substrates to make stabilizing interactions. interestingly, an asparagine residue makes the mutant enzymes able to discriminate between phenylacetaldehyde and s- -phenyl- , -propanediol. in conclusion, we successfully re-engineered the specialist enzyme fuco to also accept bulkier molecules as substrates, thereby making it more useful for industrial purposes.
one way to gain insight into the sequence-structure-function relationship in proteins is to design artificial proteins de novo. despite impressive successes in de novo protein design, designing a folded protein of more than amino acids still remains a challenge. using this approach, an idealized (beta/alpha) fold protein was designed, leading to the production of a protein of amino acids (octarellin v). this protein showed low solubility and stability. through directed evolution we produced a soluble variant, octarellin v. . the biophysical characterization of octarellin v. shows a well-folded, monomeric and thermostable protein with a tm over c. however, after several screenings, we could not find crystallization conditions for this protein. as an alternative, we decided to co-crystallize octarellin v. with a protein partner that helps the crystallization process. we used protein partners: alpha-reps and nanobodies. the former interact through a large contact surface, whereas the latter recognize a specific small epitope. crystallization of both complexes was performed successfully by vapor diffusion and the structures were solved. the experimental structures are the first for an artificial protein of this size and will allow a critical assessment of the computational design of octarellin v. generation of synthetic antibodies against membrane proteins in nanodiscs for use in structural biology methods. here, we describe a robust strategy for generating a class of high-performance antibody-based affinity reagents that have proven useful in determining the structures of relevant functional states of membrane proteins. these reagents are fab fragments that are generated by phage display from fully synthetic libraries and are called synthetic antibody fragments, or sabs.
we have developed phage display sorting strategies that can trap a desired conformational state, making it accessible to structural analysis, or target a particular epitope on the protein surface. however, to maximize this technology for membrane proteins, several limitations of phage display sorting in detergent formats had to be overcome, the greatest being that using detergents can produce non-native conformational biases. we sought to address these limitations by embedding membrane proteins into nanodiscs, soluble lipid-filled discoidal particles, to better mimic the native membrane environment. nanodiscs stabilize the membrane protein and allow it to respond to conformation-inducing stimuli such as ligands, ions and ph during phage display selections. we have established and validated an improved protocol using two membrane protein systems: ) mj , an archaeal membrane protein of unknown function, and ) cora, a pentameric magnesium ion channel. using mj , we compared the nanodisc protocol with the standard method performed in detergent, and as an important byproduct, we characterized the influence of the membrane protein environment on the apparent affinity of sabs to their cognate antigen. using cora, we developed a more sophisticated sorting strategy resulting in a variety of sabs specific to either the open or closed conformation of the channel. finally, using sabs as crystallization chaperones we obtained the structure of mj at . Å resolution, and crystallized cora in several new conditions. lipocalin-type prostaglandin d synthase (l-pgds) is a member of the lipocalin superfamily, and binds a large variety of small hydrophobic molecules. using this function of l-pgds, we have already reported the feasibility of l-pgds as a novel drug delivery vehicle for poorly water-soluble drugs [ ]. sn- , -ethyl- -hydroxy-camptothecin, is a semi-synthetic analogue of the anti-cancer alkaloid camptothecin that targets dna topoisomerase i.
despite its potent anti-tumor activity, however, sn- has not been used directly in clinical practice due to its poor water solubility. thus, irinotecan hydrochloride (cpt- ), which is the water-soluble prodrug of sn- , is used for cancer treatment. however, cpt- shows approximately . % of the cytotoxic activity of sn- against various cancer cell lines in vitro, and its metabolic conversion rate is % of the original volume of cpt- . here, we show the development of a drug delivery system utilizing l-pgds, which enables direct clinical use of sn- . first, we investigated the effect of l-pgds on the solubility of sn- . in the presence of mm l-pgds, the concentration of sn- was . mm, which was , -fold that in pbs. then, we carried out isothermal titration calorimetry measurements to investigate the detailed binding mode of sn- to l-pgds. as a result, it was revealed that l-pgds binds three molecules of sn- , and the dissocia- control over the sensitivity with which artificial biomolecular receptors respond to small changes in the concentration of their target ligand is critical for the proper function of many cellular processes. such control could likewise be highly useful in artificial biotechnologies in which highly responsive behavior is of value, such as biosensors, genetic logic gates, and "smart" materials and delivery devices. in nature, the control of molecular responsiveness is often achieved using "hill-type" cooperativity, a mechanism in which sequential binding events on a multivalent receptor are coupled such that the first enhances the affinity of the next, producing a steep, higher-order dependence on target concentration. here we use an intrinsic-disorder-based mechanism that can be implemented without requiring detailed structural knowledge to rationally introduce this potentially useful property into several normally noncooperative biomolecules.
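the "steep, higher-order dependence on target concentration" produced by hill-type cooperativity can be illustrated with the standard hill equation, in which the fraction of occupied receptor is [l]^n / (k^n + [l]^n) for ligand concentration [l], half-saturation concentration k and hill coefficient n. the sketch below is a generic illustration under that textbook model, not the authors' fitting procedure; the function names and the example values are assumptions.

```python
def hill_occupancy(ligand_conc, k_half, hill_coeff):
    """Fractional occupancy under the standard Hill equation."""
    if ligand_conc < 0 or k_half <= 0 or hill_coeff <= 0:
        raise ValueError("concentrations and Hill coefficient must be positive")
    bound = ligand_conc ** hill_coeff
    return bound / (k_half ** hill_coeff + bound)


def transition_width_fold(hill_coeff):
    """Fold change in ligand concentration needed to go from 10% to 90%
    occupancy. Solving the Hill equation for both points gives 81**(1/n)."""
    return 81.0 ** (1.0 / hill_coeff)


# A noncooperative receptor (n = 1) needs an 81-fold concentration change
# to switch from 10% to 90% occupancy; with n = 2 this narrows to 9-fold.
for n in (1.0, 2.0):
    print(n, transition_width_fold(n))
```

for a receptor with two coupled binding sites, the hill coefficient under this model cannot exceed 2, which is the sense in which a two-site design has a theoretical maximum steepness.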
to do so we fabricate a tandem repeat of the receptor that is destabilized (unfolded) via the introduction of a long, unstructured loop. the loop spatially separates the two halves of each binding site, so that no complete binding site is available for target binding without prior closure of the loop. thus, the first binding event requires the energetically unfavorable closing of this loop, reducing its affinity relative to that of the second binding event, which, in contrast, occurs at a pre-formed site. using this approach we have rationally introduced cooperativity into three unrelated aptamers, achieving in the best of these a hill coefficient experimentally indistinguishable from the theoretically expected maximum. the extent of cooperativity, and thus the steepness of the binding transition, are, moreover, well modeled as simple functions of the energetic cost of binding-induced folding, speaking to the quantitative nature of this design strategy. essential and non-essential amino acid species for an ancestral protein satoshi akanuma the translation system is an essential element for life because it links genetic information embedded in genes to functional molecules, proteins. the modern genetic code, which encodes the standard amino acids (and three terminations) using triplet codons, is shared by most extant organisms on earth. a number of theories have been proposed for the origin and evolution of the genetic code, and these theories suggest that only a few amino acids were used in primitive proteins and that the amino acid repertoire later gradually increased up to through the course of evolution. if so, one would wonder how many and which types of amino acids were involved in the primitive proteins. i have begun to address this issue experimentally. i first resurrected several ancestral proteins and then restricted the amino acid usage of one of the resurrected proteins.
i targeted nucleoside diphosphate kinase (ndk), which catalyzes the transfer of a phosphate from a nucleoside triphosphate to a nucleoside diphosphate. ndk may have arisen early because at least one gene that encodes ndk is present in most extant organisms. the first step in the reconstruction of ancestral ndk sequences was to prepare multiple amino acid sequence alignments using homologous sequences of ndk from extant species. then, phylogenetic trees were built. ancestral sequences of ndk that represent the last common ancestors of archaea and of bacteria were reconstructed using the information contained in the predicted phylogenetic trees. the reconstructed ancestral kinases are extremely thermally stable [akanuma et al., ]. then, using the most thermally stable ancestral ndk, arc , as the starting molecule, i restricted its amino acid usage. arc does not contain any cysteine residue and therefore consists of amino acid species. i completely replaced one of the amino acid species by other amino acid species and thus created proteins each of which consisted of amino acid species. then, i evaluated the stabilities and activities of the resulting arc variants to assess the individual contributions of the amino acid species. as a result, i found that the amino acid species do not contribute equally to the stability and activity of arc and that some amino acid species can easily be dispensed with whereas others are important or essential for its stability and function. the result clearly shows that the full set of amino acid species is not necessarily essential and supports the hypothesis that proteins in the early stage of evolution were made from a reduced amino acid set. protein surface recognition in protein-protein interactions (ppi) is involved in signal transduction, immune reactions, and the creation of nanostructures in living cells.
methods for the rational design of ppi, which could provide non-antibody scaffolds and nanostructured materials, are required for therapeutic and nanotechnological applications. although there have been some successful rational designs with computational methods, it is still difficult to freely design ppi onto arbitrary proteins. one reason for this limitation is the decreased solubility of the designed protein caused by the additional hydrophobic residues introduced to drive ppi. another reason is the limited set of design modes by which proteins can interact, because the target proteins have individual surface structures. therefore, methods of constructing an interface for numerous target scaffold proteins without loss of their solubility are necessary. surface-exposed α-helices are often observed in natural globular proteins. moreover, there are many examples of naturally occurring oligomeric proteins in which an α-helix from each subunit interacts to form an intermolecular coiled coil. further, work on the design of artificial helical bundles reported by several other groups has provided information about how to generate and tune the interaction between α-helices. therefore, a surface-exposed α-helix would be a good target for designing a de novo interface onto the scaffold protein. here we engineered two different proteins, sulerythrin and cys-larfh, to form the cys-larfh-sulerythrin dimer-cys-larfh heterotetramer via an intermolecular helix-helix interaction. wild-type sulerythrin forms a dimeric eight-helix bundle. cys-larfh is a designed monomeric protein that forms a four-helix bundle containing interhelical s-s bonds. both sulerythrin and cys-larfh are extremely thermostable. to design protein-protein interfaces onto the individual proteins, we first introduced six leucines to the two α-helices of sulerythrin and three leucines to an α-helix of cys-larfh. as expected, the introduction of the hydrophobic amino acids reduced their solubilities.
to recover the solubility, we then introduced six aspartates or glutamates around the hydrophobic surface of the sulerythrin (hereafter referred to as l d or l e). similarly, three arginines were introduced around the artificial hydrophobic surface of the cys-larfh (hereafter referred to as iv- l r). the solubility of the mutants with the hydrophobic interface and additional charged residues was recovered. in addition, the sulerythrin mutants l d and l e exist mainly as dimers. the cys-larfh mutant iv- l r also exists as a monomer. we then examined the interaction between l e or l d and iv- l r. a pull-down experiment, in which co beads bound to either his-tagged cys-larfh or iv- l r were used to pull down wild-type sulerythrin, l d, or l e, demonstrates that l d or l e specifically interacts with iv- l r. furthermore, when analysed by size exclusion chromatography, the dominant peaks of the mixture of l d and iv- l r appeared at the volume expected for the heterotetrameric complex. thus we successfully created the de novo ppi by using a very simple concept involving hydrophobic interaction in combination with charge interactions. in vitro selection of liposome anchoring peptide by cdna display naoto nemoto , ryoya okawa , yuki yoshikawa , toshiki miyajima , shota kobayashi a liposome-anchoring peptide (la peptide) was selected against liposomes composed of dioleoyl-sn-glycero- -phosphocholine (dopc) by in vitro selection using the cdna display method. the selected la peptide characteristically consists of a hydrophobic n-terminal region and a basic c-terminal region. the la peptide was then synthesized chemically, and the interactions between la peptide and particular types of liposomes were investigated and confirmed by confocal laser scanning microscopy.
designing of a novel platinum-binding amino acid sequence on a protein surface asumi kaji , hiroya niiro , satoshi akanuma , tetsuya uchida , akihiko yamagishi designing a novel interaction between a metal and a protein is key to creating hybrid organic-inorganic materials. for example, in a glucose biosensor, which is widely used for measuring glucose concentration in blood, glucose oxidoreductase molecules are immobilized on a platinum electrode by polyacrylamide gel. a metal-binding tag added to the n- or c-terminus of a protein is also used to fix the protein to a metal. however, a technique to create a metal-binding site at a desired position in a protein has not been invented. if such a technique were established, it would contribute to developing and improving biosensors and to producing new bionanoelectronic materials. in this study, we created a platinum-binding site on a loop located at a protein surface. we used an artificial protein, larfh, that had been synthesized by connecting four identical alpha helices originating from the c-terminal segment of the escherichia coli lac repressor with three identical loops. we randomized the ser, gly, gln, gly, gly, ser sequence within one of the inter-helical loops and then selected for binding to platinum by a t phage display system. most of the selected larfh variants contained the tyr, lys, arg, gly, tyr, lys (ykrgyk) sequence in the randomized segment. we then evaluated the affinity of the larfh variant for platinum by means of quartz crystal microbalance analysis. we found that the variant binds to platinum more strongly than does the original larfh. at the annual symposium, we will also report on the affinity of the isolated ykrgyk sequence for platinum and on the crucial role of the first tyrosine in binding to platinum.
engineering of an isolated p a subunit of pi ka permits crystallization and provides a platform for structure-based drug design
pi ka remains an attractive target for development of anticancer targeted therapy. a number of p a crystal structures in complex with the nsh -ish fragment of the p regulatory subunit have been reported, including a few small molecule co-crystal structures, but the utilization of this crystal form is limited by low diffraction resolution and a crystal packing artifact that partially blocks the atp binding site. taking advantage of recent data on the functional characterization of the lipid binding properties of p a, we designed a set of novel constructs allowing production of an isolated, stable p a subunit missing the adapter binding domain (abd) and lacking or featuring a modified c-terminal lipid binding motif. while this protein is not catalytically competent to phosphorylate its substrate pip , it retains ligand binding properties as indicated by direct binding studies with a pan-pi ka inhibitor. additionally, we determined apo and pf- bound crystal structures of the p a ( - ) subunit at . Å and . Å respectively. comparison of isolated p a ( - ) with the p a/p complex reveals a high degree of structural similarity, which validates the suitability of this catalytically inactive p a for iterative sbdd. importantly, this crystal form of p a readily accommodates the binding of non-covalent inhibitors by means of a fully accessible atp site. the strategy presented here can also be applied to structural studies of other members of the pi kia family.
identification of structural determinants involved in the differential conformational changes of ef-hand modules
emma liliana arevalo salina , joel osuna quintero , humberto flores soto , gloria saab rincón
instituto de biotecnología, universidad nacional autónoma de méxico
calcium signals are regulated by several proteins, most of which belong to the ef-hand superfamily. the ef-hand motif is formed by a helix-loop-helix that binds calcium through its loop. these motifs occur in adjacent pairs, forming a single globular domain which is the basic structural and functional ca binding unit. the proteins in this family can be classified as calcium sensors or modulators, according to their function. the first group undergoes a major conformational change upon calcium binding, while the second one remains practically unchanged. to explain the biophysics behind the different behavior of these proteins upon ca binding, we have sought to identify structural determinants that could account for these features, especially for the difference in the conformational change. we examined the primary structure of two ef-hand motifs: a sensor ef-hand from chicken troponin c (sciii) and a modulator ef-hand from bovine calbindin d k (clbn). the main differences were in the ca-binding loop and a group of charged residues in the h helix of the modulator ef-hand. we then constructed chimeric clbn motifs containing the loop, or the loop and h , from the sciii motif (h clbnsciii and h h clbnsciii). these constructs were analyzed using a reporter system that discriminates ef-hand sensor motifs from signal modulators at the single-motif level. this reporter is based on the fusion of genes coding for the ef-hand and the prephenate dehydrogenase from e. coli (tyra), a protein which is active only as a dimer.
isolated ef-hand motifs have the ability to homo-dimerize and, in the fusion, can stabilize and activate tyra. the sensor motif undergoes a conformational change upon binding calcium and, in doing so, destabilizes the dimeric conformation of tyra and virtually eliminates its activity. in the modulators, on the other hand, the rather small conformational change only gives rise to a decreased tyra activity. both constructed chimeric ef-hand fusions showed a loss of activity upon ca binding, indicating that the connector residues of the sensor ef-hand from sciii are sufficient to confer the conformational change. in addition, we used cd and extrinsic fluorescence spectroscopies to analyze any conformational change in the isolated h h clbnsciii and h clbnsciii modules and found no difference between the ca-free and ca-bound chimeras, suggesting that the change in activity of the reporter protein is due to a change in the orientation of the helices in the ef-hands induced by calcium. the effect of ca binding on the chimeras in the context of the entire calbindin d k protein is under investigation.

mapping side chain interactions at the n- and c-termini of protein helices
nicholas e newell, independent researcher
interactions involving one or more amino acid side chains near the ends of protein helices stabilize helix termini and shape the geometry of the adjacent loops, contributing to supersecondary structure. side chain structures that have been identified at the helical n-terminus include the asx/st n-caps, the capping box, and hydrophobic and electrostatic interactions. at the c-terminus, capping is often achieved with main-chain polar groups (e.g. the schellman loop), but here also particular side chain motifs clearly favor specific loop geometries.
key questions that remain concerning side chain interactions at helix termini include: ) to what extent are helix-terminal motifs that include multiple amino acids likely to represent genuine cooperative interactions between side chains, rather than chance alignments? ) which particular helix-terminal loop geometries are favored by each side chain interaction? ) can an exhaustive statistical scan of a large, recent dataset identify new side chain interactions at helix termini? in this work, three analytical tools are applied to answer the above questions for both n-and c-termini. first, a new perturbative least-squares d clustering algorithm is applied to partition the helix terminal structures in a large ( , example), low-redundancy pdb dataset by loop backbone geometry. the clustering algorithm also generates a set of structural exemplars, one for each cluster, that is used to represent the most important loop geometries at each terminus. next, cascade detection (newell, bioinformatics, ), an algorithm that detects multi-amino acid cooperativities by identifying overrepresented sequence motifs, is applied to each cluster separately to determine which motifs are most important in each loop geometry. finally, the results for each motif are displayed in a capmap, a d conformational heatmap that depicts the distribution of motif abundance and overrepresentation across all loop geometries by projecting these quantities onto the structural exemplars generated by clustering. the capmap reveals the loop conformations most favored by a motif. actual structures from the clusters corresponding to these favored conformations are then examined in a structure browser to characterize the side chain interaction associated with the motif. this work identifies a 'toolkit' of side chain motifs which are good candidates for use in the design of synthetic helix-terminal loops with specific desired geometries, because they are used in nature to support these geometries. 
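as a rough illustration of what "overrepresented" means for a two-residue motif, the sketch below compares the observed count of a residue pair with the count expected if the two positions were statistically independent. this is not the cascade detection algorithm itself; the function name, the toy sequences and the independence baseline are assumptions made purely for illustration.

```python
from collections import Counter


def pair_overrepresentation(sequences, pos_a, pos_b):
    """Return {(res_a, res_b): observed / expected} for two sequence positions.

    Hypothetical helper: 'expected' assumes the residues at pos_a and pos_b
    occur independently, so a ratio well above 1 hints at cooperativity.
    """
    n = len(sequences)
    count_a = Counter(seq[pos_a] for seq in sequences)
    count_b = Counter(seq[pos_b] for seq in sequences)
    count_ab = Counter((seq[pos_a], seq[pos_b]) for seq in sequences)

    ratios = {}
    for (res_a, res_b), observed in count_ab.items():
        expected = count_a[res_a] * count_b[res_b] / n
        ratios[(res_a, res_b)] = observed / expected
    return ratios


# Toy helix-terminal segments belonging to one (made-up) geometry cluster.
cluster = ["SAAE", "SAAE", "TAAK", "TAAK"]
ratios = pair_overrepresentation(cluster, 0, 3)
```

in practice such a ratio would be computed separately for each geometry cluster and accompanied by a significance test, since small clusters give noisy ratios.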
highlights of the analysis include determinations of the favored loop geometries for the asx/st motifs, capping boxes, big boxes, and other previously known and unknown hydrophobic, electrostatic, h-bond, and pi-stacking interactions. a goal of future work is to make these results available in a structurally-addressable database that would enable researchers to immediately retrieve the side chain interactions most compatible with a desired loop geometry.

generation of fluorescent protein-tagged gp mutants to analyze the intracellular distribution of hiv- envelope protein
shuhei nakane , zene matsuda
green earth research center, green earth institute co., ltd., res ctr for asian infect dis, inst of med sci, the univ of tokyo, lab of struct virol and immunol, institute of biophysics, cas
hiv- is the enveloped virus that causes aids. its envelope protein (env) has two non-covalently associated subunits, gp and gp , which are proteolytically processed from a gp precursor. the gp subunit is a surface protein and gp is a transmembrane protein. the gp and gp subunits are responsible for receptor recognition and membrane fusion, respectively. the cytoplasmic tail (ct) of gp is about amino acids long and is believed to play a critical role in intracellular trafficking of env. to visualize dynamic trafficking, the c-terminus of gp has been tagged with fluorescent proteins such as gfp. however, tagging of the ct may affect the interactions between the ct and cellular proteins that are involved in intracellular trafficking. to avoid this problem, we tried to insert gfpopt, a gfp variant, into five variable regions of gp . we have analyzed the phenotypes of env mutants, such as the cell surface expression, processing of gp , membrane fusion activity, and virion incorporation. among the variable regions of gp , the v region was most sensitive to insertion. the v /v region was less sensitive than v .
consistent with the recently revealed structure, exteriorly located v and v were highly tolerant to insertion. we used the mutant with the gfp insertion in the v region to analyze the intracellular distribution of env with and without ct. we found that deletion of ct increased the presence of vesicles colocalized with late endosome markers. this is consistent with the hypothesis that the ct region contains a motif regulating intracellular trafficking. our results showed that env with gfpopt insertion in its gp subunit is a useful tool for the study of intracellular dynamics of hiv- env. these mutants would also be useful to trace the fate of virus particles during infection. pi- ngs-guided phage panning: comparison to conventional panning strategy buyung santoso , dorain thompson , john nuss , john dwyer phage display is a powerful tool for generating binders to a target protein. multiple rounds of panning with conventional phage display strategies typically result in a number of hits, which are then individually screened using in vitro assays. clones screened at this stage are a combination of specific binders, sequences that are selected due to amplification bias, and non-specific binders. if the number of specific clones is low relative to the non-specific sequences, a larger number of clones have to be screened to ensure sufficient diversity of early leads. with the advent of next generation sequencing (ngs) technology, we aim to test whether we can increase the diversity of specific hits and decrease the number of non-specific sequences. in our experiment, four rounds of conventional panning produced ten peptide binders to target protein. ngs analysis after two rounds of panning was done in parallel, yielding more than ten thousand sequences, ranked by abundance. all ten binders from conventional panning were found in the top most abundant ngs hits. 
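the ranking and comparison steps described above can be sketched as follows. the function names and the toy reads are hypothetical, and this is only a minimal illustration of counting sequences, ranking them by abundance and checking which known binders fall within the top n; it is not the authors' analysis pipeline.

```python
from collections import Counter


def rank_by_abundance(reads):
    """Return unique sequences ordered from most to least abundant."""
    counts = Counter(reads)
    return [seq for seq, _ in counts.most_common()]


def binders_in_top(ranked, known_binders, top_n):
    """Return the known binders that appear among the top_n ranked sequences."""
    top = set(ranked[:top_n])
    return [binder for binder in known_binders if binder in top]


# Toy data standing in for peptide-encoding reads after two panning rounds.
reads = ["AAA"] * 5 + ["CCC"] * 3 + ["GGG"]
ranked = rank_by_abundance(reads)

# Hypothetical binders recovered by conventional panning.
found = binders_in_top(ranked, ["CCC", "GGG"], top_n=2)
```

sequences that rank highly but are absent from the conventional hit list are the candidates for additional binders, subject to experimental confirmation.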
more importantly, additional hits were found in ngs analysis but not in conventional panning, highlighting this strategy as a promising alternative for hit discovery with the significant upside of more diverse and higher affinity leads.

numerous processes in pharmaceutical development, including construct screening, structural genomics, protein engineering and expression optimization among others, require the use of higher throughput plasmid dna purification. the majority of issues encountered in mini, midi, and maxiprep purification kits involve flocculate removal following alkaline lysis, and there is currently no easy way to produce large amounts of plasmid dna without the addition of complicated and time-consuming clarification steps. a hassle-free automated system that is not restricted by sample size would significantly help in cutting time and costs during the initial processing steps of plasmid purification. the autoplasmid mea instrument provides a fully automated solution to traditional problems faced in plasmid purifications, allowing mini, midi, and maxiprep plasmid purifications to be performed on a single instrument. the data presented here on plasmid yield, purity, and suitability for sequencing and transfection/transformation illustrate a new strategy for automated plasmid preps. by eliminating traditional clarification methods, cell culture volumes between - ml can be processed, leading to yields ranging from - µg. this flexible system was developed in order to satisfy a wide variety of concentration and yield requirements, while eliminating the time-consuming steps previously needed to obtain similar results. the ability to perform fully automated mini, midi, and maxi plasmid preps on one instrument allows for a customized all-in-one purification system that is not restricted by traditional clarification methods, eliminating manual intervention and streamlining the purification process.
the modular nature of protein architectures suggests that proteins have evolved through duplication and fusion to give rise to modular, often symmetric forms, which later diversified under the influence of evolutionary pressure. we have developed a computational protein design method termed reverse engineer evolution (re volution) to create symmetrically self-assembling protein building blocks. we have used this method to design a perfectly symmetric β-propeller protein called pizza. subsequently, we have engineered a metal binding site into this pizza protein. this new pizza variant carries two nearly identical domains per polypeptide chain, and forms a trimer with three-fold symmetry. the designed single metal ion binding site lies on the symmetry axis, bonding the trimer together. two copies of the trimer associate in the presence of cadmium chloride in solution, and high resolution x-ray crystallographic analysis reveals a nano-crystal of cadmium chloride, sandwiched between two trimers of the protein. this nano-crystal, containing seven cadmium ions lying in a plane and twelve interspersed chloride ions, is the smallest reported to date. our results indicate the feasibility of using rationally-designed symmetrical proteins to biomineralize nano-crystals with applications in bionanotechnology.

bacillus licheniformis trehalose- -phosphate hydrolase structures suggest keys to substrate specificity
chwan-deng hsiao , min-guan lin , long-liu lin , yuh-ju sun
institute of molecular biology, academia sinica, department of applied chemistry, national chiayi university, department of life science, national tsing hua university
trehalose- -phosphate hydrolase (trea) of the glycoside hydrolase family (gh ) catalyzes the hydrolysis of trehalose- -phosphate (t p) to yield glucose and glucose- -phosphate. products of this reaction can be further metabolized by the energy-generating glycolytic pathway.
here we present the crystal structures of bacillus licheniformis trea (bltrea) and its r q mutant complexed with p-nitrophenyl-a-d-glucopyranoside (r q/ ppng) at . Å and . Å resolution, respectively. the overall structure of bltrea is similar to other gh family enzymes. however, detailed structural comparisons revealed that the catalytic groove of bltrea contains a long loop adopting a different conformation from those of gh family members. unlike the homologous regions of bacillus cereus oligo- , -glucosidase (bcogl) and erwinia rhapontici isomaltulose synthase (nx- ), the active site surface potential of bltrea exhibits a largely positive charge, contributed by the four basic residues his , his , lys and lys . mutations at these residues resulted in significant decreases of bltrea enzymatic activity. strikingly, a hhlk motif and the lys residue played critical roles in bltrea substrate discrimination. crystal structure of engineered lrrtm synaptic adhesion molecule and a model for neurexin binding anja paatero , katja rosti , alexander shkumatov , cecilia brunello , kai kysenius , prosanta singha , henri huttunen , tommi kajander institute of biotechnology, university of helsinki, helsinki, finland, dept of pharmaceutical and pharmacological sciences, ku leuven, leuven, belgium, neuroscience center, university of helsinki synaptic adhesion molecules are key components in the development of the brain, and in the formation of neuronal circuits, as they are central in the assembly and maturation of the chemical synapses. several families of neuronal adhesion molecules have been identified such as ncams, neurexins and neuroligins, and in particular recently several leucine rich repeat protein families, e.g. netrin g-ligands, slitrks and lrrtms. the lrrtms form a family of four proteins. they have been implicated in excitatory glutamatergic synapse function, and were specifically characterized as ligands for neurexins in excitatory synapse formation and maintenance. 
in addition, lrrtm and lrrtm have been found to be ligands for heparan sulphate proteoglycans. we report here the crystal structure of a stability-engineered mouse lrrtm , with a tm c higher than the wild type protein, while retaining its function. we localized the neurexin binding site to the concave surface based on protein engineering, sequence conservation and prior information on the ligand interaction with neurexins, allowing us to propose a tentative model for the lrrtm:neurexin interaction complex. cell culture studies and binding experiments show that the engineered protein is functional and capable of forming synapse-like contacts. small angle x-ray scattering data suggest that the wild type protein forms transient dimers, which may have importance for the function. the structural and functional data presented here provide the first structure of an lrrtm protein, and a model for the molecular mechanism of lrrtm function in adhesion.

computational design of phenylalanine binder
olga khersonsky , gil benezer , sarel fleishman
recently, the abdesign algorithm was developed in our lab for de novo design of antibodies ( ). it is guided by natural conformations and sequences, and exploits the modular nature of antibodies to generate an immense space of conformations, which can be used as scaffolds for design of stable high-affinity binders. we have used abdesign to design a binder of phenylalanine. , antibody scaffolds were obtained by splicing h and l fragments into a template (pdb id brr), and subsequent optimization of vh and vl orientation. a phenylalanine binding site, based on native phenylalanine binders, was introduced into the scaffolds with rosettamatch ( ), and the sequences were subsequently optimized by the rosetta enzyme design protocol ( ). designs were experimentally tested by yeast display for binding of biotinylated phenylalanine ligand.
several designs were found to bind the ligand, and we plan to further characterize this affinity and improve it using directed evolution techniques. in collaboration with the group of prof. johnsson, the resulting phenylalanine binder will be incorporated in a bio-luminescent (lucid) sensor for phenylalanine ( ). a phenylalanine monitoring device would be of primary importance for patients with phenylketonuria, a genetic disorder of phenylalanine metabolism.

cold-adapted enzymes are interesting because of their higher catalytic activity compared to mesophilic and thermophilic homologues. alkaline phosphatase (ap) from a psychrophilic marine vibrio bacterium (vap) has an unusually large surface loop that extends from each of its monomers to stabilize a homodimeric structure ( ). in many cold-adapted enzymes, the loop regions are longer compared to proteins of mesophilic organisms, and our aim was to study the functional and structural role of this loop. three substitutions (r l, y f and f y) were introduced within the large surface loop as directed by microsecond molecular dynamics (md) simulations. with the r l mutation, two hydrogen bonds that connect the loop to residues on the adjacent subunit were broken, and two further hydrogen bonds with the adjacent q were broken. as a consequence, r l displayed a % higher kcat compared with wild-type and a slight decrease in the km value. overall, the catalytic efficiency improved by %. the global heat stability (tm) and the active site sensitivity to heat (t %) were reduced by c and c, respectively. md simulations showed that hydrogen bonds to arg are important for long-range communication to the active site. certain rotamers of two important residues in the catalytic site, ser and arg , were favored, presumably toward states more competent for catalysis upon the replacement of arg with leu.
in the y f variant, removal of one hydrogen bond between the loop and the other subunit caused a small drop in stability parameters, whereas both kcat and km were reduced by about half, giving similar kinetic efficiency (kcat/km) to that of wild-type. finally, we changed a residue at the root of the large loop (f y) such that one new intersubunit hydrogen bond could form. this variant maintained the wild-type characteristics. in conclusion, removing hydrogen bonds connecting the major loop of one subunit to the protein surface of the other subunit in vap produced higher catalytic activity, which shows functional connections between loop mobility and the active site. our study also demonstrates that interactions between residues in the large disordered loop and the opposite subunit in the dimeric vap are determinants of its stability. thus, we showed that loosening of interface contacts between the two vap subunits by replacement of crucial residues provides a way to orchestrate structural and kinetic dynamics in a productive way.

the de novo design of artificial proteins arises as a stringent test of our understanding of the relationship between sequence, structure, and function. examples include the design of a four α-helix bundle, a new protein topology called top , and a series of artificial (βα) -barrels called octarellins. however, de novo design has proven difficult for larger proteins with more than amino acids. here we present two methods to generate the backbone and to perform the de novo design of (βα) -barrel proteins through the use of the software rosetta; each has different advantages and limitations. the first method for generating the backbone is knowledge-based, starting with an analysis of a non-redundant database of natural (βα) -barrel proteins in order to obtain statistics on preferred secondary structure element lengths and amino acid propensities.
with this information we use the rosetta cm software to create more than models, which are then ranked in terms of rosetta energy. the second method is performed with the parametricdesign package of rosetta, in which only geometrical information is required (number of strands and helices, radius of the β- and α-barrels, degree of inclination, orientation of the side chains, among others). both methods contain a step of loop refinement and multiple steps of sequence design with the package rosetta design, in order to find low scoring amino acid sequences for each of the starting backbone conformations. thousands of models will be generated by both methods and then analyzed in terms of sequence similarity, secondary and tertiary structure prediction, and stability by molecular dynamics simulations. the best candidate sequences will be selected for experimental verification. in order to identify a putatively successful design, we added a metal binding site during the design step. all the proteins will be expressed in e. coli. the solubility of the designed proteins inside bacteria will be determined by means of fusion to green fluorescent protein (gfp). solubility, stability, secondary structure, and cooperativity of folding will be assessed for each protein before determination of their three-dimensional structure.

construction of protein capsule possessing drugs controlled release ability
shota shimizu , masatoshi nakatsuji , keisuke yamaguchi , yuya sano , yuya miyamoto , takashi inui
most compounds that exhibit anti-tumor activities are water-insoluble, thus limiting their clinical use. chemical modification of these compounds and the use of solubilizing agents such as organic solvents, surfactants and ph modifiers improve their solubility. however, chemical modification of compounds decreases their potency, and the use of solubilizing agents causes toxicity in many cases.
thus, drug delivery systems (dds) for poorly water-soluble anti-tumor drugs which exploit liposomes, cyclodextrins, and lipid nanoparticles have been studied intensely. in these dds, the controlled release of drugs from the delivery vehicle is one of the most important functions. selective release in target cells leads to adequate therapeutic efficacy with few side effects. in our laboratory, we have already demonstrated that lipocalin-type prostaglandin d synthase (l-pgds), an intravital transporter protein, is a novel and valid drug delivery vehicle for sn- , a poorly water-soluble anti-tumor drug. in this study, we generated l-pgds-based protein capsules with a controlled-release function by introducing a disulfide bond into the upper part of the drug-binding cavity of l-pgds. the intracellular concentration of glutathione ( . mm) is known to be substantially higher than the extracellular concentration ( mm). therefore, it is expected that in the extracellular oxidative environment the disulfide bonds in the protein capsule remain stable, avoiding premature release of the internal drugs during blood circulation. after reaching the target cells, the disulfide bonds are cleaved in the intracellular redox environment, and the internal drugs are then released. we generated three kinds of protein capsules which have disulfide bonds in different positions, w c/w c, k c/h c, k c/w c, based on tertiary structure information of human l-pgds (pdb id: o y). first, we performed circular dichroism (cd) measurements to confirm the structure of each capsule. the cd spectra of the three protein capsules were similar to that of wild-type l-pgds in the far-uv region. therefore, the secondary structures of the three protein capsules were not changed from that of wild-type l-pgds by introducing the mutations.
quantitative analysis of the free thiol group in the protein capsule by dtnb assay revealed that the intermolecular disulfide bond was formed by h o -induced oxidation and cleaved by dithiothreitol-induced reduction. in addition, to investigate the solubility of sn- in the presence of protein capsules, we mixed the reduced form of the protein capsule with an sn- suspension and stirred at c for hours. the resulting concentrations of sn- in pbs with mm w c/w c, k c/h c, and k c/w c were mm, mm, and mm, respectively. these values were approximately -fold higher than without protein capsules. sds-page analysis showed that the bond formation decreased in a time-dependent manner, and that no new intermolecular disulfide bond was formed in the protein capsules after hours' incubation. taken together, we succeeded in generating drug delivery vehicles possessing openable and closable lids that are responsive to the redox environment.

takaaki miyamoto , mai kuribayashi , satoshi nagao , yasuhito shomura , yoshiki higuchi , shun hirota
graduate school of materials science, nara institute of science and technology, graduate school of science and engineering, ibaraki university, department of life science, graduate school of life science, university of hyogo
domain swapping has been of interest as a mechanism of protein oligomerization, where a secondary structural region or a domain of one protein molecule is replaced with the corresponding region or domain of another protein molecule. we have previously shown that c-type cytochromes and myoglobin form oligomers by domain swapping. in this study, we show that a four-helix bundle protein cyt cb , in which the heme of cyt b is attached to the protein moiety by insertion of two cys residues, forms a domain-swapped dimer. dimeric cyt cb was more stable than dimeric cyt b at c, showing that attachment of the heme to the protein moiety stabilizes the domain-swapped structure.
absorption and cd spectra of dimeric cyt cb were similar to the corresponding spectra of the monomer, showing that the active site and secondary structures were similar between the dimer and monomer. the redox potential of dimeric cyt cb was also similar to that of its monomer. the dissociation temperature of dimeric cyt cb was c, and its ΔH on dissociation to monomers was . kcal/mol (per dimer). according to x-ray crystallographic analysis, dimeric cyt cb exhibited a domain-swapped structure, where the two helices in the n-terminal region (helices and ) of one protomer and the other two helices in the c-terminal region (helices and ) of the other protomer interacted with each other. the heme coordination structure of the dimer was similar to that of the monomer. we have previously shown that domain-swapped oligomers of horse cyt c form through intermolecular hydrophobic interaction between the n- and c-terminal α-helices at the early stage of folding. it has been suggested that helices and form first at the initial stage of folding in wild-type apo cyt b . therefore, we propose that cyt cb forms a domain-swapped dimer when helices and interact intermolecularly at the initial stage of folding, whereas the intramolecular interaction of helices and results in formation of a monomer.

a highly buried and conserved tryptophan residue close to the dimer interface in a cold-adapted phosphatase is phosphorescent and important for activity.
jens g. hjörleifsson and bjarni asgeirsson.
department of biochemistry, science institute, university of iceland, dunhagi , reykjavik, iceland.
alkaline phosphatase (ap) from vibrio g - is a cold-adapted dimeric enzyme with one of the highest catalytic efficiencies reported for known aps. it contains five intrinsic tryptophan (trp) residues and one additional trp located on the c-terminal streptag used for expression and purification.
in this study, we made several single trp-substitutions to determine the role of each trp in the fluorescence emission spectrum. we also determined their solvent exposure by acrylamide fluorescence quenching. the results indicate that trp , trp and trp are mostly responsible for the fluorescence emission. quenching experiments with acrylamide indicated that all the trp residues were about equally accessible for quenching, except trp , which was shown to be highly buried in the core of the protein. interestingly, the enzyme was found to be highly phosphorescent at c, having two phosphorescence lifetimes. the longer lifetime is due to trp . trp is located close to the dimer interface and points towards a helix in the active site where his binds an active-site zinc ion. in other aps, an aromatic amino acid is conserved in the location occupied by the trp residue. in most cold-adapted aps it is indeed a trp. interestingly, the mutation of the trp to a phenylalanine affected both stability and activity of the enzyme. kcat/km was -fold lower than for wild-type. overall, this study reveals that trp can be used as a phosphorescent probe of local dynamics and could possibly also serve to study the dimer-monomer equilibrium due to its proximity to the dimer interface, an area clearly crucial for enzyme activity and stability.

modulating protein-protein interaction with a molecular tether
helen farrants , oliver hantschel , kai johnsson
ecole polytechnique fédérale de lausanne (epfl)
high-affinity scaffolds for protein-protein interactions, such as monobodies and darpins, can be engineered in vitro to bind to protein targets. we speculate that the affinity for the target protein can be modulated by incorporating these evolved scaffolds and a synthetic intramolecular tether into protein switches, in a protein construct composed of snap-tag, a monobody and a circularly permuted dihydrofolate reductase.
the tether, attached to the construct via snap-tag, was composed of a linker and trimethoprim, which interacts reversibly with the circular permutated dihydrofolate reductase. we have investigated the affinity between the n-sh domain of the phosphatase shp and an evolved monobody in such a protein construct using a fret assay. when the intramolecular tether was bound to the circular permutated dihydrofolate reductase ("closed" conformation), there was an increase in the affinity of the construct for the target n-sh . in the presence of a small molecule competitor ("open" conformation) the affinity of the monobody construct for its target reverted to the value reported in the literature. the intramolecular tether in these protein constructs combined with engineered scaffolds for protein-protein interactions may be a general approach towards protein switches.
because most proteins are long polymers of amino acids with twenty or more chemically-distinct sidechains, there are an enormous number of potential protein sequences. here, we report the construction of biologically active proteins with minimal chemical diversity. transmembrane domains of proteins can specifically interact with other transmembrane domains to modulate the folding, oligomerization, and function of transmembrane proteins. for example, the bovine papillomavirus e protein is a -amino acid transmembrane protein that transforms fibroblasts to tumorigenicity by binding directly to the transmembrane domain of the platelet-derived growth factor b receptor (pdgfbr), resulting in ligand-independent receptor activation and cell transformation. these studies showed that a free-standing transmembrane domain could fold properly in cells and act in trans to modulate the activity of a larger transmembrane protein target.
because of the relative chemical simplicity of transmembrane domains and this ability to act even when not linked to more complex soluble protein domains, we reasoned that short transmembrane proteins could be used to define the minimal chemical diversity sufficient to construct biologically active proteins. to accomplish this, we infected cultured mouse cells with a retroviral library expressing -amino acid proteins consisting of an initiating methionine followed by a randomized sequence of leucines and isoleucines, two hydrophobic amino acids that differ only by the position of a single methyl group, and selected rare proteins with transforming activity. we isolated numerous proteins consisting of diverse sequences of leucine and isoleucine that cause morphologic transformation, escape from contact inhibition and focus formation, and growth factor independence. genetic and biochemical analysis of these proteins indicate that like e they interact with the transmembrane domain of the pdgfbr to specifically activate the receptor and transform cells. mutational analysis of individual proteins identified specific leucines and isoleucines required for transforming activity, and insertion of a single isoleucine at a particular position in a stretch of leucines is sufficient for activity. these proteins identify the minimal chemical diversity required to generate a biologically active protein and have important implications for biochemistry, protein evolution, protein engineering and synthetic biology. yusuke azuma , donald hilvert virus-like particles that are precisely loaded with functional cargo are an important tool to study the effect of spatial confinement and create novel entities with application in biotechnology and medicine. 
by genetic fusion to a positively supercharged green fluorescent protein (gfp( )), an enzyme retroaldolase (ra) was efficiently targeted to the negatively charged lumen of an engineered protein cage, aquifex aeolicus lumazine synthase variant (aals- ). the encapsulation is quantitative under mild aqueous conditions up to a mixing ratio of guest enzymes per host cages. the chromophoric tag is used for precisely quantifying the enzyme concentration, which allows detailed characterization of the effect of encapsulation on the enzyme activity. the generality of the encapsulation system was examined with structurally different enzymes.
introduction and purpose: in the immune system, high affinity antibodies are generated by selection of b cells activated by antigen stimulation followed by additional optimization through somatic hypermutation of antibody genes. in artificial antibody libraries, such as phage libraries, selection of specific antibody clones from the library is performed by an in vitro selection process called biopanning and the subsequent binding screening. however, in spite of the high efficiency of enrichment in biopanning, there is a possibility that minor antigen-specific clones are overlooked in the screening because of the limited number of clones that can be screened. in recent years, high-throughput analysis of dna sequences by the next-generation sequencer (ngs) has become available not only for genomic analysis of organisms but also for repertoire analysis of antibodies. in this presentation, we report the successful isolation of a variety of antigen-specific antibodies from patient-derived antibody phage libraries by a method combining high-throughput sequence analysis on ngs with biopanning. method: we constructed two kinds of human single chain fv (scfv) antibody libraries from pooled mrna of five cancer patients and of a wheat allergy patient, respectively.
after biopanning against a cancer antigen or the wheat allergy antigen "gluten", the phagemid vector dna prepared from the pooled phages before or after biopanning was used for pcr amplification of vh genes, adding the index and adapter sequences for ngs analysis. the high-throughput sequencing was performed on miseq (illumina) using miseq reagent kits v . after discarding the short sequences and low quality data, '-and '-reading sequences were unified by a merge program. the frequencies (%) of all vh sequences were evaluated using a program based on usearch . clustering software, and the change in the frequency (%) of each sequence between before and after panning was assigned as its amplification rate. results and discussion: vh sequences at each round of pooled phages after biopanning against cancer antigen were analyzed on ngs. after three rounds of biopanning, three clusters of antibody sequences were specifically enriched, suggesting that these are specific binders. to check this, scfv genes were regenerated by pcr using h-cdr specific primers and the reconstructed scfv-displaying phages were subjected to binding analysis. all three phages showed clear specific binding to cancer antigen in elisa. subsequently, to test the usefulness of this method, we applied it to identify allergen-specific scfv from an allergy patient-derived antibody phage library. phylogenetic tree analysis of the vh sequences that showed an amplification rate higher than . after a single round of biopanning revealed a total of eleven clusters of vh sequences. the vh sequences in the two clusters with the highest amplification factor were selected and the regenerated scfv-displaying phages were tested in binding analysis. the prepared scfv-displaying phages and also scfv proteins showed clear binding to the allergen.
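The amplification-rate calculation described in this abstract can be illustrated with a minimal sketch. It assumes that the amplification rate is the ratio of a sequence's percentage frequency after panning to its frequency before panning; the abstract only says that the change in frequency was used, so the ratio is an assumption. The function names are hypothetical and this is not the authors' usearch-based program.

```python
from collections import Counter


def frequencies(reads):
    """Return the percentage frequency of each distinct sequence in a list of reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: 100.0 * n / total for seq, n in counts.items()}


def amplification_rates(before_reads, after_reads):
    """Ratio of frequency after panning to frequency before panning (assumed definition).

    Sequences absent from the pre-panning pool are skipped because the
    ratio is undefined for them.
    """
    before = frequencies(before_reads)
    after = frequencies(after_reads)
    return {
        seq: f_after / before[seq]
        for seq, f_after in after.items()
        if seq in before
    }


# Toy pools: clone "A" rises from 10% to 50% of reads, clone "B" falls from 90% to 50%.
rates = amplification_rates(["A"] + ["B"] * 9, ["A"] * 5 + ["B"] * 5)
```

In this toy example the enriched clone "A" gets a rate above 1 and the depleted clone "B" a rate below 1, which is the kind of contrast a cutoff on the amplification rate would exploit.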
thus, it is suggested that the analytical method of vh sequences on ngs before and after biopanning is very useful to isolate a variety of disease-related antigen-specific novel antibodies quickly with a high degree of certainty.
biochemical analysis of the recognition helix of z-dna binding proteins: roles in conformational specificity. yang-gyun kim, xu zheng, so-young park.
conversion of right-handed b-dna into left-handed z-dna is one of the dramatic structural transitions in biological processes including gene regulation and chromatin remodeling. the z-dna binding motif, zalpha (zα), was first discovered in human adar . subsequently, on the basis of sequence and structure similarity to hzαadar , families of proteins including viral e l, interferon-induced protein dai (zbp ) and pkz have been identified as having zα domain(s). interestingly, the zα domain of the e l protein from vaccinia virus (vvzαe l) was confirmed to bind z-dna, but it does not have b-to-z conversion activity. here, we showed that replacing the α -helix of vvzαe l (vvzαe l-α ) with that of hzαadar confers the ability to convert b-dna to z-dna. the detailed biochemical analysis of the α -helix mutants of vvzαe l further suggested that the contribution of positively charged residues in the c-terminal part of the α -helix is crucial during the b-to-z transition. in addition, hydrophobic residues of the n-terminal part of vvzαe l-α also influence the b-to-z conversion activity, possibly through forming a tightly-packed structure. in conclusion, our results revealed the previously unknown contribution of amino acid residues in the α -helix of the zα domains to the b-to-z conversion. moreover, they strongly imply that such residues may play important roles in initiating conformational changes of dna structure during the b-to-z conversion event.
the ability to switch the activity of proteins at will is of great interest from an application point of view. one promising approach utilizes a protein modification with an organic photochromic molecule. by linking two protein side chains with a photochrome that undergoes a light-induced conformational change, protein secondary and tertiary structure can be stabilized or destabilized and thus the structure-dependent activity can be switched "on" and "off" by light irradiation. for this the photochrome must fulfil several requirements. foremost, it must possess two states of comparable stability that differ significantly in their geometry. it must further be water soluble and non-toxic, and should not experience fatigue phenomena upon multiple irradiations. there are two classes of molecules that fulfil those requirements: azobenzenes and spiropyrans. we are pursuing two different strategies for the design of photoswitchable proteins. in the first approach we attach an azobenzene compound to side chains of the alpha-helical antifreeze protein type i. the end-to-end distance of the photochromic molecule is sterically compatible with the folded helix only in one form; photoisomerization therefore switches the folding state between an active helical state and an inactive unfolded form. in a second, more general approach we use the trp-cage domain as a switching unit. the trp-cage is the smallest known folded protein ( amino acids). its folding is induced by hydrophobic interactions of a tryptophan side chain in a short helical segment. after modification with a photochromic molecule in appropriate positions, its structure is rendered sensitive to the state of the chromophore. by creating protein chimeras of such a trp-cage and biologically active peptides with helical propensity, we aim at conferring the light-dependent fold of the cage to the attached peptide moiety.
salt-bridges are electrostatic interactions between groups of opposite charges.
the net interaction energy (ΔΔGnet) of a salt-bridge is partitioned into bridge (ΔΔGbrd), desolvation (ΔΔGdsolv) and protein (ΔΔGprot) energy-terms, of which ΔΔGdsolv and ΔΔGprot can only be estimated by computational means. thus, the general purpose poisson-boltzmann equation solvers "delphi" (in the commercial package insight-ii) and "apbs" (open-source) are popularly used to determine these energy-terms. nevertheless, the computation method is more involved than other uses of these solvers. moreover, protein-specific salt-bridges, grid-points, center, hydrophobic-isosteres-mediated mutation-files of the original charge-radius file and others have to be worked out prior to the computation. this might explain why only a limited number of structure files ( % of crystal-structure-database) have been worked out to date. at this juncture, an efficient, fully automated all-in-one procedure that could analyze a large dataset in a single run would be useful. to the best of our knowledge, such a procedure is lacking in the public domain. to this end, our fully automated all-in-one procedure, adsetmeas (available freely at http://sourceforge.net/projects/adsetmeas/ along with detailed documentation), uses the "apbs" method to compute component as well as net energy-terms of salt-bridges and redirects compact output in excel format. further, micro-environments of salt-bridges are also reported based on the presence of polar, dipolar, acidic, basic and hydrophobic side-chains in their proximity. the procedure provides versatility to users in choosing (a) the model for computation of energy-terms among those available to date in the literature and (b) the method (default or advanced) for parametric optimization in "apbs" calculations. it works in unix-like environments including cygwin. it processes all proteins present in the working directory with any number of salt-bridges in them.
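As a brief worked form of the partition stated at the start of this abstract, the net term can be written as a sum of the three components. Additivity is the usual convention in continuum-electrostatics treatments of salt-bridges and is assumed here; the abstract itself only says that the net energy "is partitioned" into these terms (written ddgnet, ddgbrd, ddgdsolv and ddgprot in the extracted text).

```latex
\Delta\Delta G_{\mathrm{net}}
  = \Delta\Delta G_{\mathrm{brd}}
  + \Delta\Delta G_{\mathrm{dsolv}}
  + \Delta\Delta G_{\mathrm{prot}}
```

Under this convention a salt-bridge is net stabilizing when the sum is negative, which can happen even when the desolvation term alone is unfavourable.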
a pre-released version of the procedure was successfully applied to compute energy-terms for salt-bridges from halophilic proteins. overall, our adsetmeas provides intricate details on salt-bridge energetics from crystal structures and finds application in the field of computational structural biology. these and other results will be discussed at the conference.
next generation analgesics: targeting ion channels with antibody-drug conjugates (adcs). anna wojciechowska-bason, clare jones, chris lloyd. postdoctoral fellow, adpe, medimmune, cambridge, ria, medimmune, cambridge, adpe.
ion channels are common targets for chronic pain therapies. small molecule analgesics are widely used therapeutically, but due to poor specificity they often cause a wide range of side effects. as a result, efficacy of existing treatments is very limited. we believe that to achieve the required specificity and efficacy, a novel and innovative approach is required that would combine the potency of the small molecule with the selectivity of an antibody. therefore, we propose to apply antibody-drug conjugates (adcs) to deliver small molecules or peptides to ion channels in order to specifically modulate pain signalling pathways. voltage-gated sodium channel nav . has a well characterised role in the perception of pain. here we present the activity of the peptide huwentoxin-iv (hwtx-iv) and small molecule inhibitors ptc-a, ptc-b and ptc-c on voltage-gated sodium channels nav . and nav . . in novel findings, we report that these inhibitors show little selectivity between the voltage-gated sodium channel family members, nav . and nav . , and that the ic values and the impact on channel biophysics (voltage-dependence of activation and fast inactivation) of the inhibitors are largely similar for both channel types. therefore, the use of hwtx-iv and other small molecule inhibitors of nav . for pain therapy could be dose-limited due to side effects mediated by the inhibition of channel nav . .
in conclusion, we propose that hwtx-iv and the investigated small molecule inhibitors could be used for the treatment of pain as part of a nav . antibody-drug conjugate (nav . -adc), establishing nav . specificity and minimising side effects.
maria antonietta carillo, daniel varon silva.
malaria is one of the most infectious diseases caused by plasmodium species parasites. the merozoite surface protein (msp ) is the most abundant protein on the surface of the plasmodium species merozoite stage and plays an important role during the erythrocyte invasion process [ ] . msp is synthesized as a -kda glycosylphosphatidylinositol (gpi) anchored protein precursor which is processed at the end of schizogony into four different fragments. the primary processing step produces a complex of four fragments that are present on the merozoite surface. the secondary processing step at erythrocyte invasion results in the detachment of the complex from the surface, except for the c-terminal -kda domain (msp ), which remains anchored to the parasite surface by the gpi moiety. in human malarial infections, the gpi is considered to be a toxin that causes the expression of various host genes and induces a pro-inflammatory immune response, making it a valuable candidate for the development of anti-malarial drugs. in order to study the function of the gpi and evaluate its effects, the msp fragment has been expressed, purified and anchored to the synthetic gpi molecule using a protein trans-splicing strategy based on the split intein method [ ] . the role of the gpi moiety will be studied through protein folding experiments and the effect of the anchored protein will be evaluated in vitro in order to understand the function of the gpis.
assessment of uch-l substrate selectivity using engineered ubiquitin fusions with varying linker lengths. peter suon, mario navarro, john love. san diego state university.
the ubiquitin proteasome system (ups) is a complex system composed of multiple structural and functional elements that play key roles in cellular processes such as signal transduction, cell cycle regulation, apoptosis, and protein degradation. proteins destined for degradation are first tagged with the protein ubiquitin, which is covalently attached to internal lysine residues. once the target has been degraded by the proteasome, the enzyme ubiquitin carboxy hydrolase l (uch-l ) is believed to prepare ubiquitin for additional rounds of ubiquitination by cleaving small peptides and chemical adducts from the ubiquitin c-terminus. previously in our laboratory, protein substrates of uch-l were engineered and used to characterize uch-l substrate selectivity. the engineered substrates consisted of n-terminal monoubiquitinated test variants derived from streptococcal protein g (protein gb ) and staphylococcal protein a (spab). the thermal denaturation temperatures (tm) of the fusion proteins were measured using circular dichroism and span a range of over c. more importantly, the rate of hydrolysis for the fusion proteins is demonstrated to be directly correlated to the tm of the test variant fused to the c-terminus of ubiquitin. previously, the engineered substrates were designed to emulate natural ubiquitin fusions and thus did not contain any 'linker' residues between the c-terminus of ubiquitin and the n-terminus of the test protein.
to explore the effects of linker length on uch-l hydrolysis we are engineering new uch-l substrates that contain an unstructured amino acid linker between ubiquitin and the test protein. to further explore the catalytic efficiency of uch-l we will revisit diubiquitin (ub-ub), which is not hydrolyzed by uch-l , and will make mutations in the hope of generating a hydrolysable substrate. using rational design, the new variants will be engineered to destabilize the c-terminal ubiquitin to determine if this results in hydrolysis of the new ub-ub construct. the thermal stability of these new fusion protein substrates will be measured using circular dichroism spectroscopy (cd) and uch-l hydrolysis rates will be characterized using existing assays. our goal is to continue the use of engineered substrates to further explore the catalytic properties of uch-l activity and the potential role in protein trafficking and degradation within living cells.
we present a biophysical study of a suite of helical proteins that have been modified to contain and -amino acid additions on their termini that impart increased resistance to degradation in e. coli recombinant expression systems. the b domain of staphylococcal protein a (ab) and the homeobox dna-binding domain from d. melanogaster engrailed (en) are small -helix bundles. these domains do not appreciably accumulate in the e. coli bl (de ) cytoplasm when expression in a pet vector is chemically induced. this is likely due to host protein degradation/recycling factors that function to efficiently degrade these two proteins. addition of sequences encoding either of two amino-terminal beta-hairpins to either the n- or c-terminus of ab and en results in the accumulation of large amounts of these new chimeric proteins. additionally, destabilization of the ab or en sequence does not abolish the expression enhancement effect of the beta-hairpin addition.
we have investigated the biophysical origins and effects of the beta-hairpin additions using circular dichroism (cd) spectroscopy, and have determined that the added sequence does not significantly perturb the secondary structure of ab or en, nor does it significantly influence the unfolding temperature (tm). while investigation into the origin of the accumulation effect is ongoing, we hypothesize that the addition of the sequence is disruptive to recognition events in the native protein degradation machinery in e. coli. thus, this approach represents both a biotechnological tool for expressing helical peptides recalcitrant to expression, as well as a system well-suited to probing mechanisms of protein recycling and homeostasis. a special class of these proteins are lipidated proteins containing a glycosylphosphatidylinositol (gpi) glycolipid moiety at the c-terminus. the lipid chains of the gpi anchor molecule are responsible for the membrane association of the attached protein. a unique feature of gpi-anchored proteins is that after isolation they can be reinserted into the membrane of recipient cells with the retention of the biological function. accordingly, the exogenous introduction of fluorescent gpi-anchored protein analogues into cell membranes is a useful method for visualizing the cellular traffic of membrane associated proteins and for engineering cell surfaces. we have recently shown that cholesterol can be applied for anchoring proteins to the plasma membrane of live cells without perturbing the membrane. in order to introduce proteins containing covalent modifications that are not genetically encoded, an enzymatic method was considered and fused with the c-terminal cholesterylation method. the usefulness of the method is demonstrated via the preparation of multimeric model proteins of kda monomers, that is an appropriate representation of the ligation of domain size proteins. 
transmembrane domain dimerization drives p ntr partitioning to lipid rafts. irmina garcía carpio, marçal vilar. sociedad de biofísica de españa. sbe.
p neurotrophin receptor (p ntr) is best known for its role in mediating neuron cell death during development or after injury, but it also regulates cell proliferation, axon guidance and survival. the key to understanding its signaling could lie in its structure and conformational states. it has been described that p forms disulfide-linked dimers through the cys in the transmembrane domain, which are essential for its ngf-mediated signaling. previous studies have shown that p is present in lipid rafts, where it interacts with intracellular adaptors to activate different signaling pathways. we designed several p mutants in the tm domain that impair dimerization and studied the role of tm domain dimerization in lipid raft recruitment. our analysis suggests that p tm domain dimerization influences lipid raft partitioning. these results could be key to understanding its signaling and processing.
pi- bioluminescent sensor proteins for therapeutic drug monitoring of the monoclonal antibody cetuximab. martijn van rosmalen, remco arts, brian janssen, natalie hendrikse, dave wanders, maarten merkx.
therapeutic drug monitoring (tdm), that is, adapting the drug dosage scheme to the individual patient's pharmacokinetic and pharmacodynamic characteristics, is still uncommon for therapeutic monoclonal antibodies, despite preliminary studies showing its potential benefits. one of the factors impairing tdm implementation is the lack of equipment and trained personnel to regularly measure drug concentrations in patients receiving treatment. point-of-care diagnostic devices which could be used by patients themselves or by their general practitioners would greatly advance the feasibility of tdm. here we present a biosensor for the therapeutic monoclonal antibody cetuximab.
we developed a series of cyclic peptides that specifically recognize cetuximab, covering a fourfold range of affinities, and incorporated these cyclic peptide sequences into a set of luminescent sensor proteins. the sensors translate cetuximab concentrations into a change in emission color that can be read out using a mobile phone camera. together, these sensors can quantify cetuximab levels within the relevant therapeutic concentration range and we propose that they can be used for therapeutic drug monitoring applications.
genetically encoded biosensor for cell permeability of inhibitors of the p -hdm interaction. silvia scarabelli, thomas vorherr, kai johnsson. ecole polytechnique fédérale de lausanne.
the evaluation of permeability across the cellular membrane is a key step in the development of therapeutics, since it affects their distribution and efficacy. reliable and versatile techniques for determining the structural determinants of molecular permeability and for obtaining information about entry kinetics are still missing. we previously introduced a class of semi-synthetic ratiometric sensor proteins (snifits) that has been shown to be suitable for the measurement of intracellular metabolite concentrations. here we describe a fully genetically encoded sensor based on the snifits modular design for the assessment of the cell permeability of small-molecule and peptide inhibitors of the protein-protein interaction between p and hdm . we show that our sensor detects the presence of hdm -binding stapled peptides in vitro, and, when expressed in mammalian cells, it responds to the perfusion of the known small molecule hdm inhibitor nutlin- a. moreover, experiments made with an automated microscope show that the sensor is suitable for measuring and comparing the kinetics of entry of different kinds of inhibitors into the cytosol of living cells.
in parallel, we are developing an hcaii-based sensor protein for the sensing of sulfonamides and eventually their peptide derivatives. we show that the sensor responds to the presence of different kinds of hca-inhibitors in vitro and in perfusion experiments. this second sensor would broaden the range of molecules and peptides whose permeability can be studied with our tools beyond the family of the hdm -binders. our sensors overcome the limitations of the already existing techniques for measurements of permeability while offering a simultaneous measurement of the cell permeability and of the binding efficiency of small molecules and peptides of interest. archer: predicting protein function using local structural features. a helpful tool for protein redesign. jaume bonet , javier garcia-garcia , joan planas-iglesias , narcis fernandez-fuentes , baldo oliva structural bioinformatics lab, grib, upf, division of metabolic and vascular health, university of warwick, the advance of high-throughput sequencing methodologies has led to an exponential increase of new protein sequences, a large proportion of which remain unannotated. the gap between the number of known proteins and those with assigned function is increasing. in light of this situation, computational methods to predict the function of proteins have become a valid and necessary strategy. here we present archer, a server that exploits archdb's hierarchy of super-secondary structures to map go and enzyme functions upon protein regions and, thus, infer the function of a protein. the server relies on either the sequence or structure of the protein of interest and returns the mapping of functional subclasses extracted from archdb. moreover, it computes the functional enrichment and significance of each subclass, combines the functional descriptors and predicts the function of the query-protein. 
combining the functional enrichment analysis of the super-secondary structures with the structural classification of archdb, users can select variants of the target sequence that swap the region of a super-secondary structure for another that putatively fits in the same scaffold, minimizing the effect on the global tertiary structure. only variants that modify the predicted function are offered for selection, thus providing a rational, knowledge-based approach for protein design and functionalization. the archer server is accessible at http://sbi.imim.es/archer.
phytochromes are natural photoreceptors known to regulate photosynthesis in plants, fungi and bacteria. phytochromes found in bacteria share a common architecture and consist of a pas-gaf-phy photosensory core and a c-terminal output module, responsible for biological function. a bacterial phytochrome, bphp , from rhodopseudomonas palustris undergoes reversible conversion from the far-red absorbing state (pfr) to the red-absorbing state (pr) followed by a conformational change upon nm light irradiation. like most bacterial phytochromes, bphp forms a dimer. it was shown that nm light causes protomer swapping between the bphp dimers, and the output module is likely involved in this process. however, the mechanism of the light-induced swapping is poorly studied. we tested the ability of protomers to swap between bphp dimers using a pull-down biochemical assay. for this, strep-tagged bphp was immobilized on strep-tactin sepharose beads in the presence of untagged bphp fused to mruby at different concentrations. after incubation, the proteins were eluted and visualized in sds-gel using a zinc-induced fluorescence assay. the amount of bead-bound protein was estimated by densitometry. it was found that more than % of heterodimers (strep-tagged bphp and bphp -mruby ) form within . h of incubation under nm light at -fold excess of one of the interacting partners. in darkness, the swapping was much slower.
in a similar setup we checked the amount of heterodimers after , and min of incubation. no difference was observed between time points, suggesting that protomer swapping is a relatively fast process. next, the role of the c-terminal effector domain of bphp in the light-induced interaction was studied. for this, the kinetics of the pfr-to-pr transition were analysed by measuring absorbance at nm and nm for full-length bphp and a bphp mutant lacking the c-terminal domain. while full-length bphp showed the normal pfr-to-pr transition, the absorbance of the mutated bphp at nm did not rise. however, nm absorbance changes were similar for both proteins and, surprisingly, similar dark relaxation kinetics were observed. we propose that the impaired pfr-to-pr transition is caused by a restricted pr conformation in the mutant rather than by fast pr-to-pfr relaxation. understanding the mechanisms of the bphp light-induced structural changes and the protomer interaction should advance engineering of bacterial phytochromes into fluorescent probes and optogenetic tools.
antibody detection is an integral part of many diagnostic strategies, most crucially so when infectious diseases are involved. currently used assays, such as elisa or spr, enable detection of antibodies in the laboratory with high sensitivity, yet a translation of these technologies to an application outside of the laboratory setting is far from trivial. problematically, the burden of many infectious diseases is carried precisely by those countries where access to laboratory facilities is severely limited. we therefore developed a novel, one-step assay that allows the detection of antibodies directly in solution using a luminescent sensor protein. our strategy is based on the use of a bright luciferase, nanoluc, tethered to a green fluorescent protein (mneongreen) via a semi-flexible linker containing two epitope sequences.
crucially, two small helper domains were fused to the protein termini. these domains keep nanoluc and mneongreen in close proximity in the absence of antibody, enabling efficient bioluminescence resonance energy transfer (bret). binding of antibody to the epitopes in the sensor protein's linker domain pulls the bret partners apart, effectively changing the color of emission from green to blue. the assay allowed the detection of picomolar amounts of anti hiv -p antibodies directly in solution, both under optimized buffer conditions and in blood plasma. in principle, the modular sensor architecture should allow detection of any antibody with a well-defined epitope of sufficient affinity. to demonstrate this, the hiv epitopes were replaced with two ha-tag epitopes, yielding a sensor that enabled the detection of picomolar amounts of anti-ha antibodies. the simple optical readout provided by the sensor system allowed us to record the emitted signal with a conventional mobile phone camera. a simple software application that analyzes the image based on rgb values sufficed to interpret the recorded image with respect to the presence of antibody. bearing in mind the eventually envisioned application in a point-of-care diagnostic setting, this combination of sensor recording and interpretation using nothing more than a mobile phone and a software application holds considerable diagnostic potential. beyond point-of-care diagnosis of infectious diseases, a simple assay to detect and quantify antibodies directly in solution could also have a substantial impact in other fields. antibodies are ubiquitous in biotechnology, and this is reflected by the plethora of potential sensor applications, which range from a role in microfluidic circuits or monitoring the biotechnological production of antibodies, including validation of bispecificity, to veterinary applications, diagnosis of autoimmune diseases and monitoring the success of vaccination campaigns.
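the rgb-based readout described in the abstract above can be illustrated with a minimal sketch. the abstract does not specify how the software application processes the image, so everything here is an assumption made for illustration: the function names, the use of a blue-to-green intensity ratio, and the threshold of 1.0 are hypothetical and are not taken from the authors' application. the idea is only that a shift of emission from green (bret on, no antibody) to blue (bret off, antibody bound) raises the blue/green ratio of the pixels covering a sample.

```python
from statistics import mean


def blue_green_ratio(pixels):
    """Return mean blue intensity divided by mean green intensity.

    `pixels` is an iterable of (r, g, b) tuples, for example the pixels
    of a cropped photograph of one sample well (hypothetical input format).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    green = mean(p[1] for p in pixels)
    blue = mean(p[2] for p in pixels)
    if green == 0:
        raise ValueError("green channel is zero; ratio undefined")
    return blue / green


def antibody_present(pixels, threshold=1.0):
    """Call a sample positive when emission is blue-shifted.

    The threshold is an illustrative assumption; a real assay would
    calibrate it against antibody-free and antibody-saturated controls.
    """
    return blue_green_ratio(pixels) > threshold


# Toy pixel values, not measured data.
green_well = [(10, 200, 50), (12, 180, 40)]   # green-dominant emission
blue_well = [(10, 60, 210), (8, 40, 190)]     # blue-dominant emission

print(antibody_present(green_well))  # False
print(antibody_present(blue_well))   # True
```

a ratio of two channels is used here rather than a single channel because it is less sensitive to overall brightness differences between photographs; whether the original application does the same is not stated in the source.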
the continually growing protein data bank (pdb) has been a key resource for general principles of protein structure. for example, parsing structural observations in the pdb into simple geometric descriptors has given rise to statistical energy functions. here we present a novel strategy for mining the pdb on the basis of local tertiary structural motifs (terms). we define a term to be the structural fragment that captures all local secondary and tertiary structural environments of a given residue, and query the pdb to obtain quantitative information for each term. first, we show that by breaking a protein structure into its constituent terms, we can describe its sequence-structure relationship via a new metric we call "structure score." using submissions in recent critical assessment of structure prediction (casp) experiments, we find a strong correlation (r . ) between structure score and model accuracy, a performance that exceeds leading atomistic statistical energy functions. next, we show that querying terms affected by point mutations enables the quantitative prediction of mutational free energies. our simple approach performs on par with state-of-the-art methods fold-x and popmusic on mutations, and provides superior predictions in certain cases where other methods tend to fail. in all, our results suggest that the data available in the pdb are now sufficient to enable the quantification of much more sophisticated structural observations, such as those associated with entire terms, which should present opportunities for advances in computational structural biology techniques, including structure prediction and design.
exploiting natural sequence diversity for protein crystallization
sergio martínez-rodríguez, valeria risso, josé m sanchez-ruiz, josé a. gavira, departamento de química-física, universidad de granada, laboratorio de estudios cristalográficos, iact-csic-ugr granada
during the last decade, different rational and high-throughput approaches have been successfully applied in the protein crystallography field to widen the so-called "protein crystallization bottleneck" [ , ] . despite the enormous efforts carried out by our community, the statistics presented by structural biology consortiums [ ] suggest that so far only the easy-to-pick fruit has been attained; thus, new approaches are necessary to further expand the crystallization limiting step to relevant targets. on the basis of a previous hypothesis suggesting that the difficulties found in protein crystallization might be a result of evolutionary negative design [ ] , we have used two different protein engineering approaches exploiting natural sequence diversity, using beta-lactamase as a toolbox: i) ancestral reconstruction and ii) a consensus approach [ ] . both approaches resulted in hyperstable and promiscuous ancestral derivatives. furthermore, our initial crystallization results also suggest that both approaches increased the crystallizability of the resulting enzymes when compared to the extant tem- beta-lactamase.
the adipocyte-derived hormone adiponectin has become a key player for the understanding of overweight-related diseases such as obesity, diabetes, atherosclerosis or the metabolic syndrome. one of its major functions is its insulin-sensitizing effect, which is mediated by the activation of ampk, p -mapk and pparα ( ). furthermore, adiponectin is involved in glucose regulation and fatty acid oxidation. recently, three adiponectin receptors, adipor , adipor and t-cadherin, have been described, while an unknown fourth receptor is hypothesized ( ) . for only two of them (adipor and adipor ) has signal transduction via adiponectin been confirmed ( ).
in order to find new binding partners or co-receptors, we cloned and expressed full-length adiponectin as a fusion protein with a c-terminal intein and a chitin binding domain (cbd) as well as an n-terminal his -tag. using the impact system, the fusion protein was cleaved to form the corresponding thioester. to separate the starting materials as well as the cleaved intein-chitin binding domain, the purification was performed with chitin beads. furthermore, the product was concentrated by ni-nta affinity chromatography. the obtained adiponectin thioester was then reacted with a tamra- or a biotin-labeled peptide to obtain the corresponding ligation product. finally, the functionalized adiponectin was purified by size exclusion chromatography. further studies will allow screening for interacting molecules in cell- and tissue-derived samples.
departamento de quimica fisica, facultad de ciencias university of granada, dpto. de quimica fisica biologica. instituto de quimica fisica rocasolano, departamento de quimica organica, facultad de ciencias university of granada
rational design of non-natural enzyme activities has proved challenging. here, we report the introduction of catalysis of the kemp elimination (a model of proton abstraction from carbon) in scaffolds corresponding to precambrian nodes in the evolution of the antibiotic resistance protein β-lactamase. we used a single-mutation, minimalist approach based on chemical intuition, and obtained catalysis levels similar to those reported in the literature for computational kemp-eliminase designs involving multiple mutations. remarkably, the approach was unsuccessful when performed on modern β-lactamases. we provide experimental evidence that enhanced conformational flexibility contributes to the success of the minimalist design in the ancestral scaffolds.
this work has implications for the understanding of function emergence in protein evolution and demonstrates the potential of ancestral protein resurrection in enzyme engineering and design.
exploring the importance of dimerization for dj- function through engineered domain fusions
sierra hansen, jiusheng lin, mark wilson
parkinson's disease is a progressive neurodegenerative disease that affects approximately . million people worldwide and is characterized by the loss of dopaminergic neurons in the substantia nigra pars compacta. dj- (park ) is one of several genes that are mutated in rare forms of familial parkinsonism. dj- is a dimeric cytoprotective protein that defends against oxidative stress and preserves mitochondrial function. dimerization of dj- is thought to be essential for this function, as some disease-associated mutations cause poor folding and disrupt the dj- dimer. however, recent reports suggest that dj- may be functional as a monomer. to test this, we have engineered a non-dissociable dj- dimer that is a fusion of two human dj- domains. this construct cannot dissociate into monomers and thus will provide a stringent test of the importance of monomeric dj- . our engineered construct is modeled on plant dj- homologs, which feature naturally occurring duplicate dj- domains separated by a small ( amino acid) linker region. using x-ray crystallography, we confirmed that this engineered non-dissociable human dj- dimer has a structure identical to that of the naturally occurring dimeric protein. we have investigated the influence of enforced dimerization on the pathogenic effects of the parkinsonian l p and l p mutations. cd spectroscopic analysis reveals that single and double l p mutations in the non-dissociable dj- dimer maintain a higher degree of structure than l p mutations in the native protein. additional characterization of the protective capacity and subcellular trafficking of this non-dissociable dj- dimer is underway.
the purification, crystallization and preliminary characterization of sdre from s. aureus
staphylococcus aureus (s. aureus) is an important human opportunistic pathogen which persistently colonizes about % of the human population [ ] . surface proteins of s. aureus can excrete a kind of sortase, which represents a surface organelle responsible during the pathogenesis of bacterial infection in the host circulation [ ] . sdr proteins are components of the cell wall anchored protein family and include sdrc, sdrd and sdre [ ] . sdre can bind the complement regulatory protein factor h to escape the alternative pathway of complement [ ] . to further investigate the functions of sdre, we have expressed and purified the adhesive domain (residues -' : ), and crystallized the recombinant protein. in addition, we constructed a mutant s. aureus strain, and cell experiments confirmed that the sdre gene participates in bacterial invasion.
bacterial microcompartments (bmcs) are proteinaceous organelles that sequester key metabolic reactions to increase enzymatic efficiency or to prevent the loss of volatile or toxic intermediates. there is an increasing desire to engineer bmcs for non-native enzymatic processes. it is thought this will increase multi-enzyme pathway efficiency and allow the expression in bacteria of pathways that may produce toxic or volatile intermediates. the mechanisms of small molecule transport and retention of toxic intermediates by bmcs remain poorly understood. a better understanding of bmc pores is critical for engineering bmcs for these non-native pathways. in order to better understand the bmc pore, we have undertaken structure-guided modifications of the hexameric pdua shell protein of the , -propanediol utilization microcompartment (pdu mcp).
these modifications include pore mutations in an attempt to alter substrate specificity and permutations of pdua to allow more drastic alterations to the structure of the protein. crystal structures of pdua pore mutants, solved to atomic resolution ( - . Å), provide evidence of the pore residues that confer specificity. further, a pdua permutation (pduap) has resulted in a closed icosahedral cage. this novel pduap cage shows ph- and salt-dependent assembly and may serve as a reaction vessel or be utilized for cargo delivery.
( , ) . anm-mc is used to identify targeted transition pathways and intermediates between open and closed states of proteins. at each step of this iterative technique, the protein is deformed along the collective anm mode showing the best overlap with the target direction, and its energy is minimized via a short mc run. in this work, optimization of simulation parameters (number of mc moves and their perturbation strength, anm deformation factor in each cycle and force constant for backbone bonds) was performed in order to increase the efficiency of this technique. as a result, this technique can now be applied to much larger systems and conformational changes. the transition pathway between apo and dna-bound conformations of the yeast rna polymerase, which is a hetero- -mer with more than residues, will be presented here. moreover, the pathway intermediates for more than diverse proteins were analyzed in terms of changes in local strain energy and backbone torsional angles during apo-to-complex transitions. certain residues interacting with the ligand are detected to exhibit large changes with respect to either of these two parameters for more than half of the proteins in our dataset.
department of chemistry and chemical biology, harvard university, howard hughes medical institute, harvard university
transgenic crops have radically reshaped the agricultural landscape.
since their introduction in the late s, transgenic crops have generated economic gains greater than us$ billion globally due to reduced production costs and increased yield gains. crops modified to produce biological insecticides derived from the soil bacterium bacillus thuringiensis (bt) are among the most robust methods of pest control. bt toxins offer many advantages over traditional insecticides, chiefly their inability to affect human biology and exquisite selectivity for defined pest species. however, the evolution of resistance to bacillus thuringiensis δ-endotoxins (bt toxins) in insects has been widely observed in the field, and greatly threatens the use of this mechanism of pest control in the future. we developed a phage-assisted continuous evolution (pace) platform for the rapid generation of high-affinity protein-protein interactions and validated the system by evolving known high-affinity antibody mimetics in < days of pace. we applied this system to the evolution of the bt toxin protein cry ac to recognize a non-cognate cadherin-like receptor from trichoplusia ni, a pest for which bt toxin resistance has been observed in both the laboratory and the field. the resulting evolved cry ac variants exhibit high affinity for the target receptor, and kill insect cells more potently than wild-type cry ac. our findings establish that the directed evolution of novel receptor recognition in bt toxins can be used to target resistant pests, and has far-reaching implications for biological reagents and therapeutics.
optimization of a designed protein-protein interface
brian maniaci, collin lipper, john j. love
san diego state university, university of california
protein-protein interactions play key roles in practically every biological process. protein-protein interactions vary with composition, affinity, and lifetime of the complex.
studying designed protein-protein interactions will provide insight into the underlying principles of complex assembly and formation. computational protein docking and amino acid sequence design were used previously to generate protein dimers from monomeric proteins. the normally monomeric b domain of streptococcal protein-g (gb ) was computationally docked to itself, followed by optimization of the interfacial side chains. two variants, monomera and monomerb, were computationally derived as a result of a designed protein-protein interface. these designed proteins were characterized using analytical ultracentrifugation and heteronuclear nmr techniques. this design resulted in a pair of protein monomers that formed a heterodimer of modest binding affinity. a tetrahedral metal-templated interface design strategy was implemented in an attempt to strengthen the monomera-monomerb complex by introducing cross-monomer metal coordination. another advantage of using the metal-templated interface is the ability to control the protein-protein interaction both temporally and spatially. a number of newly engineered variants of monomera and monomerb with metal coordination sites were designed, produced, and tested for increased affinity of the protein-protein complex. while the generation of a metal-templated monomera-monomerb complex was unsuccessful, we were able to obtain monomera variants that form a homodimer assembly only in the presence of zinc (ii) ions. the crystal structures of metal-templated monomera variants in the presence of zinc provide an explanation for the observed dimer formation. the crystal structures indicate that the protein-protein interaction is not driven by the designed protein interface, but rather by non-specific association via edge-strand interactions. new variants were designed with the goal of engineering a high-affinity homodimer in a helix-to-helix orientation, as in the originally designed protein-protein interface.
monomera variants are currently being evaluated for self-association via metal coordination using size exclusion chromatography with a multi-angle light scattering detector for oligomerization state quantification. the results of this protein design project should lead to a greater understanding of the biophysical parameters that drive natural protein-protein interactions.
continuous evolution of site-specific recombinases with highly reprogrammed dna specificities
the ability to precisely modify the genome of human cells has enormous potential as a novel therapy and a powerful research tool. in contrast to reprogrammable nucleases, such as talens or a cas / sgrna pair, which specifically cleave dna but then rely on stochastic host cell processes to effect gene insertion, site-specific recombinases directly catalyze genomic integration with high efficiency. a major limitation of this approach is that recombinases, such as cre, natively bind with high specificity to long dna target sequences (loxp in the case of cre) that do not exist in the human genome. previous attempts at evolving cre resulted in modest changes to its specificity, or required hundreds of rounds of manual protein evolution. we developed and validated a phage-assisted continuous evolution (pace) selection for rapidly altering the dna specificity of cre recombinase towards a site present in a human genomic safe harbor locus. the pace experiments resulted in cre variants capable of recombining a substrate with nearly % of the nucleotides altered compared to loxp. we successfully used one of these variants to integrate exogenous dna into the genome of unmodified human cells. we are currently using sequencing methods to determine the specificity of the new recombinase clones.
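the statement above that a substrate has a given percentage of its nucleotides altered compared to loxp amounts to a position-by-position comparison of two sites of equal length. a minimal sketch of that arithmetic follows; the function name is hypothetical, and the two sequences are short toy strings chosen for illustration, not loxp or the human target site used in the study.

```python
def fraction_altered(reference, target):
    """Return the fraction of positions at which two DNA sites differ.

    Both sites must have the same length; this is a plain Hamming-style
    comparison with no alignment step (an assumption of this sketch).
    """
    if len(reference) != len(target):
        raise ValueError("sites must have the same length")
    if not reference:
        raise ValueError("sites must not be empty")
    ref, tgt = reference.upper(), target.upper()
    mismatches = sum(a != b for a, b in zip(ref, tgt))
    return mismatches / len(ref)


# Toy 8-nt sites (hypothetical): they differ at positions 3 and 7.
toy_reference = "ATAACTTC"
toy_target = "ATGACTAC"

print(f"{fraction_altered(toy_reference, toy_target):.0%}")  # 25%
```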
aleardo morelli, burckhard seelig
generation of comprehensive deletion libraries mediated by in vitro transposition
analysis of protein enzymes and ribozymes from nature, and from in vitro evolution, revealed that deletions of up to dozens of amino acids (or nucleotides) can be structurally tolerated. furthermore, shortened variants can exhibit better stability and increased catalytic activity. in order to investigate the effects of deletions, we developed a new procedure based on in vitro transposition to build libraries of more than , deletion mutants in three to four days. we tested our procedure on dna sequences coding for an artificial rna ligase called ligase c. we used the generated library for an mrna display selection, and isolated two active mutants containing n-terminal deletions of and amino acids.
structural characterization of ppsc, a multi-domain polyketide synthase from mycobacterium tuberculosis using a fragment-based approach
alexandre faille, nawel slama, anna grabowska, david ricard, annaïk quémard, lionel mourey, jean-denis pedelacq
polyketide synthases are of great interest in numerous scientific fields. they are composed of multiple domains, each having a different role to play in the catalysis of sequential reactions including condensation, reduction and esterification. their reaction products, named polyketides, represent a large variety of chemical compounds, from antibiotics to immunosuppressors or even anticancer drugs. ppsc is a kda polyketide synthase, organised into six catalytic domains (ks-at-dh-er-kr-acp) with singular functions. along with other type i polyketide synthases, ppsc is responsible for the biosynthesis of a polyketide essential for the virulence of mycobacterium tuberculosis (mtb) and thus is a target of choice for the design of inhibitors. to date, no structural information on any type i polyketide synthase in its entire form has been described. the main reasons are the length of these large enzymes and the flexibility imposed by the linkers between domains, which make them very difficult to crystallize. numerous questions about domain-domain interactions, spatial arrangement of this complex machinery, substrate specificity and stereochemistry are still unanswered. addressing the structural and functional characterization of ppsc would therefore help answer these questions and provide valuable information for drug design. to overcome the length- and flexibility-dependent problems originating from the presence of multiple domains and linkers, we decided to study domains expressed alone. for this purpose, we used our domain trapping strategy to identify soluble fragments representing a single domain from ppsc [ ] . it has the advantage of not relying on bioinformatically designed domain boundaries and can even sometimes include parts of linkers to obtain more soluble fragments. using this strategy, we were able to identify relatively small and highly soluble fragments representing each domain of ppsc, thus facilitating the downstream structural and functional characterization. more than fragments have been submitted to crystallization trials. among these, gave crystals and allowed us to determine the x-ray structures of the ppsc at and er domains, in addition to the dh domain in complex with a substrate analog for which activity was confirmed in vitro.
the computational design of proteins that bind small molecules remains a difficult challenge in protein engineering. the ability to computationally design native-like interactions with high accuracy and efficiency would be an asset towards therapeutic development, enzyme design, and engineering functional proteins. we have developed a systematic approach to designing interfaces. we first identify ligands with naive binding affinity to our protein scaffold, then use rosettaligand to computationally dock the ligand while designing the interface for a tighter interaction.
this way, we are taking a 'shot in dim light' for design as opposed to a 'shot in the dark', allowing us to more thoroughly investigate the successful and not-so-successful designs, and improve the computational methods. of ligands screened, we identified weakly-binding hits in the range of - mm. thus far, rosettaligand has successfully designed one tighter protein-ligand interface, from mm to mm. experiments in progress include designing and experimentally validating more designed interfaces.
structural studies of human acidic fibroblast-growth factor (fgf ) mutants with a probable anticancer activity
lectins are carbohydrate-binding proteins ubiquitously present in nature. they play a role in biological recognition phenomena involving cells and proteins. the lectin-carbohydrate interaction is highly specific, and can be exploited for the development of nanoparticles carrying on their surface lectins specifically directed to carbohydrate residues present only on malignant cells and absent on healthy ones ( ) . lectins have been found to possess anticancer properties and they are proposed as therapeutic agents, binding to cancer cell membranes or their receptors, causing cytotoxicity, apoptosis and inhibition of tumor growth. some lectins are able to prevent the proliferation of malignant tumor cells because they recognize the t-antigen (galβ - galnac) found specifically on the surface of tumor cells ( ) . the main problem is that their use as a detection agent for the t-antigen in clinical studies is not possible because the immune system can recognize them as foreign molecules and develop an immune response. previous x-ray crystallography studies in our laboratory have characterized a lectin found in mushrooms, called bel β-trefoil, which has antiproliferative activity on tumor cell lines because it contains three binding sites for the t-antigen. unlike other lectins with this property, bel β-trefoil shows structural homology with a human protein, acidic fibroblast growth factor (fgf ) ( ). superposition of their structures suggests that the human protein could be mutated to contain at least one of the binding sites for the t-antigen. such mutations should give fgf the potential capacity to recognize tumor cells with less immunogenicity than the fungal protein. fgf is mitogenic and chemotactic, and mediates cellular functions by binding to transmembrane receptors, which are activated by ligand-induced dimerization requiring heparin as co-receptor. to this end, the fgf cdna was cloned into a bacterial plasmid and then mutated in five different positions to eliminate its mitogenic activity and to engineer the t-antigen binding capacity into the protein. attempts to crystallize the mutants of fgf were made using the hanging drop technique, with the final aim of carrying out their structural characterization by x-ray diffraction analysis of the crystals.
the de novo synthesis of proteins in response to the activation of cellular signaling pathways is a crucial element of many high-level biological processes, including the synaptic plasticity underpinning memory formation in the brain. while of fundamental biological importance, there has been a shortage of tools with which to specifically target pools of newly synthesized proteins of interest for study. thus, we have developed timestamp and smash, methods for drug-dependent tagging, or destruction, respectively, of newly synthesized copies of proteins of interest. both methods rely on protein tags that remove themselves by default via an internal hepatitis c virus (hcv) ns protease, but which are retained in the presence of cell-permeable small molecule protease inhibitors.
the timestamp tag contains split yfp halves and epitope tags which are reconstituted and preserved, respectively, on proteins of interest following drug application, whereas the smash tag contains a strong degron which remains attached to proteins of interest following drug application, resulting in their clearance. one limitation of timestamp and smash is that they can only be used to independently manipulate one protein of interest at a time. furthermore, the application of timestamp and smash to study endogenous protein pools in mammals has not yet been explored. here, we report on efforts to extend these techniques by reengineering ns proteases so that they can be inhibited by two different drugs orthogonally to one another. by incorporating different drug resistance mutations into two ns protease variants, we engineered ns protease domains that are inhibitable either by asunaprevir only, or by telaprevir only. we found that these tags permit simultaneous and independent control over the newly synthesized pools of two proteins of interest within the same population of cells. we also report the development of transgenic knock-in mouse strains incorporating timestamp and smash tags, which allow the interrogation of newly synthesized pools of specific endogenous synaptic proteins in the context of their endogenous regulatory elements, and without relying on overexpression.
infectious diseases are often diagnosed by the presence of specific antibodies that are produced in response to the invading pathogen. one example is the antibodies that are present in patient blood after infection with the dengue virus serotype and that are directed against an epitope on the virus' nonstructural protein (ns- ). traditional antibody diagnosis relies on time-consuming multi-step assays that require sophisticated equipment in a laboratory environment. a promising alternative is offered by protein switches that are based on bioluminescence resonance energy transfer (bret). these switches comprise a luciferase (nanoluc) and a green fluorescent protein (mneongreen), which are connected via a semiflexible linker. the linker contains two epitope sequences of ns- to which the antibodies bind specifically. if no antibodies are present, nanoluc and mneongreen are held in close proximity via two helper domains and bret can occur; thus green light originating from mneongreen is visible. if antibodies are present, they bind to the specific epitopes in the linker of the switch, stretching the linker and thereby breaking the interaction of the helper domains. as a result, nanoluc and mneongreen are separated in such a way that bret can no longer occur; thus only blue light originating from nanoluc remains visible. using this principle, monoclonal anti-ns- antibodies were detectable in a controlled buffer system and in spiked plasma samples. furthermore, the developed antibody switch was applied to plasma samples of macaques after a primary infection with dengue virus serotype . signal readout was possible using a laboratory-based plate reader as well as the camera of a standard smartphone. we demonstrate that this bret-based protein switch can quickly detect antibodies in solution in a single-step assay format using simple equipment for signal readout, such as a standard smartphone. this simplified antibody detection platform has the potential to be carried out outside of a laboratory, and thus in areas with limited laboratory infrastructure and a high number of diverse infectious diseases.
proteins expressed from more than two-thirds of the human genome reside within intracellular compartments. of these proteins, many are important disease-related targets, such as kras and c-myc, which cannot be easily addressed by conventional small molecule approaches. some of the weaknesses of small molecules can be addressed by biologic drugs, which offer, for example, high target specificity and inhibition of protein-protein interactions.
the challenge for biologics is how to engineer recombinant proteins to access the intracellular space. one strategy is to use systems evolved by bacteria and viruses to deliver material inside the cells. an example of such pathway is used by pseudomonas exotoxin a (pe). the modularity of pe allows the catalytic domain to be replaced with a biologic payload against desired intracellular target. an additional benefit of pe-based delivery is a possibility of targeting the drugs only to relevant cells in the body by modifying the cell-targeting domain of the pe. the aim of this project is to deliver functional payloads against k-ras and c-myc into the cell using a pseudomonas exotoxin a translocation domain. we used phage and ribosome display to select antibody mimetics that bind k-ras and c-myc. here, we present their activity in biochemical assays and the initial results on generation of pe-based constructs. ( ) . hemagglutinin is synthesized as ha molecule assembled as noncovalently bound homotrimers on the viral surface. this precursor protein is cleaved by trypsin-like proteases to yield two subunits ha and ha linked by a single disulphide bond ( ) . ha is also post-translationally modified by n-glycosylation ( ). it is well established that the virus hemagglutinin is the main antigen, inducing the neutralizing antibodies. in the attempt towards developing influenza vaccine production (the egg-based manufacturing lasts several months) that would be faster and safer the utilization of recombinant antigen alone is currently being observed. recently we demonstrated that yeast produced influenza h protein although cleaved into two subunits induced strong immunological response in mice ( ) . in this report, we describe the biochemical and immunological characterization of the h antigen, based on hydrolytic domain of the h n gene, with deletion of multibasic cleavage site and expressed in yeast system. 
the ha-encoding gene from h n virus, with a deletion of nucleotides, was cloned into the ppiczac vector. the rha fusion protein with a his -tag was secreted into the culture medium and was purified to homogeneity in one step using ni-nta agarose. the efficiency of the antigen purification was mg/l. glycosylation sites of rha were determined using lc-ms-ms/ms. analysis of the n-linked glycans revealed that rha is glycosylated at the same sites as the native ha in the vaccine strain. next, we investigated whether the hemagglutinin with the deleted cleavage site oligomerizes into higher molecular forms. to determine the oligomeric forms of the recombinant antigen, various approaches were applied, e.g. native-page, size exclusion chromatography and dynamic light scattering. as a final experiment to measure the size of oligomers in the protein sample, the combined sec-mals technique was used, with multi-angle light scattering (mals) as the detector. the immunological activity of rha was tested in chickens and mice, where the antigen elicited a high immune response. the data presented here demonstrate that the new influenza antigen produced in p. pastoris is highly immunogenic and might be considered as a candidate for a subunit vaccine.

structural motifs capture redundant patterns that frequently occur in proteins. motifs associated with contiguous fragments of structure (i.e., secondary structural motifs) are well studied and have been successfully used to capture "rules" describing sequence-structure relationships in protein design and structure prediction. we have extended this concept to motifs that capture tertiary information (i.e., tertiary structural motifs, or terms). we have discovered that a relatively small alphabet of terms describes the known structural universe (all secondary, tertiary and quaternary information in the pdb) at sub-angstrom resolution.
this alphabet of universal motifs reveals the remarkable degeneracy of the protein structure space, with just a few hundred terms sufficient to accurately capture half of the known structural universe. we have begun to demonstrate the considerable promise this structural alphabet has for applications such as protein design, structure prediction, and docking. we have developed a novel protein design framework that selects amino acid sequences, given a desired structure, using solely information from the universal terms. we show that given a native backbone, this framework recovers the native sequences to a level on par with state-of-the-art atomistic protein design methods, indicating that the motifs capture the salient structural rules governing native proteins. further, predicted sequence distributions agree closely with observed evolutionary variation. given the apparently high degeneracy among even complex features of protein structure, methods based on mining the pdb for tertiary information should provide ample opportunities for advancement in problems of computational structural biology.

sortase-mediated synthesis of protein-dna conjugates for sensitive biosensing
bedabrata saha, marieke op de beeck, remco arts, maarten merkx
in recent years, semisynthetic protein-dna conjugates have emerged as attractive biomacromolecules for different applications in bio-nanotechnology, biosensing, diagnostics and therapeutics. in protein-dna conjugates, synthetic oligonucleotides allow the construction of a desired molecular architecture with high specificity, while maintaining the original functionality of the protein molecules for the desired application. however, the synthesis of site-specific and stoichiometric protein-dna conjugates can be challenging. due to the diversity in composition and physico-chemical properties of the proteins, few generic strategies are available for conjugation of protein molecules to a dna scaffold.
a common approach is to use thiol-based covalent conjugation, but the introduction of additional cysteines can lead to the formation of intermolecular disulfides or interfere with the formation of native disulfide bonds. as an alternative, we have developed a site-directed protein-dna conjugation strategy based on a sortase-mediated transpeptidation reaction. the sortase recognizes a 'sorting motif' (i.e. lpxtg, where x is any amino acid), which is recombinantly introduced by site-directed mutagenesis at the c-terminal end of the protein molecule. the sortase cleaves the t-g peptide bond and catalyzes the formation of a new amide bond between the lpxt peptide and the n-terminal amine of any molecule bearing an n-terminal oligoglycine motif. for this purpose, a triglycine motif was introduced at the ' end of single-stranded dna (ssdna). on-column synthesis of triglycine-modified ssdna, protected on controlled pore glass beads, simplified the purification process and enhanced the yield of triglycine-modified ssdna (> %). we used this conjugation strategy in several biosensing applications. for example, we used the method to conjugate ssdna linkers at the c-termini of a range of single-chain antibody fragments (scfv) and applied these constructs to allow oriented display of capture molecules on biosensor surfaces. using an excess of triglycine-modified ssdna, we achieved % conversion to the scfv-ssdna conjugate, which can be further purified in a two-step purification process consisting of a ni-nta affinity column and ion-exchange chromatography. we also extended this sortase-based conjugation strategy to develop a bioluminescence-based assay for sensitive target oligonucleotide detection. in this regard, the ' and ' end triglycine-modified ssdna molecules were successfully conjugated with a bret protein pair, nanoluc luciferase and mneongreen fluorescent protein.
the introduction of a c-terminal sortase-his tag and an n-terminal strep-tag allowed efficient purification of these protein-ssdna conjugates from excess oligonucleotides and unreacted protein.

mass spectrometry based proteomics to identify the protein differences in human breast milk from breast cancer patients and controls
devika channaveerappa, roshanak aslebagh, kathleen f. arcaro, costel c. darie
breast cancer is the second leading cause of cancer death in women. about % of women in the us develop breast cancer. death rates due to breast cancer have declined over the years due to advancements in mammography and treatment. although mammography helps in the early detection of breast cancer, it has a few limitations. dense breast tissue makes mammograms less accurate. breast milk can be assessed to evaluate the risk of developing breast cancer by comparing the proteomes of breast milk from healthy individuals and individuals with breast cancer. this study makes use of mass spectrometry based proteomics to identify the differences between the control and cancerous samples, which would further help in identifying potential biomarkers for breast cancer. first, sds-page was used to separate the proteins from the whole milk sample. the gel bands for each sample were then excised and cut into small pieces. the gel pieces were washed and trypsin digested in order to extract the peptides. peptide mixtures in the solution were cleaned using c zip-tip and then analyzed by liquid chromatography-tandem mass spectrometry (lc-ms/ms). gradients of minutes and minutes were used for lc-ms/ms analysis. raw data obtained were converted to pkl files using proteinlynx global server (plgs version . ). the data were then submitted to a mascot database search for protein identification. the mascot results were then exported as .dat files and further analyzed using scaffold version . software. three breast cancer milk samples were investigated against healthy control milk samples.
in the sds-page gel, after coomassie staining, the protein patterns did show minor differences. after lc-ms/ms analysis, the proteins identified by mascot database search were imported into the scaffold software and compared for the relative ratio between the proteins from the milk sampled from control donors and the donors with breast cancer. there were significant differences identified in the proteomes of the two sets of samples. some of the proteins were upregulated in the breast cancer samples and some were downregulated when compared with the controls. additional investigation of more breast milk samples is ongoing. this study focuses on identifying biomarkers directly in the milk of donors with breast cancer.

leukolike vectors: leukocyte-inspired nanoparticles
claudia corbo, alessandro parodi, roberto palomba, roberto molinaro, michael evangelopoulos, francesco salvatore, ennio tasciotti
the houston methodist research institute, fondazione irccs sdn
nanomedicine aims to improve drug efficiency by enhancing targeting and biocompatibility, and reducing side effects. multiple surface modifications have been proposed to provide nanocarriers with these features, but they rely on complex synthesis processes and are very often inefficient at simultaneously providing biological tolerance and targeting properties [ ]. bio-inspired approaches based on surface coatings developed from the purified cell membrane of immune cells represent a paradigm shift for the development of carriers capable of prolonged circulation and proper tumoritropic capabilities. we showed that nanoporous silicon (nps) particles coated with leukocyte cellular membranes, leukolike vectors (llvs), possess cell-like properties [ ]. llvs can escape macrophage uptake, delay sequestration by the reticulo-endothelial system, target tumor inflamed vasculature and accumulate within the cancer parenchyma [ ].
llvs were fully characterized for their shape, size, surface charge and coating through dynamic light scattering and scanning electron microscopy. in addition, we characterized the content and function of the leukocyte proteins transferred onto the llv coating through high-throughput proteomic analysis, and the results revealed the presence and the correct orientation of several important markers of leukocytes: cd , cd and mhc-i were identified as key players in determining llv biocompatibility, while leukocyte associated function- (lfa- ) and mac- contributed to the llv targeting ability and bioactivity towards inflamed endothelium [ ]. recent investigation showed that the coating induced the formation of a singular protein corona (i.e. the protein adsorption layer) on the surface of the nanoparticles compared to the negative control following in vivo injection. in addition, the proteolipid coating favored active extravasation of the llvs in the tumor vasculature by molecular mechanisms similar to those used by tumor infiltrating leukocytes. this work shows that it is possible to transfer biologically active leukocyte membrane proteins onto synthetic nanoparticles, thus creating biomimetic carriers retaining cell-like functions that are not affected by the protein corona effect that occurs in vivo. the targeting of the inflamed endothelium can be applied to a broad range of diseases, and the approach used to formulate the system could open new avenues for the fabrication of the next generation of personalized treatments by using the immune cells of patients as the cell membrane source.
references: [ ] alessandro parodi, claudia corbo, armando cevenini, roberto molinaro, roberto palomba, laura pandolfi, marco agostini, francesco salvatore, ennio tasciotti. enabling cytoplasmic delivery and organelle targeting by surface modification of nanocarriers. nanomedicine uk. accepted.
steroid hormone receptors are intracellular receptors that initiate signal transduction in response to steroid hormones, including oestrogen and androgens. generally, the binding of the steroid to the nuclear receptor induces the protein to form a dimer and relocate onto the chromatin, although the order of these events may vary. the location of receptor binding on the chromatin is defined by specific hormone response elements (hre). once located, the receptor promotes gene activation by the recruitment of other co-factors. it is this process that makes the complex of receptor protein and co-factors play a pivotal role in the regulation and activation of genes. the failure to regulate this process correctly is a key step in the development of several endocrine-driven cancers. for example: oestrogen receptor positive (er ) breast cancer is one of the most common forms of cancer and accounts for % of all breast cancer cases. in er tumours, the oestrogen receptor (er) drives the tumour growth and cell proliferation. understanding the interactions of the er with other proteins, either directly or indirectly, can provide vital insight to the regulation of the system that drives this cancer. the progesterone receptor (pr) has also been implicated in breast cancer, and the androgen receptor (ar) is a known driver in the majority of prostate cancers. to meet the challenges of elucidating these systems, we have developed methods to purify and analyse cross-linked regulatory complexes bound to dna by mass spectrometry (chip-ms). this allows for the enrichment of proteins involved in gene regulation. chip-ms, combined with tandem mass tags (tmt), makes it possible to realise a quantitative method to investigate the dynamic network of interactions between proteins within complexes that undertake the regulation of biological systems. chip-seq is a well-established method for identifying where these protein complexes are bound to the genome. 
this work focuses on how to combine these technologies with my previous development of cross-linking coupled mass spectrometry techniques (xcms) to provide a strategy for visualising the dynamic organisation of the proteins on the chromatin.

global kinetic analysis of caspase protein substrates in cell lysate reveals selective roles and target specificity
olivier julien, min zhuang, arun wiita, james wells
caspases are cysteine proteases that play important roles in development, cell differentiation and cell death. however, the limited number of known caspase substrates hinders our understanding of caspase function. here we performed a non-biased identification and kinetic analysis of caspase- and caspase- proteolytic substrates in cell lysate, using an enzymatic n-termini enrichment approach followed by mass spectrometry. we identified and potential substrates for the initiator caspase- and putative executioner caspase- , respectively. our results not only confirm known substrates but also identify many more new substrates with the precise location of proteolysis. given the emerging roles of caspases- and in inflammation and neurodegeneration, these new substrates may provide molecular insight into the progression of related diseases. the sequence consensus logo of caspase- targets was very similar to a classical executioner caspase motif (devd), while caspase- revealed a vevd motif. using selected reaction monitoring (srm), we quantified the kinetics of proteolysis of a large subset of these substrates by measuring the appearance of the caspase cleavage product over time. in the end, we measured and kcat/km values for individual substrates cut by caspase- and caspase- , respectively. by comparing these data with our previous analysis of caspase- , , and , we found that substrates that are shared between caspases are often cleaved at rates that differ by orders of magnitude.
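as an illustration of how a kcat/km value can be obtained from a product-appearance time course such as the srm measurements described above, the following is a minimal sketch and not the authors' analysis pipeline. it assumes pseudo-first-order conditions (substrate concentration well below km), under which the cleaved fraction follows f(t) = 1 - exp(-(kcat/km) [e] t). the function name, the enzyme concentration and the synthetic time points are hypothetical.

```python
import math


def fit_kcat_over_km(times_s, fraction_cleaved, enzyme_conc_molar):
    """Estimate kcat/Km (M^-1 s^-1) from a cleavage time course.

    Assumption (not stated in the abstract): [S] << Km, so that
    fraction_cleaved(t) = 1 - exp(-(kcat/Km) * [E] * t).
    """
    # Linearize the model: -ln(1 - f) = k_obs * t.
    xs, ys = [], []
    for t, f in zip(times_s, fraction_cleaved):
        if 0.0 <= f < 1.0:  # skip points where the logarithm is undefined
            xs.append(t)
            ys.append(-math.log(1.0 - f))
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0.0:
        raise ValueError("need at least one time point with t > 0")
    # Least-squares slope of a line through the origin gives k_obs (s^-1).
    k_obs = sum(x * y for x, y in zip(xs, ys)) / sum_xx
    # Dividing by the enzyme concentration converts k_obs to kcat/Km.
    return k_obs / enzyme_conc_molar


if __name__ == "__main__":
    # Hypothetical synthetic data, for illustration only.
    enzyme = 1.0e-7                      # 100 nM caspase (assumed)
    times = [0, 300, 600, 1200, 2400]    # seconds
    fractions = [1 - math.exp(-1.0e4 * enzyme * t) for t in times]
    print(fit_kcat_over_km(times, fractions, enzyme))
```

fitting the linearized form keeps the sketch dependency-free; with noisy data, a nonlinear fit of the exponential itself would usually be preferred.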
thus, despite having nearly identical primary sequence motifs, the caspases exhibit remarkable substrate specificity that may reflect their specialized roles within the cell.

the rockefeller university, new york university school of medicine, johns hopkins university school of medicine
line- (l ) retrotransposons are catalysts of evolution and disease whose sequences comprise a significant proportion of the human genome. despite tremendous influence on genome composition, l rnas only encode two proteins. consequently, l particles include a combination of permissive host factors that are essential to their lifecycle as well as repressive factors that constitute defenses against l 's mutagenic activity. we previously characterized host proteins associated with synthetic and natural human l retrotransposons, as expressed in cell culture, using a combination of techniques including metabolic labeling and affinity proteomics. to build on these analyses, we have implemented a series of d separations and post-purification treatments to produce a multi-dimensional interactomic characterization of affinity isolated l s. these studies have revealed the presence of at least two populations of putative transposition intermediates that may exhibit distinctive intracellular localizations. we report a comprehensive, quantitative survey of the proteins partitioning within these distinct l populations and their associated in vitro activity. our observations provide a basis for the classification of l interactors with respect to their physical and functional links, facilitating hypotheses to direct in vivo experimentation.
polyubiquitin recognition by continuous ubiquitin binding domains of rad probed by modeling, small-angle x-ray scattering and mutagenesis
sangho lee, trung thanh thach, namsoo lee, donghyuk shin, seungsu han, gyuhee kim, hongtae kim
rad plays a key role in double-strand break dna damage response (ddr) pathways by recognizing k -linked polyubiquitylated chromatin proteins through its bipartite ubiquitin binding domains ubz and lrm, with extra residues in between. rad binds k -linked polyubiquitin chains as well as k -linked ones and mono-ubiquitin. however, the detailed molecular basis of polyubiquitin recognition by ubz and lrm remains unclear. here, we examined the interaction of rad ( - ), including ubz and lrm, with linear polyubiquitin chains that are structurally similar to the k -linked ones. rad ( - ) binds linear polyubiquitin chains (ub , ub , ub ) with similar affinity to a k -linked one for diubiquitin. ab initio modeling suggests that lrm and the extra residues at the c-terminus of ubz (residues - ) likely form a continuous helix, termed 'extended lr motif' (elrm). we obtained a molecular envelope for rad ubz-elrm:linear ub by small-angle x-ray scattering and derived a structural model for the complex. the rad :linear ub model indicates that elrm enhances the binding of rad with linear polyubiquitin by contacting the proximal ubiquitin moiety. consistent with the structural analysis, mutational studies showed that residues in elrm affect binding with linear ub , but not with monoubiquitin. in-cell data support that elrm is crucial for rad localization to dna damage sites. specifically, e seems to be the most critical residue in polyubiquitin binding and localization to nuclear foci. finally, we reveal that the ubiquitin-binding domains of rad bind linear ub more tightly than those of rap , providing a quantitative basis for blockage of rap at dsb sites.
taken together, our data demonstrate that rad ( - ) forms continuous ubiquitin binding domains, comprising ubz and elrm, and provide a structural framework for polyubiquitin recognition by rad in the ddr pathway at a molecular level.

optimization of a protein extraction method for the proteomic study of pozol
cynthia teresa leyva-arguelles, carmen wacher, rosario vera, romina rodríguez-sanoja
instituto de investigaciones biomédicas, unam, facultad de química, unam, instituto de biotecnología, unam
key words: proteomics, fermentation, pozol
pozol is a traditional mexican non-alcoholic beverage prepared by various ethnic groups in southeastern mexico. pozol is obtained from the natural fermentation of nixtamal (heat- and alkali-treated maize) dough. the main carbohydrate in maize dough is starch ( - %), because others such as sucrose, glucose and fructose are mostly lost during nixtamalization; starch therefore remains the major carbohydrate available for fermentation [ ]. a wide variety of microorganisms have already been isolated from the fermentation of pozol; these microorganisms include fungi, yeasts, lactic acid bacteria, and non-lactic acid bacteria [ ]. however, only a few bacteria are amylolytic in this fermentation and all of them are weakly amylolytic [ ]. in an attempt to explain how a very low content of soluble sugars can support a diverse and abundant microbiota, a proteomic approach was designed to understand the fermentation of pozol [ ]. nevertheless, the extraction of proteins from pozol remains a limiting step in proteomic analysis, mainly due to the complexity of the sample. for these reasons, the aim of this work was to obtain a suitable protein extraction method for proteomic analysis. the fermentation of pozol was therefore continued for h and samples were taken at , , and h. for each sample, the total sugar content was determined by the dubois et al.
method [ ] and protein extraction was performed by two methods: a) direct extraction from the dough [ ] and b) initial extraction of microorganisms and soluble proteins (this work). the two protein extraction methods were compared on two-dimensional gels with silver stain. the gels then underwent image analysis with the image master d platinum software. comparing the d-gels, more protein spots were obtained with method b than with method a, indicating a more efficient protein extraction with method b. although a higher concentration of total protein was observed with method a, it consisted mostly of maize proteins, which in turn overlap with and reduce the efficient extraction of the low-abundance microbial proteins. thus, method b allows a better extraction of those low-abundance proteins and removes sample components that may interfere with the determination. these results could help us to find the proteins involved in carbohydrate metabolism of the microbiota and finally elucidate the dynamics of pozol fermentation.

proteomics has been applied to the enology field for numerous purposes including fermentation control, improvement of fermentation processes, ensuring wine quality, etc. according to rodriguez et al. ( ), the information provided by wine proteomics is not only useful for these purposes, but also offers excellent prospects for innovation and diversification of winemaking processes in the near future. in this context, our group has focused research on the identification of proteins that might be important for yeast survival under typical wine elaboration conditions (standard fermentation, sherry wine biological aging and sparkling wine second fermentation) as well as proteins that configure the content of metabolites which are ultimately responsible for wine quality.
by using novel proteomic (offgel fractionator and ltq orbitrap xl ms) and metabolomic techniques (sbse-td-gc-ms) we have identified a large number of up-regulated proteins involved in processes like oxidative stress response (in biological aging) or protein biosynthesis (in second fermentation), as well as thirty-three proteins directly involved in the metabolism of glycerol, ethanol and seventeen aroma compounds excreted by the yeast under biological aging conditions. further, in order to validate the proteome data, null mutants of genes encoding proteins up-regulated in the biological aging condition were constructed. analyses of correlated phenotypes are in progress. this technique and its combination with metabolomics within the enology context will provide enough knowledge to design or choose yeasts or conditions that satisfy wine production and/or wine characteristics such as the color/aroma/texture/flavour profile demands of winemakers and consumers.

additional binding sites for cytochrome c on its redox membrane partners facilitate its turnover and sliding mechanisms within respiratory supercomplexes
blas moreno-beltrán, antonio díaz-quintana, katiuska gonzález-arzola, alejandra guerra-castellano, adrián velázquez-campoy, miguel a. de la rosa, irene díaz-moreno
ibvf, ciccartuja, universidad de sevilla-csic, bifi-iqfr (csic), universidad de zaragoza, departamento de bioquímica y biología molecular celular, universidad de zaragoza
gliding mechanisms of cytochrome c (cc) molecules have been proposed to shuttle electrons between respiratory complexes iii and iv within plant and mammalian mitochondrial supercomplexes, instead of carrying electrons by random diffusion across the intermembrane bulk phase [ ] [ ]. in this work, the molecular binding mechanisms of plant and human cc with mitochondrial complexes iii and iv have been analyzed by nuclear magnetic resonance and isothermal titration calorimetry.
our data reveal that both cc-involving adducts possess a : stoichiometry, that is, two cc molecules per adduct. the presence of extra binding sites for cc at the surfaces of complexes iii and iv opens new perspectives on the mitochondrial electron transport chain, where membrane respiratory complexes can be either in independent, free diffusional motion or forming macromolecular assemblies. in the latter context, such new binding sites for cc facilitate the turnover and sliding mechanisms of cc molecules within supercomplexes. indeed, the accommodation of several cc molecules between complexes iii and iv in supercomplexes provides a path for cc diffusion from complex iii to iv. such a path could have physiological significance in the electron flow, which is controlled in supercomplexes to optimize the use of available substrates [ ] [ ] [ ].

can bio-functionalities be deciphered from protein sequence information using computational approaches?
background: the processes of uncovering bio-functionalities such as pharmacological activities, disease processes, physiological and structural properties by means of clinical approaches are irrational, because they are resource- and time-consuming. sometimes, they involve sophisticated and expensive equipment, reagents and animal tissues. in contrast, sequence information-based computerized approaches are rational and have become relevant in assessing bio-functionalities. they include geno pheno [coreceptor] [ ], position-specific scoring matrix (pssmsi/nsi and pssmcxcr /ccr ) [ ], and informational spectrum method (ism)-based phylogenetic analysis (istree) [ ]. aim: this presentation demonstrates how bio-functionalities could be deciphered from sequence information using computational approaches. method: the ism procedure is applied to two peptides, vipmfsals and capagfail. results: protein sequences of the peptides are converted into bio-functionality (affinity).
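the ism step named above can be sketched as follows. this is a minimal illustration under stated assumptions, not the presenter's implementation: each residue is mapped to its electron-ion interaction potential (eiip), the numerical series is fourier transformed, and the cross-spectrum of two peptides is taken as the product of their amplitude spectra, so that a shared peak marks a candidate consensus frequency. the eiip values below are the commonly tabulated ones and should be checked against the ism literature before use; the function names, the mean subtraction and the zero-padding are choices made for this sketch.

```python
import cmath

# Commonly tabulated EIIP values (Rydberg units); assumed here, verify before use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_spectrum(peptide, length):
    """Amplitudes of the DFT of the EIIP series at frequencies k/length, k = 1..length//2."""
    signal = [EIIP[aa] for aa in peptide.upper()]
    mean = sum(signal) / len(signal)
    # Remove the mean (drops the zero-frequency term), then zero-pad to a common length.
    centered = [x - mean for x in signal] + [0.0] * (length - len(signal))
    amplitudes = []
    for k in range(1, length // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / length)
                    for i, x in enumerate(centered))
        amplitudes.append(abs(coeff))
    return amplitudes


def cross_spectrum(peptide_a, peptide_b):
    """List of (frequency, amplitude) pairs for the product of the two amplitude spectra."""
    n = max(len(peptide_a), len(peptide_b))
    spec_a = amplitude_spectrum(peptide_a, n)
    spec_b = amplitude_spectrum(peptide_b, n)
    return [(k / n, a * b) for k, (a, b) in enumerate(zip(spec_a, spec_b), start=1)]


def consensus_frequency(peptide_a, peptide_b):
    """Frequency at which the cross-spectrum amplitude is largest."""
    return max(cross_spectrum(peptide_a, peptide_b), key=lambda pair: pair[1])


if __name__ == "__main__":
    for freq, amp in cross_spectrum("vipmfsals", "capagfail"):
        print(f"{freq:.3f}\t{amp:.6f}")
    print(consensus_frequency("vipmfsals", "capagfail"))
```

for two nine-residue peptides the spectrum has only four frequency points, so any peak is coarse; the sketch shows the mechanics of the calculation rather than a result.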
affinity between the two peptides is demonstrated as significant amplitudes at the point of common interaction, also referred to as the consensus frequency, signifying remarkable affinity. discussions: bio-functionalities of bio-molecules are known to be expressed in one or two genes, which have been found to provide as much biological information as the bio-molecules. this indicates that biological characteristics represented in these genes and proteins can now be extracted from their sequence information. for example, multi-drug resistance to a variety of anti-microbial agents from several classes, including alkaloids, flavonoids, etc., can be retrieved from the sequence information of the encoding genes (mdr and mdr ). similarly, the progression of hiv infection to aids can be extracted from the protein sequence alterations in the hiv gp . similarly, the effectiveness of the anti-retroviral agent maraviroc on the hiv isolates h bx and ndk can be deciphered from the sequence information of their v

observed at the predicted sequences. these positions are important as they surround the cleavage site in the three-dimensional structure, and are probably less tolerant to change. moreover, in previous studies, cys at p position has been shown to be the dominant determinant for cleavage efficiency, while cys, pro and glu at p position have also been shown to be correlated with increased cleavage efficiency of ns / a protease. for adv cysteine protease, on the other hand, bsst produces similarly significant results for both type (xgx-g) and type (xgg-x) consensus cleavage sites, where p and p ' positions have gly with the highest percentage in type (xgx-g) while p and p positions have gly in type (xgg-x). these results suggest that bsst provides a powerful methodology for predicting the substrate specificity of the hcv ns / a serine protease and adv cysteine protease, which are targets in drug discovery studies.
protein plasticity improves protein-protein binding description
chiara pallara, juan fernández-recio
an accurate description of protein-protein interactions at the atomic level is fundamental to understanding cellular processes. however, the current structural coverage of protein-protein interactions (i.e. available experimental structures plus potential models based on homologous complex structures) is below % of the estimated number of possible complexes formed between human proteins. for these reasons, computational docking methods aim to become a complementary approach not only to solve the structural interactome but also to elucidate the basis of the protein-protein association mechanism. in spite of the advances in protein-protein binding description by docking, dealing with molecular flexibility is a major bottleneck, as shown by the recent outcomes of the capri (critical assessment of prediction of interactions) experiment. these results clearly confirm that protein dynamics plays a key role in protein-protein association. the use of conformational ensembles generated from unbound protein structures in combination with computational docking simulations might represent a more realistic description of protein-protein association. here, we present the first systematic study of the use of precomputed unbound ensembles in docking, as performed on a set of cases of the protein-protein docking benchmark . . the primary aim of our work is to understand the role of protein conformational heterogeneity in protein-protein recognition. to do this, small conformational ensembles were automatically generated starting from the unbound docking partners, and then an extensive analysis of their binding properties was performed in the context of the pydock docking scheme.
the results show that considering conformational heterogeneity of interacting proteins can improve docking description in cases that involve intermediate conformational changes in the unbound-to-bound transition. more interestingly, we found that protein plasticity increases the chances of finding conformations with better binding energy, not necessarily related to bound geometries. the relevance for future docking methodology development and for understanding the protein association mechanism will be discussed.

purpose of the research: there is increasing interest in the development of protein scaffolds that can be used to develop affinity reagents that are alternatives to antibodies. the affimer scaffold is based on the cystatin protein fold. the affimer scaffold is biologically inert, biophysically stable and capable of presenting a range of designed or random binding surfaces defined by peptides inserted at different loops. the result is highly specific, high affinity interactions with a wide range of targets including ones that are inaccessible to antibodies. affimers are designed to work in the same way as the very best antibodies, but with a number of key advantages. affimers are quick to develop (typically weeks) without using animals. they contain no disulphide bonds, are expressed easily in e. coli and have no batch to batch variability. affimers are small molecules ( aa, kda), robust and stable (resistant to ph range, thermally stable and not sensitive to edta). affimers can be a direct replacement for antibodies, with no process or workflow change required, and perform identically to antibodies in assays such as elisa, facs, ihc, western blots, affinity purification, microarray and potentially therapeutics. we describe some applications of the technology with regard to affimer development for custom targets on the one hand and the biomarker discovery workflow using affimer microarrays on the other.
main results: by screening our very large ( x ) library against yeast sumo protein, we identified affimers with high affinity, allowing their use in elisa. moreover, no cross-reactivity was observed when affimers were used on western blots, which showed a unique band specific to yeast sumo when compared to human proteins. a library of , random affimers, expressed in e. coli, was printed on glass microscope slides and challenged with plasma from children (n ) with sepsis and from healthy children (n ). unsupervised hierarchical clustering based on the , affimers allowed differentiation between the control and patient samples. affimers were found to differentially bind proteins between the groups with a > fold change. the affimer arrays identified a strong signature of sepsis, and roc curve analysis allowed confident prediction of disease (auroc of . ). affinity purification and preliminary mass spectrometry analysis identified known biomarkers of sepsis and also potentially novel biomarkers not previously associated with this disease. major conclusions: this work demonstrates the scope of affimer affinity reagents as alternative binders to antibodies, where affimers perform identically in most assays without the disadvantages associated with antibodies. moreover, affimers enable a new protein microarray-based biomarker-discovery workflow, and we predict that array-based validation of signatures identified using discovery arrays, prior to affinity purification and mass spectrometry, will offer a cost- and time-effective methodology compared to purely mass spec-driven workflows.

tau pathologies, called 'tauopathies', are related to several neurodegenerative diseases including alzheimer disease (ad). in ad, tau protein is observed hyper-phosphorylated and aggregated as paired helical filaments (phf). the neuronal tau protein is an intrinsically disordered protein (idp).
nuclear magnetic resonance spectroscopy (nmr) is used here to study tau protein phosphorylation and protein-protein interactions (ppis). in in vitro assays, tau phosphorylation by rat brain extract is considered a model of hyperphosphorylation, and it has furthermore been shown to enable tau aggregation [ ] . as a first step, we identified all the phosphorylation sites of tau phosphorylated by rat brain extract, using the analytical capacity of nmr. we showed that the protein is modified at ser/thr sites. among the kinases that we have characterized so far using tau as substrate, only the extracellular signal-regulated kinase (erk ) shows an ability to modify tau protein in vitro on so many sites. we have indeed identified phosphorylated ser/thr-pro motifs out of potential phosphorylation sites in the sequence of full-length -residue tau. in addition, we showed using transmission electron microscopy (tem) that erk-phosphorylated tau protein has an in vitro aggregation capacity similar to that of tau phosphorylated by rat brain extract. this shows that phosphorylation by the erk kinase generates a hyperphosphorylated tau. given the high efficiency of erk towards tau, we next looked into the mechanism of tau recognition. erk kinase possesses two well-characterized docking domains, the d recruitment site (drs) and the f recruitment site (frs), which recruit complementary docking sites and increase the specificity and efficiency of the interaction with both its upstream regulators and downstream substrates [ ] . when the interaction between tau protein and erk kinase is analyzed by nmr spectroscopy, multiple sites of interaction are observed along the tau sequence, similar to drs docking sites, all located in the so-called microtubule binding domain of tau.
these sites are short sequences loosely matching the reported consensus for d sites, w - uxu (where w, u, and x refer to positively charged, hydrophobic, and any intervening residues, respectively) [ ] , and also the reverse sequence uxuw - . to confirm the mapping of the interaction, two tau recognition sites were produced as recombinant peptides of about amino acids in fusion with an n-terminal his-tag sumo. interaction assays using d [ h, n] hsqc spectra of the peptides confirm their binding to erk kinase. the potential of these peptides to inhibit erk activity with tau as substrate is now being investigated.

while rigid-body docking has become quite successful for predicting the correct conformations of binary protein complexes, determining whether two given proteins interact remains a difficult problem. successful docking procedures often give equally good scores for pairs of proteins for which there is no evidence of interaction. studies investigating what we define as the 'pre-docking' problem via in silico approaches have only recently become feasible with the help of supercomputers and grid-computing systems. in previous work on a restricted set of protein complexes, we showed how predictions of interacting partners could be greatly improved if the location of the correct binding interface on each protein was known. experimentally identified complexes are found to be much more likely to bring these two interfaces into contact while also yielding good interaction energies. we present data from a complete cross-docking (cc-d) study of a database of proteins, including the treatment of more than , potential binary interactions. the performance of the interaction index we developed to predict binding probability compares well with other methods. by studying the interaction of all potential protein pairs within a dataset, cc-d calculations can also help to identify correct protein interaction interfaces.
the present large-scale study also reveals the influence of various protein families (enzyme-inhibitor, antibody-antigen, antigen-bound antibody, etc.) on binding specificity, showing, in particular, the distinctive behavior of antigenic interfaces compared to enzymes, inhibitors or antibodies. the performance of our approach is encouraging. although identifying interaction interfaces significantly helps in the identification of interacting proteins, further refinements will be necessary to make in silico cross-docking a viable alternative to high-throughput experimental methods. whole-protein mass spectrometry reveals global changes to histone modification patterns in hypoxia sarah wilkins , kuo-feng hsu , christopher schofield chemistry research laboratory, oxford university cells respond to limiting oxygen availability (hypoxia) by altering the gene expression profile. this primarily involves changes at the level of transcription via the activity of hypoxia-responsive transcription factors, although increasing evidence suggests that changes in chromatin structure (i.e. from a condensed 'silent' state to a more open or 'active' state) are required in order for transcription to take place. in particular, post-translational modifications (ptms) to histones have an important regulatory function in gene expression under hypoxic conditions. the n-terminal tails of histone proteins are accessible to a set of enzymes capable of 'writing' and 'erasing' ptms including acetylation, methylation, ubiquitylation, sumoylation and phosphorylation. to date, studies in hypoxia have employed antibody-based methods to investigate changes in histone modifications, and so have focused on individual marks in isolation. the interplay between coexisting ptms is thought to be much more important than the effect of any single mark. 
therefore, a global view of the histone modification profile is essential to gain a complete understanding of the function of histone ptms and their roles in gene regulation. in this study, we apply whole-protein mass spectrometry to investigate hypoxia-induced changes in histone marks. this 'top-down' approach provides insight into combinatorial modification patterns that are difficult to establish by antibody-based methods or peptide ms analysis. we investigated changes in the global ptm profiles of histones from a range of human cell lines and tissues under severe hypoxia (< . % o ). we find that hypoxia causes a shift in the overall profile towards a more highly modified state, with significant changes in methylation and phosphorylation. marked changes in histone ptms were also observed following treatment of cells with epigenetic inhibitors and commonly used hypoxia mimetics, including several iron chelators currently in clinical trials for the treatment of anaemia. finally, we show that this method can be used to identify the histone variant h ax, whose phosphorylation at serine is an indicator of double-stranded dna breaks in cancer. overall, these data provide important insights into the epigenetic changes associated with hypoxia in normal and disease contexts. we hope to further develop this method in combination with different labelling strategies to enable quantitative analysis of histone modifications in cells.

mass spectrometry-based protein biomarker discovery in neurodevelopmental disorders

interactions. there is currently no biological diagnosis or known cause of asd. slos is characterized by a cholesterol deficiency due to a mutation in the dhcr gene. approximately / , babies are born with slos. diagnosis is achieved by measuring cholesterol and -dehydrocholesterol ( dhc) levels in the blood; however, there is currently no proven treatment for slos. because of this, research to determine biomarkers for these disorders is increasing.
here, samples from people with asd (sera and saliva) and slos (saliva), and matched controls, were analyzed using a combination of gel electrophoresis (tricine-page, sds-page and blue native page), in-gel or in-solution digestion and nanoliquid chromatography-tandem mass spectrometry (nanolc-ms/ms) to investigate differences between the proteomes of people with these neurodevelopmental disorders and matched controls. several alterations in protein expression were identified. these differences may lead to potential biomarkers for diagnosis, possible therapeutic targets and an altogether better understanding of the disorders.

understanding protein recognition using structural features

protein-protein interactions (ppis) play a crucial role in virtually all cell processes. thus, understanding the molecular mechanism of protein recognition is a critical challenge in molecular biology. previous work in this field shows that not only the binding region but also the rest of the protein is involved in the interaction, suggesting a funnel-like recognition model that facilitates the interaction process. furthermore, we have previously shown that three-dimensional local structural features (groups of protein loops) define characteristic patterns (interaction signatures) that can be used to predict whether two proteins will interact or not. a notable trait of this prediction system is that interaction signatures can be classified as favouring or disfavouring depending on their role in promoting molecular binding. here, we use such features to determine differences between the binding interface and the rest of the protein surface in known ppis.
particularly, we study computationally three different groups of protein-protein interfaces: i) native interfaces (the actual binding patches of the interacting pairs), ii) partial interfaces (the docking between a binding patch and a non-interacting patch), and iii) back-to-back interfaces (the docking between non-interacting patches for both of the interacting proteins). our results show that the interaction signatures in partial interfaces are much less favoured than the ones observed in native and back-to-back interfaces. we hypothesise that this phenomenon is related to the dynamics of the molecular association process. back-to-back interfaces preserve the exposure of the real interacting patches (thus, allowing the formation of a native interface), while in a partial interface one interacting patch is sequestered and becomes unavailable to form a native interaction. structural characterization of the cytoplasmic mrna export platform laboratory of cellular and structural biology, the rockefeller university., laboratory of mass spectrometry and gaseous ion chemistry, the rockefeller univ., university of california, san francisco, the new york structural biology center, department of biochemistry, faculty of medicine, university of montreal mrna biogenesis is an intricate process that begins within the nucleus and culminates with the remodeling and nuclear export of the mrnp particles through the nuclear pore complex (npc). defects in this conserved mechanism have been shown to cause serious human diseases. the protein assembly that performs the last steps in mrnp biogenesis and export is located at the cytoplasmic face of the npc and is formed by different proteins, organized into several subcomplexes whose arrangement and molecular architecture are poorly understood. 
in this study we applied an integrative approach, combining cross-linking and mass spectrometry (cx-ms), electron microscopy and available high-resolution structures, to describe the molecular architecture of the endogenous npc cytoplasmic mrnp export machinery. we generate a hybrid, close-to-atomic structure of the yeast native nup complex, the core of the assembly. our map also reveals how the nup complex organizes the entire cytoplasmic mrnp export machinery, and how this in turn docks into the architectural core of the npc. mapping of phenotypic profiles into our structures allows us to generate a first functional map of the ensemble. we expect that our map will serve as a framework to understand the molecular mechanisms underlying this key step of mrnp biogenesis.

study of candidate proteins to pore associated with p x receptor in different cell types
carla oliveira , anael alberto , mônica freitas , luiz alves
laboratório de comunicação celular - fiocruz, centro nacional de ressonância magnética nuclear - ufrj

aim: the p x r is a purinergic receptor which differs from other subtypes in its structural and pharmacological characteristics. when exposed for an extended time or to high concentrations of its agonist (atp), it promotes an increase in membrane permeability, allowing the passage of molecules up to da. it remains controversial whether this receptor needs a second protein for pore formation and, if so, which protein that could be. we selected five pore-forming proteins: trpv , trpa , connexin- (cx- ), pannexin- (panx- ) and vdac. we believe that different mechanisms and proteins could be associated with p x r, depending on the cell type and its microenvironmental stimuli. in this context, our main goal is to identify possible proteins that could be associated with the p x r pore in different cells and species.
methods and results: we started with rt-pcr on the cell lines j .g , n a, u , u and hek- and on primary cells from wistar mouse and swiss mice. we used different primers and pcr cycles for each target in the different species. we observed that p x , panx- and cx- are the most abundant and are present in all cell types, except that p x is absent in u cells and panx- is absent in mice macrophages and u cells (n> ). however, trpv was seen in n a and u cells, and trpa in and primary cells from mouse and mice and in j .g cells (n> ). vdac is present in mouse macrophages, j .g and hek- cells (n> ). in the next steps, we verified whether those proteins could be physically associated with the p x r. we co-immunoprecipitated the p x r from j .g (with or without atp), mice macrophages, hek- and u cells. the samples were applied to two separate . % bis/acrylamide gels: one destined for mass spectrometry (ms) and the other for western blot. at this point, we confirmed the presence of p x r and observed several other proteins associated with p x r under different cell conditions, mainly when we exposed j .g cells to mm atp (n ). under this condition, we found by ms hsp , , and ; alpha and b tubulin; myosin va; alpha, b and g actin; and malate and lactate dehydrogenase (n ). although u and hek- had not received atp treatment, we found several proteins associated with p x . the next step was to immunoprecipitate those proteins in j .g cells (treated or not with atp) and use the immunoprecipitates to verify whether p x is physically associated with them. as a result, we saw p x associated with panx- in j .g cells. conclusion: we conclude that the p x r activated by extracellular atp triggers the recruitment of a variety of different proteins. under this condition, we suggest that there may be a conformational change, regardless of the numerous structural proteins recruited.
in addition, the pore-forming protein pannexin- is apparently associated with p x r, whereas the other pore-forming proteins (vdac, cx- , trpv , trpa ) seem not to be linked to p x r at j .

recently, we developed a series of molecular modeling tools for structure-based studies of protein functions and interactions. these tools are publicly available as web servers that are easily operated even by non-specialists: cabs-fold server for protein structure prediction [ ] ; cabs-flex server for modeling of protein structure flexibility [ ] ; aggrescan d server for prediction of protein aggregation propensities and rational design of protein solubility [ ] ; and cabs-dock server for prediction of peptide binding sites and peptide docking [ ] . the web servers are freely available from the laboratory website: http://biocomp.chem.uw.edu.pl/tools

sandy on , pinghui feng university of southern california, keck school of medicine,
developing a technique to detect deamidated proteins and peptides using rig-i
sandy on, pinghui feng
university of southern california, norris comprehensive cancer center, department of microbiology, and molecular biology, los angeles ca

perhaps the most notable type of post-translational modification of proteins and peptides into a higher order structure is deamidation of asparagine and glutamine. deamidation occurs when an amine group is removed, degrading the molecule for the purpose of regulating intracellular levels. previous studies have demonstrated that this post-translational modification has been exploited over time in recombinant dna technology and serves as a biological clock to facilitate the rapid turnover of biologically important components of the cell. while the effects of this non-enzymatic chemical reaction have been widely studied, a method to uncover modification sites across a large number of proteins remains an issue. one of the most common types of deamidation is of asparagine and glutamine residues.
at this time, most researchers depend on mass spectrometry-based proteomic techniques to identify these post-translational sites. the issue is that mass spectral analysis of deamidated proteins and peptides is complicated and can lead to misassigned identifications caused by overlap of the c peak of the amidated form with the deamidated monoisotopic peak; these two peaks are separated by only . mda. while these issues can be mitigated by using a mass spectrometer with high mass measurement accuracy and high resolving power, it is essential to establish simpler methods for identifying substrates that have undergone deamidation. if deamidation is present, different protein bands will be exhibited in the western blot, which will be compared to a triple mutant rig-i, which resists deamidation, to observe the location of this modification on the protein. with enough testing, i will determine specific sites of digestion and use this information to draw conclusions about unknown proteins. i will draw conclusions about whether the protein has been modified based on the digestion sites. i will use mass spectrometry analysis to compare the proteins on a wider scale and double-check my results. i have narrowed it down to a couple of different digestion sites that indicate deamidation. though the analysis work can be tedious, it is crucial to ensure the sites we isolate are accurate in order to establish this technique. from my research, we can apply this method for wider-scale use such as in clinical settings. in areas of inflammation in parkinson's patients, we can specifically compare the infected cells with uninfected cells and isolate the responsible proteins, which are usually deamidated, often smaller in size and more specific. in addition, research articles have already shown that suppressing modification of certain proteins such as bcl-xl plays a major role in regulating cancer cell death by apoptosis.
by leading the discovery of a simpler method for uncovering deamidation in cells, researchers will be able to scan through various proteins more easily and quickly, some of which may eventually prove to play pivotal roles in cancer research.

influenza virus (iv) hemagglutinin (ha) is a homotrimeric integral membrane glycoprotein that mediates receptor binding and membrane fusion. it constitutes the prominent viral surface antigen and a main target for neutralizing antibodies. bacterial, recombinant ha-based vaccines show high potential to confer protection against highly pathogenic (hp) avian iv (aiv) h n and are emerging as an alternative to traditional egg- or cell culture-based manufacturing. the relatively short time of bacterial ha production can be of great importance in case of a pandemic. an escherichia coli-produced protein, based on the ha sequence of a/swan/poland/ - v / (h n ) hpaiv*, has been successfully expressed in the form of inclusion bodies at the institute of biotechnology and antibiotics. the refolded and purified antigen was obtained in a soluble form, isolated by reversed-phase hplc and identified by peptide mass fingerprinting using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (maldi tof/tof ms). this research largely confirmed the amino acid sequence of the recombinant ha (rha) assumed on the basis of the cdna and established the location of a total of six disulfide bridges. however, during purification and storage of the rha, apart from the desired higher-order rosette-like structures of the protein, other non-native species resulting from posttranslational modifications, misfolding, aggregation and degradation may occur, which results in reduced vaccine potency. here, besides the properly folded monomers, we report non-native aggregates induced by disulfide crosslinking. moreover, several free cysteine residues and unexpected intrachain s-s were identified in rha tryptic peptide maps.
cys was found most susceptible to the formation of disulfide bridges between distinct chains of rha. these findings suggest that not all rha particles fold to form the native structure. reduced cys residues exhibit a tendency to undergo oxidation and uncon-

new strategies and approaches to understand how antibodies recognize and neutralize snake toxins represent a challenge for improving antivenoms. the neurotoxic activity of micrurus venom is carried mainly by two distinct protein families, ftx and pla . the conserved structural fold of these toxins can serve as a model for generating inhibitors against them. in this regard, monoclonal antibodies (mabs) can be used as a tool to find hot spots for inhibiting the toxins and represent the first step towards developing recombinant neutralizing molecules. in this work, our goal was to analyse a set of monoclonal antibodies against the most toxic components of m. altirostris venom by proteomic approaches. the venom was fractionated, and its major toxic proteins were identified by in vivo tests based on murine lethal toxicity analyses (approved by the ethical committee for animal experimentation from center of health and science of the federal university of rio de janeiro -no. . / - ). the toxic components were used to generate a panel of five monoclonal antibodies. elisa and antivenomics results allowed us to identify the specificity of all mabs, and their neutralizing efficacy was measured by in vitro tests. three mabs showed reactivity towards ftx and two against pla . all monoclonal antibodies against ftx lack broad recognition. however, we identified a pair of monoclonal antibodies able to recognize all pla molecules of m. altirostris venom, which showed synergism in inhibiting their catalytic activity. moreover, we challenged the monoclonal antibodies raised against micrurus venom to inhibit the pla activity of naja naja, a species taxonomically outside the micrurus cluster. our results showed that pla of m.
altirostris venom share a pair of conserved antigenic regions and draw attention to the use of these epitopes as mimetic antigens to generate antibodies for antivenom production. moreover, given the cross-reactivity of the mabs and their capacity to inhibit the pla activity of naja naja venom, our results highlight the conservation of neutralizing epitopes across the elapidae family.

protein-protein interactions are known to play key roles in the most important cellular and biological processes, such as signaling, metabolism, and trafficking. one major goal of structural biology is the structural characterization of all protein complexes in human and other organisms. these efforts can be complemented by computational approaches. in this context, computational docking attempts to predict the structure of complexes from their monomeric constituents. the docking problem presents two main challenges: the generation of structural poses, or sampling, and the identification of the correct structures with a scoring function (sf). docking methods can be successful if the interacting partners undergo small conformational changes. however, in a general situation, these algorithms generate a large number of incorrect predictions, and the predictive success therefore depends strongly on the accuracy of the sf used to evaluate the docked conformations. a variety of strategies have been developed to score putative protein-protein docked complexes. they are usually based on atomic-level potentials, residue-level potentials, or a combination of both. in the current work, we have evaluated different sf, taken from the cchappi server, on the results of different rigid-body docking methods, ftdock, zdock, and sdock, using the docking benchmark . and a docking set built from the capri scorers experiment. our results show sf that achieved a success rate better than or similar to that of the in-built sf.
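the success rate used to compare scoring functions is commonly computed as the fraction of benchmark cases in which at least one near-native pose appears among the n best-ranked poses. the following is a minimal python sketch of that calculation, not the authors' implementation. the function name, the data layout, the toy values and the convention that a lower score is better are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout: each docking pose is (score, is_near_native).
# Assumption: a lower score means a better-ranked pose.
Pose = Tuple[float, bool]


def top_n_success_rate(cases: Dict[str, List[Pose]], n: int = 10) -> float:
    """Fraction of cases with at least one near-native pose in the top n."""
    if not cases:
        return 0.0
    hits = 0
    for poses in cases.values():
        # Rank poses from best (lowest score) to worst.
        ranked = sorted(poses, key=lambda pose: pose[0])
        if any(is_native for _, is_native in ranked[:n]):
            hits += 1
    return hits / len(cases)


# Toy, made-up data for two docking cases (illustration only).
toy_cases = {
    "case_a": [(-12.0, False), (-15.5, True), (-3.1, False)],
    "case_b": [(-9.0, False), (-8.0, False), (-1.0, True)],
}

if __name__ == "__main__":
    for n in (1, 3):
        print(n, top_n_success_rate(toy_cases, n))
```

running the same function once per scoring function on identical pose sets gives directly comparable success rates, which is the kind of comparison described for the in-built and external sf.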
some of these sf increase the docking success rates, especially for flexible or weak-binding cases, which are the most challenging for docking. of them are residue-level sf robust enough to detect solutions in cases with large conformational change. in particular, we found two sf that show outstanding robustness: one designed for protein modeling and shared among docking methods, and the other designed for protein docking, which also has the best success rate in the top ranking in the capri scorers set. the other atomic-level sf display a high success rate in finding a solution among weak-binding proteins. the most successful sf are shared between the docking methods and display a high success rate in the hard cases of the benchmark . and in the capri scorers set. the difference between them is the resolution level at which they work, one being atomistic and the other residue-based. we found that their success rates vary according to the docking method chosen, allowing them to exploit different properties of the sampling used. this way of characterizing a protein complex can help to develop new combined scoring functions in protein docking or a new ranking strategy to enhance the success rate.

multi-ptk antibody: a powerful tool to detect a wide variety of protein tyrosine kinases (ptks)
isamu kameshita , noriyuki sueyoshi , yasunori sugiyama

the eukaryotic protein kinases consist of large families of homologous proteins and play pivotal roles in various cellular functions. these enzymes are classified into two major groups: protein serine/threonine kinases and protein tyrosine kinases (ptks). ptks are believed to be involved in various cellular events such as cell cycle, proliferation, differentiation, apoptosis, and cell adhesion in multicellular eukaryotes. as many as ptk genes have been identified in the human genome, and many of these ptks are known to be closely correlated with various diseases such as cancer.
therefore, it is important to elucidate the expression profiles of the entire ptk family in cells and tissues. to investigate the expression profiles of cellular ptks, we produced an antibody that detects a wide variety of ptks. to produce the antibody, antigenic peptides corresponding to amino acid sequences of a highly conserved region (subdomain vib) of ptks were synthesized and used to immunize balb/c mice. among the various antigens, a peptide with amino acids, cyvhrdlraan, efficiently produced a polyclonal antibody with broad reactivity to ptks. we established a hybridoma cell line producing a monoclonal antibody, yk , which appeared to cross-react with various ptks. at least ptks could be detected by the yk antibody, as evidenced by its reactivity with recombinant src tyrosine kinases whose subdomain vib had been replaced by those of the other ptks. when differentiated hl- cells were analyzed by western blotting with the yk antibody after two-dimensional electrophoresis, we observed significant changes in the immunoreactive spots in hl- cell extracts along with changes in the morphology of the cells. these results suggest that the multi-ptk antibody, yk , will be a powerful tool for the analysis of a variety of cellular ptks.

analysis of the siglec- and hvap- interactions
leonor carvalho , vimal parkash , heli elovaara , sirpa jalkanen , xiang-guo li , tiina salminen
structural bioinformatics laboratory, department of biosciences, medicity research laboratory, department of pharmacology, drug development and therapeutics,

sialic acid-binding immunoglobulin (ig)-like lectins (siglec) are type i transmembrane proteins. siglec- has an n-terminal v-set domain followed by two c -set domains in the extracellular region. it contains an immunoreceptor tyrosine-based inhibitory motif (itim) in its cytoplasmic tail and can function as an inhibitory receptor by dampening tyrosine kinase-driven signaling pathways.
these proteins are expressed primarily on leukocyte subsets and are thus thought to be involved in the regulation of leukocyte functions during inflammatory and immune responses. recently, phage display screening experiments identified siglec- as a leukocyte surface ligand for human vascular adhesion protein (hvap- ; aoc gene product), and their interaction was confirmed by cell adhesion and enzymatic assays (kivi et al., ; aalto et al., ). based on our preliminary data, hvap- sugar units with sialic acid (sa) might mediate interactions with the v-set domain in siglec- . furthermore, it is known that the siglec peptides binding to hvap- are located in the ce loop of the second c -set domain (siglec- _c ). according to the current hypothesis, an arginine in siglec- _c interacts with the tpq residue in the active site of hvap- . the ce loop of siglec- _c has two arginines (r and r ), and it is therefore unclear which arginine is involved. we will now study the interaction mode of hvap- and siglec- in silico to predict the role of the arginines in the c domain and the role of sa binding, using the d model of the full-length ectodomain of siglec- and the hvap- crystal structure. the in silico analysis will be conducted in parallel with experimental site-specific mutational studies, and the results will be combined to elucidate the mechanism of the hvap- -siglec- interaction.

adam middleton , catherine day

attachment of ubiquitin to substrate proteins regulates almost all cellular processes, including protein degradation and cell division. ubiquitylation involves a cascade of three families of proteins: ubiquitin activating (e ), ubiquitin conjugating (e ) and ubiquitin ligase (e ) enzymes. the . kda protein can be attached as a monomeric moiety or as a polyubiquitin chain, and the type of modification spells out the 'ubiquitin code' that directs the fate of the substrate.
polyubiquitin chains can be formed via eight different linkage types, and the arrangement of chain formation is typically directed by the e enzymes. forming a polyubiquitin chain involves binding of two molecules to the e : the donor (ubd) and acceptor (uba) ubiquitin. ubd is linked to the e via a thioester bond between its c-terminal gly and the active site cys of the e , and when primed for catalysis it interacts with a particular face of the e . in contrast, coordination of uba by e s is transient and cannot be easily measured; however, uba binding defines the linkage type of polyubiquitin chains. the e , ube k, directs lys chain synthesis, which results in modified proteins being degraded by the proteasome. we generated a stable form of the ube kub conjugate and crystallized it, and showed that both ube k and its ubiquitin conjugate are monomeric. using molecular docking, we modelled the position of both ubd and uba and investigated the interfaces with site-directed mutagenesis. these experiments led to a molecular model that revealed how ube k can synthesise lys -linked ubiquitin chains. this molecular explanation provides a foundation for understanding how other e s generate lys -linked polyubiquitin chains. the two chromophorylated linkers of r-phycoerythrin in gracilaria chilensis marta bunster, francisco lobos-gonzález, josé aleikar vásquez, carola bruna, josé martínez-oyanedel departamento de bioquímica y biología molecular, facultad de ciencias biológicas, universidad de concepción. phycoerythrin is a phycobiliprotein present in phycobilisomes in gracilaria chilensis as a complex with chromophorylated linker proteins. our interest is to discover the role of these linkers in the function of phycobilisomes.
phycobilisomes (pbs) are auxiliary light-harvesting protein complexes in charge of channeling energy towards photosystem ii in algae, cyanobacteria and cryptophyta. this is possible thanks to fluorescent proteins called phycobiliproteins (pbp) and the chromophores (phycobilins, open-chain tetrapyrroles) attached to specific cysteines. phycobiliproteins share a common general structure; they are organized as (αβ) heterodimers which themselves assemble as trimers (αβ)3 or hexamers (αβ)6; these complexes are organized in higher-order structures to form the core and the rods. besides pbps, pbs have linker proteins in charge of the assembly and stabilization of the complex, and it has also been proposed that they collaborate in the fine tuning of the energy transfer steps between chromophores. these linkers are located within the rods, the rod-core interface, the core and the core-membrane interface. although most linker proteins are colorless, chromophore-bearing linkers have been described, which suggests their participation in the energy transfer process. two of them, g and g , are associated with r-phycoerythrin in gracilaria chilensis; nevertheless, the information available on these linkers in eukaryotes is still limited. to understand how these linkers collaborate with the function of the phycobilisome, we need structural information, especially the coordinates of all the chromophores present in the complex. we have sequenced both linkers from the genomic dna, performed sequence analysis and purified the linkers by anion-exchange, molecular-sieve and high-performance liquid chromatography (hplc). the characterization was performed by denaturing electrophoresis, absorption and emission spectroscopy and by mass spectrometry.
the results show that they have the predicted molecular masses, with a signal peptide for chloroplasts, an internal sequence repeat (residues - with residues - for g and residues - with residues - for g ), and conserved cysteine residues that are putative sites of chromophorylation. the spectroscopy shows that they have different compositions of phycobilins and a very short t / . a preliminary model for both linkers shows that they belong to aa structural class and that they share a common fold (heat-like motifs) frequently involved in protein-protein interactions. dept. of phys., chuo univ., grad. sch. of inform. sci. and eng., tokyo tech, rigid-body docking algorithms are useful for predicting tertiary structures of near-native protein complexes. however, these algorithms generate many protein complex poses, including false positives. near-native poses are then searched for in a post-docking process. there are many software packages with rigid-body docking algorithms, for example, zdock. we developed a high-performance protein-protein interaction prediction program, megadock, which is mainly used on supercomputing environments for large-scale, network-level analyses. in this work, we tried to use these docking programs and the profile method to understand mechanisms of protein-protein interactions. we focused on some physicochemical properties, electrostatics and hydrophobicity, of a set of protein complex poses generated by a rigid-body docking process. from these poses, we obtained sets of possible interacting amino acid pairs. a set of interaction profiles carries some information about docking spaces. from the viewpoint of network prediction, the docking spaces of a set of protein complex poses are one of the properties for discriminating native protein-protein pairs from non-native pairs. in this work, the ensemble docking process is performed by megadock ver. . and zdock ver. . . . cluster analysis is used with profiles of physicochemical properties.
we used a dataset composed of typical monomer-monomer protein pairs and will discuss mainly differences between native and non-native protein pairs. the structural studies of the two thermostable laccases from the white-rot fungus pycnoporus sanguineus marta orlikowska , grzegorz bujacz institute of technical biochemistry, lodz university of technology, poland laccases (ec . . . , benzenediol oxygen oxidoreductases) are enzymes that catalyze the oxidation of a wide spectrum of phenolic compounds with the four-electron reduction of molecular oxygen to water [ ] . it has been found that the active site is well conserved among laccases from different organisms. it contains four copper atoms: one paramagnetic type copper (t ) that is responsible for their characteristic blue color and where the oxidation of the reducing substrate occurs, one type copper (t ) and two type coppers (t ) that form a trinuclear cluster in which molecular oxygen is reduced to two molecules of water [ ] . laccases are present in many different species and they have been isolated from plants, fungi, prokaryotes, and arthropods. in most cases laccases are monomeric glycoproteins of around amino acids with molecular weights in the range of - kda. the various functions carried out by those enzymes include antagonistic ones such as their involvement in lignin biosynthesis (in plants), lignin degradation, pigment production, fruiting body formation, pathogenesis (in fungi) and spore protection against uv light (in bacteria) [ , ] . the diversified functions of laccases make them interesting enzymes for study from the point of view of their structure, function and application. laccases of white-rot fungi (wrf) are of special interest because one of their roles is to degrade lignin and most of them are extracellular enzymes, which simplifies purification procedures [ ] .
during the last two decades, there has been an increasing interest in the genus pycnoporus for its ability to overproduce high redox potential laccases as the ligninolytic enzymes. we present the crystal structures of two thermostable laccases produced by strain pycnoporus sanguineus cs (laci and lacii). the molecular weights of laci and lacii, determined by sds-electrophoresis, are and kda, respectively [ ] . both isoforms show high amino acid sequence similarity ( %) between them and high thermal stability, at c and c. they remained active at high concentrations of organic solvent (acetonitrile, ethanol or acetone). these unique properties make them promising candidates for industrial applications in wastewater treatment. laci exhibited higher thermal and ph stability and tolerance against inhibitors, and was a more efficient catalyst for abts and dmp (laccase substrates) than lacii [ ] . based on the structures we would like to understand the differences between the isoforms that confer on laci a markedly better performance than lacii in ph and thermal stability as well as better resistance to inhibitors. analysis of liver proteome in cystathionine β-synthase deficient mice using d ief/sds-page gel electrophoresis, maldi-tof mass spectrometry, and label-free based relative quantitative proteomics izabela bielińska , Łukasz marczak , hieronim jakubowski , institute of bioorganic chemistry, polish academy of sciences, rutgers university, new jersey medical school homocysteine (hcy) arises from the metabolism of the essential dietary protein amino acid methionine. levels of hcy are regulated by remethylation to met and transsulfuration to cys. cystathionine β-synthase (cbs) catalyzes the conversion of homocysteine to cystathionine (first step of the transsulfuration reaction). human cbs deficiency is a recessive inborn error of homocysteine metabolism that causes severe hyperhomocysteinemia (hhcy) and diverse clinical manifestations, including fatty liver disease [ ] .
although the causes of fatty liver disease in cbs deficiency have been studied the underlying mechanism is not understood. we hypothesize that cbs deficiency induces changes in gene expression that could impair liver homeostasis. to test this hypothesis and gain insight into hepatic functions of cbs we analyzed the liver proteome of cbs -/-and cbs / mice [ , ] using d ief/sds-page gel electrophoresis and maldi-tof mass spectrometry (n ) we identified twelve liver proteins whose expression was significantly altered as a result of the cbs gene inactivation. expression of three proteins was upregulated and of nine down-regulated by the cbs-/-genotype. two up-regulated liver proteins are involved in iron metabolism (ftl and fth). those proteins are associated with oxidation stress and inflammation. third up-regulated liver protein (cbr ) is related to oxidation-reduction process. the downregulated protein are involved in the hydrolysis of n-acylated or n-acetylated amino acids (acy ), regulation of endopeptidase activity (a at ), cholesterol biosynthetic process (fpps), amino acid degradation (huth), cellular calcium ion homeostasis and l-ascorbic acid biosynthetic process (rgn). using label-free based relative quantitative proteomics (n ) we identified fourteen liver proteins whose expression was significantly altered as a result of the cbs gene inactivation. expression of four proteins was up-regulated and of ten proteins was down-regulated. the down-regulated liver proteins are linked with regulation of bone mineralization and inflammatory response (ahsg) or regulation of mrna splicing (roa ). the up-regulated liver proteins are involved in tricarboxylic acid cycle (suca), oxidation-reduction process (cy ), cholesterol metabolic process, iron ion homeostasis (fech), fatty acid metabolic process (ssdh; eci ) and response to oxidative stress (lonm). 
our findings suggest that cbs interacts with diverse cellular processes, including lipid metabolism, that are essential for normal liver homeostasis. deregulation of genes involved in lipid metabolism provides a possible explanation for fatty liver disease associated with cbs deficiency. transcription factors play central roles in coordinating developmental processes, as evidenced by the increasing number of transcription factor-related developmental disorders being uncovered by next-generation sequencing and genome-wide studies of copy number variation. the action of a transcription factor in regulating gene expression depends on interactions with other transcription factors, co-activators/co-repressors and chromatin modifying and remodeling complexes. transcription factors are commonly regulated by post-translational modifications. however, the study of protein-protein interactions and post-translational modifications of transcription factors by common techniques such as co-immunoprecipitation and mass spectrometry is hampered by the difficulty in preserving interactions and modifications through cell lysis. to circumvent this issue, we developed a bioluminescence resonance energy transfer (bret) assay, which allows protein-protein interactions to be observed in live cells. in this assay, a protein of interest is expressed as a fusion with luciferase from renilla reniformis, and its putative interaction partner as a fusion with yellow fluorescent protein (yfp). upon addition of a cell-permeable substrate, the distance-dependent non-radiative transfer of energy from luciferase to yfp is quantified by measurement of light emission at two wavelengths to assess the interaction between the two fusion proteins.
to validate the utility of this assay for investigating transcription factor interactions, we confirmed homodimerization of the foxp transcription factor, haploinsufficiency of which causes a rare and severe speech and language disorder, as well as interaction of foxp with other members of the foxp family. we also confirmed the interaction between foxp and multiple candidate interactors identified through yeast two-hybrid assays, including the autism-related transcription factor tbr , the co-repressors ctbp and ctbp , and post-translational modification enzymes of the pias family. the role of pias enzymes in sumoylation -the covalent modification of proteins with small ubiquitin-like modifier (sumo) proteins -led us to further explore this process, which is notably difficult to investigate because of the dynamic and labile nature of the modification, which is also typically present on only a minor fraction of molecules of a given protein. combining the bret assay with gel-shift techniques we demonstrated that foxp is sumoylated. finally, we used the bret assay to examine the effects of etiological foxp variants in speech and language disorder on protein-protein interactions and post-translational modification. in summary, the bret assay is a sensitive, reliable and potentially high-throughput technique for exploring protein biology in the context of live cells. we have demonstrated applications of the assay in validating putative protein-protein interactions, assessing posttranslational modifications, and investigating functional effects of protein variants identified in patient cohorts. these investigations have provided novel insights into the function of the foxp transcription factor in neurodevelopment and into the etiology of foxp -related speech and language disorder. 
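the two-wavelength readout described for the bret assay can be illustrated with a short calculation. the abstract does not give the formula used, so the sketch below assumes the commonly used convention: the bret ratio is the acceptor (yfp) channel emission divided by the donor (luciferase) channel emission, corrected by subtracting the ratio measured in a donor-only control. the function names and the example counts are hypothetical.

```python
def bret_ratio(acceptor_emission: float, donor_emission: float) -> float:
    """Raw BRET ratio: light in the acceptor (YFP) channel divided by
    light in the donor (luciferase) channel."""
    if donor_emission <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_emission / donor_emission


def corrected_bret(acceptor: float, donor: float,
                   acceptor_ctrl: float, donor_ctrl: float) -> float:
    """Background-corrected BRET ratio (assumed convention): subtract the
    ratio of a donor-only control, which accounts for donor light that
    bleeds into the acceptor channel."""
    return bret_ratio(acceptor, donor) - bret_ratio(acceptor_ctrl, donor_ctrl)


# hypothetical luminometer counts for one interaction pair and its control
signal = corrected_bret(acceptor=300.0, donor=1000.0,
                        acceptor_ctrl=100.0, donor_ctrl=1000.0)
print(round(signal, 3))
```

a corrected ratio clearly above zero would be read as energy transfer between the two fusion proteins; the threshold for calling an interaction is experiment-specific and is not stated in the text.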
the direct interaction between pres of human virus b and human heat shock protein (hsp ) deqiang wang , chen ke , jun zhang key laboratory of molecular biology on infectious disease, the department of cell biology and genetics hepatitis b virus (hbv) has infected billion people worldwide, and million of them are chronically infected. chronic infection, a major public health problem worldwide, leads to about two-thirds of hepatocellular carcinoma (hcc) cases. the hbv envelope consists of the large (l), middle (m) and small (s) envelope proteins, which contain pres -pres -s, pres -s, and the s domain alone, respectively [ ] . the pres domain is believed to mediate virus attachment to the high-affinity receptor. yan et al. employed a novel technique to propose sodium taurocholate co-transporting polypeptide (ntcp) as the candidate hbv receptor, and consequently, ntcp is a target for a new family of anti-hbv agents [ ] . however, it remains unclear whether ntcp is the only or the major hbv receptor in vivo. to determine whether other host proteins cooperatively participate in hbv infection, we tested the interaction between pres and many candidate host proteins. we found that the human heat shock protein (hsp ) can directly interact with the pres domain of the hbv protein. both the pull-down and the size exclusion chromatography experiments verify that grp is able to bind pres . whether the interaction between hsp and pres relates to hbv infection requires further experiments. in the vast world of naturally occurring peptides, where more than peptides are known and approximately peptide therapeutics are currently being evaluated in clinical trials (fosgerau & hoffmann, ), the rapid and accurate determination of their physicochemical properties is key in peptide drug discovery.
among these properties, hydrophobicity is crucial for understanding molecular recognition and biomolecular aggregation. hence, there is a great interest in determining hydrophobicity scales for amino acid structures. in this work, octanol/water partition (log p) and octanol/water distribution (log dph, fig. ) of n-acetyl-l-amino-acid methyl amides were determined by means of quantum mechanical ief-mst solvation calculations taking into account the intrinsic conformational preferences of each amino acid according to dunbrack's libraries (dunbrack & karplus, ; ) . the results reveal log d . differences for a-helical and b-sheet conformations in arg, lys, hid, asn, gln, met, cys, leu and ile. furthermore, by decomposing the octanol/water transfer free energy into electrostatic and non-electrostatic components, we estimated that the non-electrostatic cost of transferring the amino acid side chain amounts to . . cal/mol.Å , in agreement with previous estimates reported in the literature. comparison of our scale with other theoretical and experimental hydrophobicity scales yields satisfactory results, leading to correlation coefficients ranging from . to . . additionally, the mstderived hydrophobicity scale led to significant correlations with the rp-hplc retention factors measured for eight decapeptides (r . ) and for influenza virus hemagglutinin -mer (ac-ypydvp-dyaslrs-amide) peptides (r . ). finally, the hydrophobicity scale was able to reproduce the experimental log p for random neutral peptides (r . ) and log d . for ' : random charged peptides (r . ), fig. . future studies will address the application of this methodology to nonproteogenic amino acids, the prediction of peptide hydrophobicity at global and atomic level in peptides, and the scoring of peptide-protein interactions. docking-based tools for discovery of protein-protein modulators docking-based tools for discovery of protein-protein modulators. 
protein-protein interactions (ppis) play an essential role in many biological processes, including disease conditions. strategies to modulate ppis with small molecules have therefore attracted increasing interest over the last few years. although protein-protein interfaces (ppifs) are considered difficult to target with small molecules given their lack of well-defined cavities, successful ppi inhibitors have been reported that bind to transient cavities in previously flat ppifs. recent studies emphasize hotspots (those residues that contribute most of the binding energy) as promising targets for the modulation of ppis. the pydock algorithm is one of the few computational methods that use the energy of solvation to predict protein-protein interfaces and hotspot residues. we present an approach aimed at identifying hotspots and transient pockets from predicted protein-protein interfaces in order to find potential small molecules capable of modulating ppis. the method uses pydock to identify ppifs and hotspots, and molecular dynamics (md) techniques to propose putative transient cavities. we benchmarked the protocol on a small set of protein-protein complexes for which both structural data and ppi inhibitors are known. the method applies the fast fourier transform algorithm to the unbound proteins of the complexes, followed by the energy-based scoring from pydock to calculate the normalized interface propensity (nip) values derived from rigid-body protein docking simulations, in order to predict the ppifs and hotspot residues without any prior structural knowledge of the complex. then we used md to describe the possible fluctuations of the interacting proteins in order to suggest transient pockets that could be useful as targets of small molecules for the modulation of ppis. finally, we evaluated, by ligand docking, the validity of the predicted hotspots and pockets for in silico drug design.
we found that the nip-based method from pydock protein-protein docking identifies hotspots residues that are located within the binding site of known inhibitors of ppis. predicting ppifs from a three dimensional structure is a key task for the modulation of ppis. the use of the nip-based hotspots prediction method improve the identification of transient cavities from md simulation when compared to known binding cavities. this approach can be extremely useful in a realistic scenario of drug discovery targeting ppifs, when there is no information at all about the protein-protein complex structure. protein complexes are the fundamental molecular organizations that assemble multiple proteins to achieve various biological processes. identification of protein complex membership should provide a genotype-phenotype map to elucidate human gene-disease associations. it has been routinely assumed that network clusters with dense connections inside and sparse connections outside would form functional protein complexes. therefore, searching highly modular subgraphs in protein-protein interaction networks was explicitly or implicitly implemented in the algorithms to find protein complexes. however, to our surprise, we found a large portion of complexes with a medium-to-low modularity from the analysis of experimentally confirmed protein complexes. we also discovered that these complexes have cellular functions enriched in highly time-and space-dependent expression, such as signal transduction or subcellular localization. we further developed an algorithm to find such complexes by weighing network connections to capture transient interactions with intrinsically disordered regions. we confirmed that our method improved the identification of biologically relevant members of protein complexes and covered more complexes with a medium-to-low modularity. 
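the notion of a network cluster with "dense connections inside and sparse connections outside" can be made concrete with a small calculation. the abstract does not say which modularity measure was used, so the sketch below assumes a simple, hypothetical per-cluster score: the fraction of edges touching the cluster that lie entirely inside it. the function name and the toy network are illustrative only.

```python
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def cluster_modularity(edges: Iterable[Edge], members: Iterable[Hashable]) -> float:
    """Fraction of the edges touching a cluster that are internal to it.

    Values near 1.0 mean a dense, well-separated cluster; lower values mean
    many connections leave the cluster (medium-to-low modularity).
    """
    member_set = set(members)
    internal = 0   # both endpoints inside the cluster
    boundary = 0   # exactly one endpoint inside the cluster
    for u, v in edges:
        u_in, v_in = u in member_set, v in member_set
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            boundary += 1
    total = internal + boundary
    return internal / total if total else 0.0


# toy interaction network: a-b-c form a triangle, c-d-e form a chain
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
print(cluster_modularity(toy_edges, {"a", "b", "c"}))  # dense triangle
print(cluster_modularity(toy_edges, {"c", "d"}))       # loosely separated pair
```

under this assumed score the triangle scores 0.75 and the pair scores 0.25; a search that only keeps high-scoring subgraphs would miss the second kind of complex, which is the limitation the abstract points to.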
furthermore, newly discovered subunits in protein complexes could explain more disease-gene associations, indicating its utility to expand current genotype-phenotype map of human diseases. expanding template-based protein-protein complex prediction using ab-initio docking sergio mares-s amano , luis angel rodr ıguez-lumbreras , juan fern andez-recio structural characterization of protein-protein interaction (ppi) networks is crucial for understanding the underlying molecular mechanisms whereby life processes and disease arise. however, due to inherent limitations of experimental techniques, such characterization only covers an extremely reduced fraction of the human ppi network (interactome). recent studies have shown that although available structural templates may suffice to model a significant proportion of the interactome, model accuracy and binding specificity remain unsolved problems. consequently, improving the ability to predict ppis structurally will help to provide a better d profile of the known interactome, which may ultimately lead to the development of new therapeutic applications. here we show a novel approach that combines templatebased modeling with protein-protein computational docking to the structure-based prediction of ppis. our approach samples different protein-protein structural models derived from docking simulations. models are subsequently ranked using a function that incorporates an energy-based scoring term and a structural template similarity score. the energy-based scoring function includes electrostatics, van de waals and desolvation calculations, whilst the template similarity score accounts for the degree of structural similarity of models against a high-resolution and diverse dataset of structural templates. our approach highly improved the predictive success rate over individual ab-initio docking and templatebased techniques across a large benchmark dataset, including protein-protein complexes. 
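the ranking step, which combines an energy-based term with a template similarity score, can be sketched as follows. the abstract does not give the functional form or any weights, so this is only an illustration under stated assumptions: docking energy is treated as lower-is-better, template similarity as a value in [0, 1] where higher is better, and the two are combined linearly with a hypothetical weight. the class, field and parameter names are not from the original work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DockingModel:
    name: str
    energy: float               # docking energy, lower is better (assumed)
    template_similarity: float  # similarity to best template in [0, 1] (assumed)


def combined_score(model: DockingModel, weight: float) -> float:
    """Hypothetical linear combination; lower combined score ranks first."""
    return model.energy - weight * model.template_similarity


def rank_models(models: List[DockingModel], weight: float = 1.0) -> List[DockingModel]:
    """Return models sorted from best to worst combined score."""
    return sorted(models, key=lambda m: combined_score(m, weight))


models = [
    DockingModel("pose_a", energy=-10.0, template_similarity=0.1),
    DockingModel("pose_b", energy=-9.0, template_similarity=0.9),
]
# with weight 0 the ranking is purely energy-based; a larger weight lets
# template support promote pose_b over pose_a
print([m.name for m in rank_models(models, weight=0.0)])
print([m.name for m in rank_models(models, weight=5.0)])
```

the weight controls how strongly template evidence can override the docking energy; how the two terms were actually balanced is not specified in the text.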
when compared to the performance of the ab-initio docking algorithm, we found that the approach consistently increased the success rate, by approximately %, for the top , top and top solutions. the success rate improvement was even more notable when the comparison was performed against the predictions from the traditional template-based docking. though incorporating ab-initio docking considerably expands the scope of the template-based docking method, challenges remain for interacting proteins in which large conformational changes occur upon binding, and the size and diversity of the repertoire of structural templates need to be increased. programmed cell death (pcd) is essential for the development of multicellular organisms. in mammalian cells, early events in pcd involve the release of cytochrome c (cc) from mitochondria to the cytoplasm, so letting cc play a key role in assembling the apoptosome and triggering apoptosis. in plants, pcd is part of a general process -the so-called hypersensitive response -in which mitochondrial cc is likewise released into the cytosol, but its further role and cytoplasmic partners remain veiled. such a coincidence in cc release made us think of a common link for pcd in such evolutionarily distant species. to go deeper in understanding the pcd-dependent role of cc, a proteomic approach based on affinity chromatography with cc as bait was run using human and plant cell extracts. upon combining this approach with bimolecular fluorescence complementation (bifc), a total of eight and nine unknown proteins interacting with cc under pcd conditions were identified in human and plant cells, respectively [ , ] . such novel cc-partners -which are located in the cytoplasm and even in the nucleus -are involved in protein folding, translational regulation, oxidative stress, dna damage, energetic and mrna metabolism [ ] .
strikingly, some of the novel human cc-partners are closely related to those for plant cc, so indicating that the evolutionarily well-conserved event of cc release from mitochondria could involve a common signalosome consisting of a wide range of common targets [ ] . to also understand such a promiscuity of cc from a structural point of view, the cc surface residues involved in complex formation with each one of its counterparts were mapped by using nmr spectroscopy. the resulting data show that the heme crevice of cc is at the cc-partner interface in most of the complexes, which is in agreement with the vast majority of known redox adducts of cc. in contrast, however, to the high turnover number of the redox cc adducts inside the mitochondria, the complexes formed by cc under pcd conditions lead to the formation of rather stable nucleo-cytoplasmic ensembles. altogether, these findings suggest that extra-mitochondrial cc interacts with nuclear and/or cytoplasmic pro-survival, anti-apoptotic proteins in both humans and plants so as to lead living cells to die. keywords: cytochrome c, programmed cell death, signalosome. post-translational phosphorylation often modulates the function of proteins. in particular, it affects the role that cytochrome c (cc) plays in cell life and death [ ] . cc is phosphorylated in vivo at tyr and tyr residues [ , ] , but recently, two new phosphorylation sites have been described at positions and [ ] . hence, we aim at understanding the structural and functional changes induced by thr and ser phosphorylation of cc. for this purpose, we designed two phosphomimetic mutants of cc by replacing either thr or ser by the canonical amino acid aspartic acid (t d and s d). as control, two other mutants at the same two positions (t a and s a) were analyzed so as to differentiate the effects due to the presence of a negatively charged residue. remarkably, the s a mutant is significantly less stable than the wild-type species.
we found that phosphorylation at position thr diminishes the redox potential and oxygen consumption. in addition, the t d mutation affects the ability of cc to bind the distal site pcc , thereby suggesting that phosphorylation at this position affects the electron carrier capacity of cc. mass spectrometry (ms) is a widely used technique to gain knowledge about biomolecules [ , ] . it produces a large amount of data, which is often presented as a list containing thousands of proteins. that list usually contains few hits interesting for our research. the process of selecting those proteins may include integrating experimental data with annotation data. it requires spending some time both performing calculations and searching in databases. in this poster we present msbiodata analysis tool, a web service designed to deal with this tedious work. with this tool, researchers can set rules to select the most interesting hits in their lists using both experimental data and gene ontology [ ] annotation. the data can be uploaded to the web using an excel spreadsheet or a flat file in mztab format, and rules are easily constructed by means of logical sentences. those sentences are composed of one or more terms linked by logic operators (and and or). each term in the logical sentence indicates to our program the conditions that selected hits must meet. once the analysis is finished, the results are delivered by email. msbiodata analysis tool does not require any programming knowledge to be used and is freely available at: http://msbiodata.innomol.eu keywords bioinformatics/data analysis/proteomics/data mining/ mass spectrometry. besides the rate of protein synthesis, the regulation of protein degradation plays a crucial role in white muscle protein accumulation and overall fish growth. intracellular proteolysis in salmonid species, such as atlantic salmon, salmo salar l.
and rainbow trout, oncorhynchus mykiss walb., was studied to evaluate the basic mechanisms of protein degradation that could possess a potential target to regulate the body mass accumulation in farmed fish. a number of white muscle proteases such as cathepsins b, l, and d, proteasomes, and calcium-dependent proteases (m-and m-calpains), was studied in the juvenile specimens of different size-and age-groups both wild and farmed salmonids. the correlations between the protease activity and expression levels and morphometric characteristics of fish were found. the size-and age-related differences in intracellular protease activity revealed in fish muscles indicate both general role of proteolysis regulation in salmonid growth and the specific role of the individual proteolytic enzymes as well. the data on negative correlation of cathepsin d and calpain activity in muscles and the rate of weight increase in juvenile salmonids were obtained. a revealed positive correlation of cathepsin b activity and morphometric parameters in fish young presumably indicates its primary contribution to non-myofibrillar protein turnover. ubiquitin-proteasome system seems to contribute to background protein turnover as the proteasome activity was not corresponded with growth rate. summarizing the data obtained the autophagy-lysosomal and calpain-related protein degradation pathways were recognized to be directly involved in body growth and muscle protein retention in salmonid fish. the work was carried out using technical facilities of ib karrc ras equipment centre and financially supported by the russian science foundation, grant no. - - "salmonids of the north-west russia: ecological and biochemical mechanisms of early development". solving the proteomic organization of fitness-related genes in uropathogenic escherichia coli in life threatening sepsis. nowadays, complete genomes for almost all major bacterial pathogens are available, helping researchers to identify virulence factors. 
however, we still do not know how these genes are organized at the proteome level and how this association influences bacterial pathogenicity. we integrated available databases on upec (strain cft ) to investigate the genomic and proteomic organization of genes related to upec fitness in the host. intriguingly, we found that most fitness-related genes have orthologs not only in other pathogenic strains but also in non-pathogenic bacteria such as e. coli k- . these genes are organized in clusters and operons with similar structure. by integrating protein-protein interaction data we observed that genes with high impact on fitness also display a highly clustered organization when compared to other genes. overall, our results show that protein-protein interaction clusters associated with upec fitness in the host represent a promising target for the design of new antibiotics. elucidating the molecular mechanisms by which the hnh endonuclease gp activates the terminases in bacteriophage hk ( ) . hnh endonucleases are characterized by two highly conserved his residues and an asn residue ( ). gp is essential for phage head morphogenesis, likely because gp enhances the activity of the hk terminase enzymes toward the cos site ( ) . notably, enhancement of the terminase-mediated cleavage of the phage cos site requires the presence of an intact hnh motif in gp . mutation of the canonical metal-binding his in the hnh motif abrogates gp -mediated terminase activity. although phages are widely studied, there is no definitive structural or mechanistic evidence as to how the hnh endonuclease within gp functionally interacts with the adjacent terminase enzymes to facilitate phage morphogenesis. previous work on hnh-containing bacteriophage proteins does not explicitly address how the requirement for divalent metal binding at the hnh endonuclease site induces interaction with the terminase enzymes that are so crucial for phage dna packaging during morphogenesis ( , ) .
in addition, gp possesses no sequence similarity to hnh proteins for which the structure has been determined ( ), making structural studies of gp necessary. toward these ends, we use nuclear magnetic resonance (nmr) spectroscopy to probe metal and terminase binding of gp in the wild type state and bearing metal binding mutations. we also report backbone resonance assignment of gp . our nmr studies have elucidated residues within gp required for metal binding and terminase activity. these data are being used to assess the role of specific gp residues in phage morphogenesis. together, this work will identify the enigmatic role describing how metal binding in hnh endonucleases is crucial in the replication and morphogenesis of phages. meat production from pigs for human consumption is a resource heavy process, indeed every part of the animal that is not used constitutes a protein food-chain loss, which is neither economically nor environmentally viable. the goal of this project is to better harness slaughterhouse waste such as the keratin rich pig bristles and nails through microbial conversion. instead of using identified single microorganisms, it is the goal to define microbial consortia where microorganisms synergistically show the ability of efficient keratin degradation/conversion. candidate consortia have been obtained by selecting for microorganisms growing on enriched media that contains milled pig bristles as sole carbon and nitrogen source. by using mass spectrometry and various biochemical analyses to investigate keratinolytic enzymes, methods will be established for identifying and characterizing suitable consortia. protein families likely to be involved are keratinases, which are specialized proteases including serine, cysteine and metallo proteases, as well as systems capable of reducing or otherwise breaking disulfide bonds which are highly abundant in hair and nails. 
furthermore, interactions and symbiosis of microorganisms in a consortium will be investigated at the meta-proteomics level. the project will lead to development of biotechnological degradation of keratin rich fibers, and provide new insights into functional dynamics and efficacy of microbial consortia. a comprehensive protein domain analysis to map cancer-type-specific somatic mutations interpretation of the genome-wide association studies (gwas) of cancer patients to find cancer-typespecific biomarker is challenging due to the mutational heterogeneity of cancer types. network approaches to find cancer-type-specific variants and biological pathways are increasing since genes tend to act together to display phenotypic or disease outcomes. phenotype similarity has proven to reflect the relationship of functionally related genes. we applied phenotype similarities between various diseases for expanding molecular connections of cancer-type-specific variants to discover cancer-type-specific modules. specifically, cancer-type-specific variants of cancer types from the cancer genome atlas (tcga) were analyzed to find phenotype-inferred relationships among the variants. we find that cancer variants that cause the similar disease phenotypes tend to be linked as a cluster of biological pathways or functions. moreover, cancer-type-specific modules could explain the underlying pathogenicity of specific symptoms which manifest in particular cancer types. cancer-type-specific modules and pathways found from phenotype similarity/dissimilarity based on cancer symptoms improved the discrimination performance to sort cancer-type-specific variants to accurately predict patient groups. 
our method will be further developed to find genetic biomarkers for the diagnosis or prognosis of specific cancer types. pk- engineering a stable, symmetric membrane protein scaffold amanda duran , jens meiler computational protein engineering has the potential to contribute to various fields including drug design, protein therapeutics, and materials science. protein-ligand interface design and the construction of large, stable proteins rely on stable scaffolds. symmetry is a powerful tool for protein stability, both in protein engineering and in nature. several membrane protein structures exhibit pseudo-symmetry and are proposed to be the result of gene duplication, fusion and diversification events originating from a monomeric gene. aquaporins (aqp) are a class of membrane proteins that exhibit a two-fold inverted pseudo-symmetry. the escherichia coli aqp glycerol facilitator protein (glpf) was originally computationally engineered to be perfectly symmetric in sequence and presumably in structure. the symmetric gene was assembled, cloned, and expressed. however, after many experimental challenges, the computational study was expanded to aqps of known structure for a more extensive symmetric backbone search. mammoth structural alignment was used to align the structures to their inverted counterparts. cutpoints were calculated based on α-carbon distance. finally, the rosetta protein modeling software suite was used to refine and energetically minimize the symmetric backbones. from over generated symmetric backbones, candidates were chosen for experimental verification. these studies are ongoing. currently, the symmetric backbone models score as more stable than the wild-type proteins. experimental verification of these symmetric backbones will provide valuable information on the current state of membrane protein modeling and design using computational methods.
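The abstract above states only that cutpoints were calculated from α-carbon distances after aligning each structure to its inverted counterpart. The following is a minimal sketch of one way such a selection could work, not the authors' procedure: it assumes the two coordinate lists are already superimposed and residue-matched, and it reports positions where the two backbones pass close enough to splice. The function name `candidate_cutpoints`, the 1.0 Å threshold, and the toy coordinates are all hypothetical.

```python
import math


def candidate_cutpoints(ca_original, ca_inverted, max_distance=1.0):
    """Return (index, distance) pairs where aligned C-alpha atoms lie
    within max_distance of each other, closest first.

    ca_original, ca_inverted: equal-length lists of (x, y, z) tuples,
    assumed to be superimposed and residue-matched beforehand.
    """
    if len(ca_original) != len(ca_inverted):
        raise ValueError("coordinate lists must have the same length")
    hits = []
    for index, (p, q) in enumerate(zip(ca_original, ca_inverted)):
        distance = math.dist(p, q)  # Euclidean distance, Python 3.8+
        if distance <= max_distance:
            hits.append((index, distance))
    # Closest pairs are the most plausible places to switch backbones.
    return sorted(hits, key=lambda hit: hit[1])


# Toy coordinates (hypothetical, in angstroms)
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
inverted = [(0.0, 3.0, 0.0), (1.0, 0.5, 0.0), (2.0, 2.0, 0.0)]
cutpoints = candidate_cutpoints(original, inverted)
```

In practice the alignment itself (MAMMOTH in the abstract) and the subsequent refinement (Rosetta) do the heavy lifting; a distance filter like this would only shortlist splice positions.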
intrinsically disordered proteins drive heritable transformations of biological traits daniel jarosz , james byers , sohini chakrabortee , sandra jones , amelia chang , david garcia stanford university, whitehead institute for biomedical research, rockefeller university the transmission of information from one generation to the next generally occurs via nucleic acids. the only known protein-based molecular memories are prions, which drive heritable biological traits based upon self-templating changes in protein conformation. these protein-based genetic elements have previously been identified systematically, but at least three do not share the sequence biases or structural characteristics that have informed such studies. here we employed a comprehensive library of yeast proteins to examine the breadth of protein-based inheritance. transient overexpression of more than forty proteins created new traits that were heritable and beneficial. some shared properties of known prions, but most employed distinct genetic and biochemical mechanisms to act as elements of inheritance. traits with these characteristics were common in wild yeast strains and could also be elicited using orthologous mammalian proteins. the inducing proteins were strikingly enriched in intrinsically disordered sequences that have been widely conserved across evolution. intrinsically disordered proteins are associated with human disease and with dosage sensitivity in yeast, flies and worms. our results suggest another widespread role for such intrinsically disordered sequences: induction of heritable epigenetic switches that transform phenotypic landscapes and drive adaptation to stressful environments. prediction of binding affinity in protein complexes: contacts do matters almost all critical functions in cells rely on specific protein-protein interactions. understanding these is therefore crucial in the investigation of biological systems. 
despite all past efforts, we still lack a thorough understanding of the energetics of association of proteins. here, we introduce a new and simple approach to predict binding affinity based on functional and structural features of the biological system, namely the network of interfacial contacts. we assess its performance against a protein-protein binding affinity benchmark and show that both experimental methods used for affinity measurements and conformational changes have a strong impact on prediction accuracy. using a subset of complexes with reliable experimental binding affinities and combining our contacts-and contact types-based model with recent observations on the role of the non-interacting surface in protein-protein interactions, we reach a high prediction accuracy for such a diverse dataset outperforming all other tested methods. free radical oxidation -a new method for obtaining stable protein coatings on magnetic nanoparticles magnetically targeted nanosystems (mtnss) are now considered to be applicable in different areas of biology and medicine such as hyperthermia, magnetic resonance imaging, immunoassay, cell and molecular separation, a smart delivery of drugs to target cells. proteins are promising materials for creation of coatings on magnetic nanoparticles (mnps) due to their biocompatibility, an ability to protect magnetic cores from influence of biological liquids and prevent agglomeration of mtnss in dispersion, their possible functional activity as therapeutic products and biovectors. the creation of stable protein coatings with retention of native properties of molecules is still an important biomedical problem because of disadvantages of the commonly used methods such as formation of a polydisperse ensemble of particles, nonselective linking of proteins leading to cross-linking of macromolecules in solution, and desorption of coatings. 
a novel method in obtaining stable single-layer coatings assembled from protein molecules on the surface of magnetite nanoparticles has been developed. it is based on protein liability to free radical modification, leading to the formation of intermolecular covalent cross links. free radicals are locally generated on the surface of nanoparticles via the fenton reaction thereby proteins adsorbed on the surface are subjected to the cross-linking. o-phenylenediamine was used for detection of free radical generation initiated by nanoparticles. the proteins drastically differing in their structure and properties, namely, serum albumin, thrombin and immunoglobulin g were selected for creating the protein coatings. the properties of the obtained coatings and their stability have been studied with the help of dynamic light scattering (dls), uv/vis spectrophotometry, antibody-antigen test and the method of spectral-fluorescent probes. albumin molecules in mnps coatings have been shown to retain their capability of binding with a dye and be conformationally stable. the dye , '-di-(g-sulfopropyl) , 'diphenyl- -ethiloxacarbocyanine-betaine interacting with albumin with a growth of fluorescence and with partial cis-trans conversion of the dye has been used. it has been proven that coatings composed of protein macromolecules are ) stable, ) formed around individual nanoparticles and ) have several nanometers in thickness. the free radical linking of thrombin and immunoglobulin g on the surface of nanoparticles has been shown to almost completely keep native properties of the protein molecules. the free radical linking method reveals new possibilities for design of single-layer multiprotein polyfunctional coatings on the surfaces of all the nano-, micro-and macroobjects containing metals of variable valence (for example, fe, cu, cr). the spectral-fluorescent investigation was supported by the russian foundation for basic research, project nos. - - and - - mol_a. 
regulation of neuronal snares by accessory proteins shrutee jakhanwal , reinhard jahn department of neurobiology, max planck institute of biophysical chemistry, fassberg, goettingen, germany- . synaptic vesicle exocytosis lies at the heart of the process of neurotransmitter release. the family of proteins central to synaptic vesicle exocytosis is the family of snare proteins. there are three kinds of neuronal snare proteins, namely syntaxin, snap and synaptobrevin. these three snare proteins interact through their snare motifs to form a highly stable four-helix bundle, which in turn pulls two membranes together to mediate fusion. years of work in this field have established that the four-helix bundle is critical for membrane fusion to occur. however, the regulation of snare-mediated fusion remains very poorly understood. the major regulatory proteins involved in the process are munc , munc , synaptotagmin and complexin. the major aim of my project is to obtain a closer look at the regulation of snare-mediated fusion by focusing on the interaction between the snare proteins and the regulatory proteins. to achieve this objective, i express and purify the different proteins involved in snare-mediated fusion and thereafter subject them to appropriate biochemical characterization. in order to assess the role of the purified proteins in the process of fusion, i reconstitute them into liposomes and perform in vitro lipid-mixing assays. these assays are based on förster resonance energy transfer (fret). depending on whether protein-protein or protein-lipid interactions are being assessed, either the proteins or the lipids can be fluorescently labeled. also, the lipid composition can be varied in order to assess the effect of lipids on the function of the respective protein.
fluorescence-based anisotropy measurements can also provide information about the degree of freedom of a protein, indirectly providing information about the kinetics of a reaction. employing these techniques, i observe that munc - leads to displacement of syntaxin from a complex of syntaxin and snap . also, a complex of syntaxin and munc is resistant to the action of the aaa-atpase, nsf and its co-factor asnap, implicating this complex as a strong candidate for acting as the starting point for the process of neurotransmitter release. munc also appears to enhance lipid-mixing by interacting with the snare-complex. further investigations on the same lines can provide very useful insights into the process and can help us unravel the secrets that underlie the beauty of the exquisitely regulated process of neurotransmitter release. binding of thymidine nucleotides to a viral thymidine monophosphate kinase aldo a. centro de investigaci on en alimentaci on y desarrollo theme: biochemistry there is great interest in the evolution and activities of fish trypsins, since they appear to have evolved into different families. the cdna for trypsin iii from the monterey sardine (sardinops sagax caerula) was obtained and its deduced amino acid sequence matched its identity with a purified protease from the fish by mass spectrometry analysis. molecular modeling of sardine trypsin iii compared to other homologs showed a typical trypsin fold with all the cognate components for catalysis, and specific amino acid distribution that are possible factors that explain the cold adaptation. from phylogenetic analysis, sardine trypsin iii belongs to the novel y family, which is proposed to have evolved for cold adaptation. the obtained recombinant trypsin iii showed a low catalytic efficiency, but it remained active at cold temperatures, similar to other cold-adapted trypsins. 
the cold-adaptation of sardine trypsin iii opens a wide range of biotechnological applications for this protease and is also interesting from the serine protease structure-function relationship point of view. fungicidal mechanism of scolopendin , a cationic antimicrobial peptide from centipede heejeong lee , dong gun lee drastically (from . x - to x - colonies) upon deletion of this residues domain from the full length trai. we are investigating the structure and function of this very c-terminal end of trai using nmr spectroscopy. for the backbone assignment we used slice-selectively homonuclear broadband decoupled spectra along with standard experiments. three-bond scalar coupling constants were obtained through real-time j-upscaling experiments. with the backbone assignments, we have the first hand evidence which shows that his domain is for the most part intrinsically disordered, but contains short a-helical regions. structural development, interaction studies to find the binding partner and transition of disorder to order orientation of this domain will be further investigated in this project. here we investigated a model system where mab aggregation is induced by increasing the ionic strength (nacl) at low ph. the aggregation depends both on protein and sodium chloride concentration. with nanoparticle tracking analysis (nta) and micro flow imaging (mfi) the aggregation formation was further characterized. aggregation can be partially reverted by lowering the ionic strength as determined by soluble monomer concentration measurement using se-hplc: parts of insoluble aggregates could be solubilized as soluble aggregates, dimers or even monomers. a quasi equilibrium is formed in between the subtypes. the whole aggregation process was examined by ftir and cd-spectroscopy to identify structural changes of the mab. 
screen of protective additives: the effect of osmolyte additives on aggregation kinetics and final aggregate concentration is investigated, revealing protective effects in both cases. in a screen with more than compounds not only the aggregation propensity was studied but also structural changes. the aggregation index (quantity for colloidal stability) and the melting point (quantity for conformational stability) measured by differential scanning fluorimetry were determined. the used mtp format screen has potential for buffer optimization and formulation development. structural biology and protein dynamics tetraspanin cd has a broad range of cellular functions, such as integrin association forming tetraspanin-enriched domains, synapse formation between b and t cells, cell adhesion, motility, invasion and signalling. furthermore, cd is one of the four receptors involved in the cell entry of hepatitis c virus (hcv) and therefore infection onset, one of the major causes for chronic liver disease resulting in cirrhosis and hepatocarcinoma. human cd large-extracellular-loop (hcd lel) is composed of a "stalk" and a "head" subdomain; with the latter interacting with hcv-e glycoprotein. we present four novel hcd lel crystal forms. analysis of the fourteen independent observed hcd lel high-resolution x-ray structures suggests that the dynamism of the hcd lel head-subdomain is an inherent molecular property, an observation supported also by molecular dynamics (md) studies. we classify the conformations in three distinct clusters (closed, intermediate and open) , which are seen both in the crystal structures and in the molecular dynamics simulations. the md simulations also show that conformational variability is modulated by ph changes, with distinct probability for each cluster at acidic and neutral ph. 
furthermore, in silico docking of the recent e core structure with three of the major types of hcd lel head-subdomain clusters highlights hydrophobic interactions as the major forces in the e core: hcd lel recognition mechanism. we propose that the flexibility of the hcd lel is exploited by hcv at different stages of cell entry, from virus attachment to internalization and fusion with the endosomal membrane. our results provide important insights into the basic mechanism governing hcv binding to hcd , and can help structure-based drug design of entry inhibitors of hcv. allophycocyanin of gracilaria chilensis: from gene to function jorge dagnino-leone , josé martinez-oyanedel , marta bunster-balocchi universidad de concepción theme: structure-function relationship of proteins the phycobilisomes (pbs) are auxiliary photosynthetic complexes that allow cyanobacteria and red algae to enhance energy uptake in the range of - nm. in gracilaria chilensis, a eukaryotic red alga, pbs is composed of phycoerythrin (pe), phycocyanin (pc) and allophycocyanin (apc); these proteins possess chromophores which capture energy and then transfer it to the photosystems. pbps are oligomers of an αβ heterodimer; the heterodimer oligomerizes into a trimer (αβ) , which has a discoidal shape and associates into hexamers (αβ) ; several of these hexamers form cylinder-like structures. pbs has components: antennas and core. the antennas are composed of pe and pc, whose function is to capture energy between - and - nm respectively and transfer it to the core. the core is formed by apc, which can absorb energy in the - nm range. apc emission allows energy to be transferred to the photosystems with high efficiency. pbs also contains linker proteins, which allow the correct assembly of pbs and possibly regulate the energy transfer. the main goal of our group is to build an atomic model of the gracilaria chilensis phycobilisome.
we have solved the crystal structure of pe and pc and created an antenna model. at present we are working on apc and the chromophorylated linker proteins. the objective of the present work is to create a model of the core of gracilaria chilensis; to achieve this we have used molecular biology, biochemistry and bioinformatics techniques. we designed oligonucleotide primers for the four allophycocyanin subunit genes and for the globular domain of the apce linker. these primers were used in pcr experiments to obtain the gene sequences. the sequences were translated into amino acid sequences and used to build a d model for apc subunits and trimers using the software modeller. in addition, we purified apc from gracilaria chilensis and analyzed its spectroscopic properties using absorption and fluorescence spectroscopy. we also determined the apc oligomerization state using gel filtration. molecular docking using the cluspro server was performed to obtain hexamer and apc cylinder models. based on electron micrographs obtained by our lab, a tri-cylindrical core model was built. all the models were subjected to molecular dynamics simulations using gromacs software. finally, we determined possible energy transfer pathways in the core model by applying the extended förster equation, spectroscopic data from the literature and the transition dipole moments of each of the chromophores present in the core. in conclusion, we built the first atomic model of the gracilaria chilensis phycobilisome core and propose energy transfer pathways inside the core in the context of a phycobilisome.
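The pathway analysis in the abstract above combines transition dipole moments with a Förster-type rate. The authors' extended Förster equation is not given in the text, so what follows is only a minimal sketch of the standard point-dipole Förster expressions: the orientation factor κ² = (μ_D·μ_A − 3(μ_D·r)(μ_A·r))² for unit vectors, and the rate k = (1/τ_D)(κ²/(2/3))(R₀/r)⁶ with R₀ defined for the isotropic value κ² = 2/3. Function names, units, and the example vectors are hypothetical.

```python
import math


def _unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orientation_factor(mu_donor, mu_acceptor, r_vec):
    """Standard Forster orientation factor kappa^2.

    mu_donor, mu_acceptor: transition dipole moment directions.
    r_vec: vector joining the donor and acceptor chromophore centres.
    """
    d, a, r = _unit(mu_donor), _unit(mu_acceptor), _unit(r_vec)
    kappa = _dot(d, a) - 3.0 * _dot(d, r) * _dot(a, r)
    return kappa * kappa


def forster_rate(tau_donor, r0_isotropic, distance, kappa2=2.0 / 3.0):
    """Point-dipole Forster transfer rate.

    r0_isotropic is the Forster radius computed for kappa^2 = 2/3, so an
    explicit kappa2 rescales the rate by kappa2 / (2/3).
    Rate units are the inverse of the tau_donor units.
    """
    return (1.0 / tau_donor) * (kappa2 / (2.0 / 3.0)) * (r0_isotropic / distance) ** 6


# Collinear dipoles aligned with the separation vector give the maximum value.
k2_collinear = orientation_factor((1, 0, 0), (1, 0, 0), (1, 0, 0))
# Parallel dipoles perpendicular to the separation vector.
k2_parallel = orientation_factor((0, 0, 1), (0, 0, 1), (1, 0, 0))
```

Ranking chromophore pairs by such rates is one simple way to propose preferred transfer routes; the point-dipole approximation becomes unreliable when chromophores are closer than their own dimensions, which is presumably one reason an extended treatment was used.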
novel practical strategies to access artificial metalloenzymes marco filice , jose miguel palomo departamento de biocatálisis, instituto de catálisis, csic protein chemistry and engineering since the first report, the design of artificial metalloenzymes has rapidly become an important topic in biological and inorganic chemistry due to their potential applications in synthetic chemistry, nanoscience and biotechnology. the combination of a catalytically active organometallic moiety with a macromolecular host has permitted the creation of biohybrids, a new kind of heterogeneous catalytic entity combining the attractive features of both homogeneous and enzymatic systems. presenting our most recent achievements in this research area, here we describe two novel and promising approaches focusing on the practical synthesis and large-scale production of heterogeneous artificial metalloenzymes showing chimeric activity. the first strategy is based on the in situ synthesis of noble metal nanoparticles and their supramolecular assembly with a microbial lipase from candida antarctica (fraction b), finally creating an ultra-active organometallic-enzyme heterogeneous nanobiohybrid. in the second approach, combining different protein engineering protocols (molecular biology, oriented immobilization, solid-phase bioorganic modification and bioinformatic tools), an orthogonal solid-phase strategy creating novel unnatural catalytic sites was designed and optimized. the application of such a strategy to the structure of the lipase from geobacillus thermocatenulatus permitted the generation of a heterogeneous artificial metallolipase with chimeric activity.
as proof-of-concept, the combinatorial library of generated artificial metalloenzymes obtained by both strategies was successfully assessed in a set of different synthetic reactions (selective c-c bond formation as suzuki, heck or diels-alder reactions) and also combining both activities (metallic and enzymatic) in cascade processes such as dynamic kinetic resolution of amines or production of arylamines. the obtained results were excellent in all cases. extending this strategy to other enzymes, proteins and catalytic metals, we envisage the creation of a combinatorial library of programmable artificial enzymes useful for a wide set of applications (i.e. fine organic and medicinal chemistry, bioremediation or biomedicine). proteomic examination of the yeast nuclear pore complex dynamics protein turnover and exchange nuclear pore complexes (npcs) are proteinaceous assemblies situated in nuclear envelopes of eukaryotic cells. the main function of the npc is the selective transport of macromolecules. npcs also partake in other functions, such as nuclear organization and gene regulation. the core scaffold of the npc is thought to be a stable structure, while the peripheral components exchange at various rates. however, these phenomena have not been elucidated in detail. the recent findings that yeast daughter cells get a higher proportion of the old npcs and the core scaffold hardly turns over raise the possibility that the exchange of the peripheral nucleoporins can be a repair mechanism. yeast provides a useful organism for the interrogation of nucleoporin exchange, as it performs closed mitosis; hence the only mixing of npc constituents is due to exchange. we have developed a panel of genetic tools providing for conditional induction and repression of nucleoporins. 
by combining these switches with stable isotope metabolic labeling and affinity capture, cross linking coupled to mass spectrometry, we are able to distinguish between pre-existing and newly synthesized proteins and quantify their relative amounts in the npc. our preliminary findings are in agreement with results obtained in other organisms: the core scaffold of the npc (inner ring, outer ring) appears to be stable, however does exchange slowly over time, while peripheral components exchange faster. by looking at the exchange rates of yeast nucleoporins we hope to gain insight into the npc biology of actively dividing eukaryotic cells. active site clustering identifies functional families of the peroxiredoxin superfamily angela harper , janelle leuthaeuser , patricia babbitt , jacquelyn fetrow department of physics, wake forest university, department of molecular genetics and genomics,-wake forest university, departments of physics and computer science, wake forest university bioinformatics understanding the relationships between proteins is vital to increasing our knowledge of the protein universe. while there are large databases of sequence information, the massive data influx over the past decade has prevented adequate classification of proteins at the molecular function level. however, it has been previously suggested that a protein's active site information may correlate with these known molecular functional differences; thus, active site profiling was developed to use residues around the active site of a protein to relate proteins. subsequently the deacon active site profiler (dasp) was developed to create these active site profiles and search them in a database, such as gen-bank, in order to find proteins with similar active site environments. by using dasp to computationally cluster proteins based on the similarity of their active site profiles, the peroxiredoxin (prx) superfamily was analyzed through active site similarity methods. 
the residues from the active site of each prx structure were extracted and clustered, and these profiles were iteratively searched in genbank through a multi-level iterative sequence searching technique (misst). the prx superfamily has been studied by experts, allowing the results of these searches to be compared to a well-annotated group of proteins. while previous sequence based evolutionary methods have been unable to identify functional differences between some subgroups of the prxs, notably the ahpc-prx and prx subgroups, misst discretely separates these subgroups. classifying prx proteins into functionally relevant groups using computational active site similarity methods lays the foundation for an automated process for identifying protein functional groups beyond the prx superfamily. synthesis and conformational studies of glycoprotein n homolog of bovine herpesvirus (bhv- ) by using cd, nmr and molecular modelling it serves as a chaperone for viral glycoprotein m and, in its gm-unbound form, acts as an inhibitor constraining the transporter associated with antigen processing (tap). the ul . /gm complex formation is required for the maturation and proper trafficking of both viral proteins. in the absence of gm, ul . blocks transport of antigenic peptides by tap and their mhc i-restricted presentation. the molecular mechanism of ul . activity still remains elusive. in order to investigate the structural requirements for biological function ul . study was conducted using cd, nmr and molecular dynamics methods. the data obtained with the use of high purity synthetic peptides encompassing ul . confirmed the presence of an alpha-helix structure, formed preferentially in the presence of dodecylphosphocholine (dpc) micelles as a membrane-like environment. in order to determine the three-dimensional structure of ul . protein in the present work its nmr solution structure in the presence of membrane-like environment was performed. 
the nmr data were used as a set of restraints for a simulated annealing protocol that generated dstructures of the colin johnson , sara codding membrane proteins resealing of tears in the sarcolemma of myofibers is a necessary step in the repair of muscle tissue. defects in this repair process are responsible for muscular dystrophy and cardiomyopathy. the repair pathway is triggered by the influx of calcium through lesions in the membrane, which result in membrane fusion and patching of the wound. recently dysferlin has been identified as a calcium binding protein essential for sarcolemma repair, as well as other snare mediated exocytotic events including cytokine and acid sphingomyelinase secretion. in this presentation we demonstrate a direct interaction between dysferlin and the snare proteins syntaxin and snap- . in addition, fret and in vitro reconstituted lipid mixing assays indicate that dysferlin accelerates snare heterodimer formation and snare mediated lipid mixing in a calcium sensitive manner. our results suggest a model whereby dysferlin acts as a calcium sensing snare effector for exocytosis and membrane fusion. exploring the therapeutic potential of a peptide derived from a poxviral immune evasion protein: nmr determination of the solution structure of viper and its inactive mutant toll-like receptors (tlrs) have a role in viral detection leading to cytokine and ifn induction, and as such they are targeted by viruses for immune evasion. the poxviral protein a has been identified to inhibit tlr signaling by interacting with tir domain-containing proteins of the receptor complex to collectively inhibit all tlr adaptor proteins that positively regulate transcription-factor activation ( ). one aa peptide (kysf-klilaey) termed viper (viral inhibitory peptide of tlr ) was reported to retain the inhibitory properties of full length a against tlr signaling. a r homopolymer delivery sequence at the c-terminus provided delivery of the peptide into cells. 
structural comparisons are presented between r-viper, which is active in preventing tlr -dependent cytokine induction in cell culture, and a mutant that exhibited loss of function ( r-viper l a,e a), through solution nmr spectroscopy. we find that despite a relatively minor sequence difference, the loss of hydrophobicity as well as negative electrostatic interactions result in subtle but potentially significant differences in the region of the peptide proposed to interface with tlr .
wake forest university, wake forest university, university of california san francisco protein function prediction the elucidation of protein molecular function lags far behind the rate of high-throughput sequencing technology; thus, it is essential to develop accurate and efficient computational methods to define functional relationships. protein clustering based on sequence similarity has emerged as a simple, high-throughput method for defining protein relationships, but sequence-based techniques often define the details of molecular function inaccurately. active site profiling (asp) was previously developed to identify and compare molecular details of protein functional sites. protein similarity networks were created using both active site similarity and sequence similarity for four manually curated superfamilies, and the results demonstrate that asp-based clustering identifies detailed functional relationships more accurately than sequence-based clustering. building on this, two iterative pipelines were developed using active site profiling and profile-based searches to cluster protein superfamilies into functional groups. first, the two level iterative clustering process (tulip) utilizes active site profiling and iterative pdb searches to divisively cluster protein structures into groups that share functional site features. across eight superfamilies, tulip clusters exhibit high correlation with expert functional annotations.
subsequently, the multi-level iterative sequence searching technique (misst) utilizes iterative profile-based genbank searches to identify protein sequences that belong in each tulip group. the results indicate that these asp-based methods accurately and efficiently identify functionally relevant groups through a process that can be applied systematically and on a large scale. moreover, the approach can be applied more quickly than detailed manual curation, suggesting its value in guiding annotation efforts.
dept. biochemistry and molecular biology. university of valencia, lab of peptide and protein chemistry. centro de investigación príncipe felipe membrane proteins changes in the equilibrium between pro-survival and pro-apoptotic members of the b-cell lymphoma- (bcl- ) protein family at the mitochondrial outer membrane (mom) induce structural changes that commit cells to apoptosis. bcl- homology- (bh )-only proteins participate in this process by activating pro-apoptotic effectors and promoting permeabilization of the mom. the membrane association of bh -only proteins is a controversial issue due to the lack of a canonical carboxyl-terminal (c-terminal) transmembrane (tm) domain. we used an in vitro transcription/translation system to study the insertion capacity of these hydrophobic c-terminal regions of the bh -members bik, bim, noxa, puma and bmf into microsomal membranes, and an escherichia coli complementation assay to validate our results in bacterial cells. furthermore, we have fused these hydrophobic regions to gfp to investigate the subcellular sorting. these results will allow further refinement in the elaboration of the bcl- protein-protein and protein-membrane interactome network.
alexis peña , flaviyan jerome irudayanathan , shikha nangia syracuse university, dept.
of biomedical and chemical engineering computational modeling, biostatistics, biomedical and chemical engineering tight junctions (tj) are vital intracellular barriers that are responsible for regulating paracellular transport. claudins, a family of small transmembrane proteins with approximately members, are an integral part of the tj strands. tight junctions provide molecular-level protection and prevent infection and toxins from entering the body; at the same time, tjs allow nutrients and vital solutes to pass through. claudins are associated with various diseases including metastatic cancer and serve as an entry point for many viruses. despite their importance and abundance in all cell membranes and their ubiquitous nature, the exact -d structure of claudins has remained elusive to traditional x-ray crystallographic and nmr studies. in this investigation, a computational approach was used to determine the claudin structure of claudin - . homology modeling, molecular dynamics simulations, and reverse mapping were employed to predict the protein structures with relative accuracy. understanding the structure of claudin proteins and their interactions at the molecular level can lead to effective drug delivery technology.
determination of optimal conditions for an isothermal titration calorimetry assay to obtain kinetic parameters of trypsin i from pyloric caeca of monterey sardine (sardinops sagax caerulea) idania emedith quintero reyes , francisco javier castillo yáñez , enrique fernando velázquez contreras , rocío sugich miranda , david octavio corona martínez , aldo alejandro arvizu flores , ivet cervantes domínguez protein kinetics trypsin is the most studied alkaline protease, and isoforms of this protein are very common, as is the case for monterey sardine (sardinops sagax caerulea), which expresses trypsin i and trypsin iii according to cdna characterization. trypsin i was determined to be a cold-adapted enzyme, as it shows a higher catalytic efficiency (kcat/km) than its mesophilic counterparts. the kinetic parameters were obtained by spectrophotometric assays, which are not feasible for all enzymes because native, recombinant or mutant enzyme activity could be below the detection limit of the assay, opaque or turbid solutions interfere with spectrophotometric detection, etc. alternative tools such as isothermal titration calorimetry (itc) can measure enzyme kinetics using the thermal power generated by the enzymatic conversion of substrate to product, where the rate of reaction is directly proportional to the thermal power. the objective of this study was to establish the optimum conditions to obtain kinetic parameters of trypsin i from pyloric caeca of monterey sardine using itc. to reach this objective, trypsin i was purified from viscera of monterey sardine using molecular exclusion and affinity chromatography, obtaining a yield of . mg/ml. at c kcat and km of trypsin i from monterey sardine were . s- and . mm respectively. at c were .
s- and mm (kcat and km) and at c kcat was . s- and km . mm. the kinetic parameters obtained by spectrophotometric assay at c were kcat and km s- and . mm respectively. at c the kcat was . s- and km . mm and at c kcat s- and km mm. the kcat values obtained with the spectrophotometric assay were -fold higher than those obtained by itc, whereas the km values were similar by both methods. despite the differences in kcat, we can reassert the psychrophilic behavior of trypsin i, as the catalytic efficiency is higher by both methodologies. understanding the kinetic behavior of enzymes is important not only for understanding biochemical pathways and catalytic mechanisms but also for drug discovery and development; itc provides a universal approach to determining the kinetic behavior of enzymes and can yield, in a single experiment, a complete set of kinetic parameters for an enzyme-catalyzed reaction, which can be applied to the different alkaline proteases from pyloric caeca of monterey sardine (sardinops sagax caerulea).
mysterious world of stress-responding sigma factors in bacillus subtilis olga ramaniuk protein-dna interaction bacterial transcription is mediated by the rna polymerase holoenzyme containing sigma factors, essential proteins for the initial step of transcription that recognize and bind to promoter dna. the primary sigma factor is essential in the exponential phase of growth, while alternative sigma factors are active during transcription under stress conditions. this project has three main aims. the first aim is to explore the binding properties of b. subtilis alternative sigma factors; specifically, whether sigma factors lacking the autoinhibitory domain . can bind to promoter dna in the absence of rnap. the second aim explores whether rnap associated with alternative sigma factors is regulated by the concentration of the initiating nucleoside triphosphate. the third aim is to define the regulon of sigma i.
in order to achieve our aims, out of alternative sigma factors were successfully purified using affinity chromatography and ion exchange chromatography. we set up an in vitro transcription system with selected sigma factors and initiated experiments to determine the sigma i regulon. the results described above and our future findings will help to better understand the regulation of gene expression at the level of transcription initiation. this work was supported by grant no. p - -g from the czech science foundation.
assessing the costs and benefits of protein aggregation protein aggregation and cell fitness protein aggregation has been associated with numerous diseases but also with important cellular functions such as epigenetic inheritance. here we present a population genetics approach to infer the costs and benefits of protein aggregation on cell fitness. this information is crucial to understand how cellular systems tolerate the formation of protein deposits and which factors modulate this event. using our experimental system, we measured different protein aggregation effects (deleterious, neutral or beneficial) within the same genomic background. single cell analyses, within the same population, showed stochastic variability in the aggregate's size and in its effect on cell fitness. our data indicate that, in certain conditions, protein aggregation can enhance population variability and survival expectancy. overall, these results suggest that the presence and formation of protein aggregates could be almost harmless whereas the associated gain and loss of function are critical for the cell.
revealing the key role of negatively charged residues of heme sensor proteins involved in geobacter sulfurreducens' signal transduction pathways marta a. silva , telma c. santos , teresa catarino , carlos a.
salgueiro ucibio-requimte, departamento de química, fct-unl., instituto de tecnologia química e biológica, unl signal transduction proteins bacterial chemotaxis systems sense and regulate microbial mobility in response to environmental conditions. such mechanisms constitute a striking example of cell motility to gain advantages for cell survival and permit the bacteria to fill important niches in a diversity of anaerobic environments [ ] . geobacter sulfurreducens (gs) is an anaerobic bacterium with considerable respiratory versatility whose genome encodes an unusual family of methyl-accepting chemotaxis proteins (mcp), each containing at least one heme c-binding motif [ ] . these sensor proteins, gsu and gsu , are involved in signal transduction pathways mediated by chemotaxis-like systems [ ] . the thermodynamic and kinetic characterization of the sensors gsu and gsu by visible spectroscopy and stopped-flow techniques, at several ph and ionic strength values, revealed that the midpoint reduction potentials of sensor gsu are lower than those of gsu at all ph and ionic strength values, and the same was observed for the reduction rate constants [ ] . the origin of the different functional properties of these closely related sensor domains is rationalized in structural terms, showing that gsu has two extra negatively charged residues in the vicinity of the heme group, which have no counterpart in gsu : glu and asp . residue asp is less exposed than glu and it was suggested that its carboxylic group might have a role in the modulation of the heme reduction potential of gsu . to investigate this, both residues were replaced by a positively charged amino acid (lysine) and by a neutral one (asparagine or glutamine). for the mutants with sufficient expression, a functional characterization was carried out using several spectroscopic techniques, including uv-visible and cd, together with kinetic and potentiometric measurements.
significant changes in the reduction potential values are observed when a negative charge is replaced by a positive one at position or . therefore, the decrease of the reduction potential in asp and glu mutants reinforces the hypothesis that the higher reduction potential observed for heme sensor domain gsu is related to the less negative electrostatic surface around the heme. this work provides, for the first time, evidence for the co-existence of two similar methyl-accepting chemotaxis proteins functioning in different working potential ranges. these proteins allow geobacter sulfurreducens to trigger an adequate cellular response in different anoxic subsurface environments.
national autonomous university of mexico, faculty of medicine, national autonomous university of mexico, faculty of chemistry, national autonomous university of mexico, institute of chemistry molecular evolution the glycolytic enzyme triosephosphate isomerase (tim) is an oligomeric (β/α) barrel that catalyses the interconversion of d-glyceraldehyde -phosphate and dihydroxyacetone phosphate in a diffusion-limited reaction. although each subunit has its own active site, naturally occurring monomeric tims have not been reported; in fact, monomer association is very tight. tim topology is well conserved among the three domains of life. nevertheless, their folding mechanism and inhibition properties vary across species. comparative studies of proteins have proved to be very useful in understanding the relationship between sequence and physicochemical properties; however, they lack the capacity to give a more integrative and evolutionary correlation. in order to elucidate how the catalytic properties, the oligomerization state and the stability of extant tims arose, in this work we examined the molecular history of eukaryotic tim through ancestral protein reconstruction methods (maximum likelihood) and the subsequent physicochemical characterization of the resurrected enzymes.
we first characterized in detail the protein corresponding to the last common ancestor of animals and fungi (tim ). the cd and fluorescence spectra of tim are similar to those of extant tims. secondary structure is lost in a cooperative transition with tm . c. the enzyme loses activity upon dilution, suggesting that only the dimer is active. dilution experiments followed by isothermal titration calorimetry indicate that the dissociation enthalpy is small; moreover, the heat capacity change observed is three times higher than the one predicted for a rigid body dissociation process, suggesting partial unfolding of the monomers. when compared with extant tims, the catalytic efficiency of tim is reduced -fold, whereas binding of pgh, a transition-state analogue, shows a similar thermodynamic signature. these data indicate that although monomer association may have been less tight in ancestral tims, catalysis has always been linked to oligomerization. analysis of the crystal structure of tim , obtained at . Å resolution, suggests that the lack of four salt bridges observed in the interface of extant tims is responsible for the low dimer stability. in order to test this hypothesis we also studied the stability of four younger reconstructed ancestors that acquired the salt bridges in two different phylogenetic lineages. we found a correlation between the appearance of stabilizing interactions in the interface, dimer stability and catalysis; this suggests that these salt bridges are partially responsible for extant dimer stability and sheds light on the dimeric nature of extant tims.
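the loss of activity upon dilution described above is what a simple monomer-dimer equilibrium predicts when only the dimer is active. the sketch below is not taken from the abstract: it assumes an ideal two-state equilibrium (dimer to two monomers) with a hypothetical dissociation constant `kd`, ignores the partial unfolding of monomers mentioned above, and uses the illustrative function name `dimer_fraction`. concentrations and `kd` only need to share the same units.

```python
import math


def dimer_fraction(total_subunits: float, kd: float) -> float:
    """Fraction of subunits found in dimers for D <-> 2M at equilibrium.

    Assumptions (illustrative, not from the abstract):
      kd = [M]^2 / [D]            (dissociation constant)
      total_subunits = [M] + 2[D] (mass balance on subunits)
    Substituting [D] = [M]^2 / kd gives 2[M]^2/kd + [M] - total = 0,
    whose positive root is used below.
    """
    if total_subunits <= 0 or kd <= 0:
        raise ValueError("total_subunits and kd must be positive")
    monomer = kd * (math.sqrt(1.0 + 8.0 * total_subunits / kd) - 1.0) / 4.0
    return 1.0 - monomer / total_subunits


if __name__ == "__main__":
    # Hypothetical kd of 1.0 (arbitrary units); dilution lowers the dimer fraction.
    for conc in (100.0, 10.0, 1.0, 0.1, 0.01):
        print(f"total = {conc:7.2f}  dimer fraction = {dimer_fraction(conc, 1.0):.3f}")
```

if activity is proportional to the dimer fraction, a weaker interface (larger `kd`) shifts the loss of activity to higher protein concentrations, which is the qualitative trend the abstract attributes to the ancestral enzyme.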
receptor protein-tyrosine phosphatases: dimerization, receptor kinase interaction and allosteric modulation elizabeth dembicer , damien thevenin department of chemistry, lehigh university theme: receptor tyrosine kinase and receptor protein phosphatase signaling many cell-signaling events are regulated through reversible tyrosine phosphorylation of proteins, which is controlled by the counterbalanced actions of two key enzyme families: protein tyrosine kinases and protein tyrosine phosphatases. interestingly, both families include transmembrane receptor-like enzymes, namely the receptor tyrosine kinases (rtks) and the receptor-like ptps (rptps). while the regulation and actions of many rtks are well characterized, the mechanisms controlling the enzymatic activity of rptps and how they interact with their substrates remain to be fully explained. thus, understanding how these receptors function and interact will give fundamental insights into how tyrosine phosphorylation is finely tuned in cells, and how it can be modulated. increasing evidence indicates that rptps, like rtks, are regulated by homodimerization. however, it appears that homodimerization inhibits the activity of most rptps. even though the transmembrane (tm) and the juxtamembrane domains have been proposed to be involved in this process, there is no clear structure-based proposal for the role of these regions. moreover, several rptps have been identified as candidate regulators of rtks. in particular, the receptor-type tyrosine-protein phosphatase eta (ptprj; also known as dep or cd ) is capable of attenuating egfr tyrosine phosphorylation. physical interactions of egfr with ptprj at the cell surface have been documented, but the basis for these interactions is unknown. 
here, using a dominant-negative transcriptional activator-based assay (dn-aratm), and mutagenesis analysis, we show that: ( ) ptprj has a strong tendency to homodimerize, ( ) ptprj heterodimerizes with egfr through tm-tm interactions, ( ) these interactions are mediated by specific residues, and can be modulated by the delivery of peptide binders. this work represents the first structure-function study of rptp-rtk interaction, and may not only result in significant progress towards a better understanding of the basic biology of rptps in cancer cells, but also offer new possibilities for targeting protein tyrosine phosphatases for therapeutic modulation of egfr in oncology.
inhibiting egfr dimerization and signaling through targeted delivery of juxtamembrane domain peptide mimics using phlip anastasia thevenin , kelly burns , janessa guerre-chaley , damien thevenin regulating receptor tyrosine kinase signaling the elevated phosphorylation of key regulatory tyrosines on oncogenic signaling proteins that results from aberrant protein tyrosine kinase activity plays well-established roles in promoting tumorigenesis and in the high frequency with which resistance arises to existing therapeutic treatment. for instance, this is the case for the epidermal growth factor receptor (egfr). thus, there is a clear need for novel specific targeting methods to inhibit the activity of receptor protein tyrosine kinases, such as egfr, in cancer. egfr becomes activated upon ligand binding to the extracellular domain, leading to receptor dimerization. the juxtamembrane (jm) domain of egfr is critical for intrinsic tyrosine kinase activity and receptor dimerization by stabilizing the active conformation of egfr through the formation of an antiparallel helical dimer.
therefore, peptides mimicking the jm domain, if specifically delivered to cancer cells, have the potential to prevent egfr dimerization, receptor activation, downstream signaling, and thus to attenuate aberrant egfr activity in cancer cells. here, phlip (ph low insertion peptide), a peptide that can selectively target cancer cells and tumors based solely on their extracellular acidity, is used to selectively translocate the jm domain of egfr into cancer cells to prevent egfr dimerization. at ph above , phlip is soluble and unstructured; however, when exposed to lower ph such as observed in tumors, phlip inserts as a transmembrane (tm) alpha-helix, allowing the direct translocation of cargo molecules into the cytoplasm. using the dominant negative arac-based transcriptional reporter assay (dn-aratm), which assesses jm and tm domain interactions in cell membranes of e. coli, we show that phlip-jm is able to disrupt the egfr dimer by %. current work is focused on testing the ability of such a phlip-jm peptide conjugate to perturb egfr homodimerization and decrease downstream signaling through soluble kinases, such as akt and erk, in cancer cells.
the thumb subdomain of yeast mitochondrial rna polymerase is involved in processivity, transcript fidelity and mitochondrial transcription factor binding gilberto velazquez , luis brieba , rui sousa universidad de guadalajara, langebio cinvestav, university of texas health science center at san antonio dna protein interaction single subunit rna polymerases have evolved two mechanisms to synthesize long transcripts without falling off a dna template: binding of nascent rna and interactions with an rna:dna hybrid. mitochondrial rna polymerases share a common ancestor with t-odd bacteriophage single subunit rna polymerases. herein we characterized the role of the thumb subdomain of the yeast mtrna polymerase gene (rpo ) in complex stability, processivity, and fidelity.
we found that deletion and point mutants of the thumb subdomain of yeast mtrna polymerase increase the synthesis of abortive transcripts and the probability that the polymerase will disengage from the template during the formation of the late initial transcription and elongation complexes. mutations in the thumb subdomain increase the amount of slippage products from a homopolymeric template and, unexpectedly, thumb subdomain deletions decrease the binding affinity for mitochondrial transcription factor (mtf ). the latter suggests that the thumb subdomain is part of an extended binding surface area involved in binding mtf .
design principles of membrane protein structures vladimir yarov-yarovoy , diane nguyen membrane protein structure membrane proteins play a key role in cellular signaling and ion transport. statistical analysis of the expanding database of high-resolution membrane protein structures in the protein data bank (pdb) provides useful information about membrane protein structure and function. we used rosettamembrane software (yarov-yarovoy v et al ( ) proteins) to analyze unique alpha helical membrane protein structures in pdb and derive a knowledge-based energy function for membrane protein structure prediction, membrane protein-protein docking, and membrane protein design. the rosettamembrane residue environment energy term is based on amino acid propensities in hydrophobic, interface, and water layers of the membrane and depends on the residue burial state, from being completely buried within a protein environment to being completely exposed either to the lipid or water environments. residue burial state is determined from the number of residue neighbors within and Å spheres. the rosettamembrane residue-residue interaction term is based on the propensities of amino acid pairs to be in close proximity to each other within hydrophobic, interface, and water layers.
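the neighbor-count idea behind the burial state can be sketched as follows. this is an illustration only, not the rosettamembrane implementation: the function name `neighbor_counts`, the use of one representative coordinate per residue, and the two radii in the usage example are assumptions, since the sphere radii are not legible in the text above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def neighbor_counts(coords: Sequence[Point], radii: Sequence[float]) -> List[List[int]]:
    """For each residue, count the other residues within each sphere radius.

    coords: one representative point per residue (an assumption; a real
            implementation would choose a specific atom or centroid).
    radii:  sphere radii in the same length unit as coords.
    Returns one row per residue with one count per radius.
    """
    counts: List[List[int]] = []
    for i, a in enumerate(coords):
        row = [0] * len(radii)
        for j, b in enumerate(coords):
            if i == j:
                continue  # a residue is not its own neighbor
            distance = math.dist(a, b)
            for k, radius in enumerate(radii):
                if distance <= radius:
                    row[k] += 1
        counts.append(row)
    return counts


if __name__ == "__main__":
    # Three toy residues on a line; the radii are placeholders, not the published values.
    toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(neighbor_counts(toy, radii=(5.0, 10.0)))
```

a residue with many neighbors in both spheres would be treated as buried and one with few as exposed; how counts map to discrete burial states is left out here because the text does not specify it.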
results of our statistical analysis reveal fine details of favorable and unfavorable environments for all amino acid types in all membrane layers and residue burial states. we find that large hydrophobic amino acids are favorable facing the hydrophobic core of the lipid bilayer. small amino acids are favorable facing the protein core within the hydrophobic layer of the membrane. aromatic or positively charged amino acids are favorable facing the lipid head groups. residue-residue interactions are often favored between polar and charged amino acids and also between some small and large hydrophobic amino acids inside the protein core within the hydrophobic layer of the membrane. these data will be useful for rational design of novel membrane protein structures and functions.
coordinated gripping of substrate by subunits of a aaa proteolytic machine ohad yosefson , andrew nager , tania baker , robert sauer 'protein quality control' or 'protein degradation' hexameric aaa protein-remodeling machines use conserved loops that line the axial pore to apply force to substrates during the mechanical processes of protein unfolding and translocation. an open question in the aaa field is whether pore loops from different subunits of the hexameric ring grip the substrate coordinately (all six subunits involved), independently (one subunit at a time involved), or in a partially coordinated manner (two or three subunits at a time). to answer this question, we studied covalently linked hexamers of the e. coli clpx unfoldase bearing different numbers and configurations of wild-type and mutant pore loops and challenged these variants with protein substrates with a broad range of stabilities. we find that successful unfolding of increasingly resistant substrates requires the coordinated action of a greater number of wild-type pore loops.
our results support a mechanism in which a power stroke initiated in one subunit of the clpx hexamer results in the simultaneous movement of all six pore loops, which coordinately grip and apply force to the substrate.
structure and function of the toc m-domain, and its role in targeting the preprotein receptor to the chloroplast outer envelope membrane matthew smith , shiu-cheung lung , prem nichani , nicholas grimberg , j. kyle weston , shane szalai , simon chuong department of biology, wilfrid laurier university, department of biology, university of waterloo chloroplast biogenesis and function rely on the import of thousands of nucleus-encoded preproteins from the cytosol. preprotein import is supported by the toc and tic (translocon at the outer and inner envelope membranes of chloroplasts) complexes, which work cooperatively to translocate preproteins across the double-membrane envelope to the chloroplast interior. toc is one of the preprotein receptors of the toc complex, is also encoded in the nucleus and post-translationally targeted to the chloroplast, and is composed of distinct domains: ) the intrinsically disordered n-terminal acidic (a-) domain; ) the central gtpase (g-) domain; and ) the c-terminal membrane (m-) domain that anchors the protein to the chloroplast outer membrane (com) through an unknown mechanism. the m-domain has no known homologues and does not contain a predicted trans-membrane domain, but does contain intrinsic chloroplast targeting information at the extreme c-terminus. the m-domain also contains a predicted β-helix motif, which may be important for anchoring the protein to the com. we are interested in characterizing the structure of the m-domain and determining the nature of its association with the com, as part of our larger goal of understanding the role toc plays in protein import into chloroplasts.
we are also interested in defining the precise nature of the targeting information contained within the extreme c-terminus of toc , elucidating the targeting pathway that is used, and whether other com proteins use this pathway. we will present our most recent data on the structure, function and targeting of the toc m-domain. structural investigation of nlpc/p protein acquired by trichomonas vaginalis through a lateral gene transfer event jully pinheiro , , augusto simoes-barbosa , david goldstone microbiology, school of biological sciences, university of auckland, structural biology, school of biological sciences, university of auckland trichomonas vaginalis is an extracellular flagellated protozoan parasite that causes the most common non-viral sexually transmitted disease, with approximately million cases worldwide annually. nevertheless, the biochemical processes behind t. vaginalis infection and its interaction with the vaginal microbiota are still not well defined. in the draft genome sequence of trichomonas vaginalis strain g was described, identifying , protein-coding genes. of these, nine genes encode nlpc/p -like members. this superfamily is widely represented in the different kingdoms of life and has diverse enzymatic functions, such as amidases, endopeptidases and acetyltransferases. previous studies have shown that members of this superfamily hydrolyze specific peptide linkages in bacterial cell walls affecting germination, vegetative growth, sporulation and division or cell lysis/invasion. as a typical eukaryote, the protozoan parasite t. vaginalis does not have a cell wall itself. previous studies suggest that the t. vaginalis nlpc/p genes were acquired via lateral gene transfer from bacteria and must have an important function, possibly controlling the vaginal microbiota and aiding parasite invasion and infection. to investigate the function of the nlpc/p family of proteins in t. 
vaginalis we have expressed, purified and crystallized a member tvag_ and report its three-dimensional structure, determined at . Å resolution, by x-ray diffraction. the structure of the protein reveals a typical papain-like fold resembling peptidoglycan hydrolases from the nlpc/p family, with a conserved cysteine and histidine forming the catalytic residues. the protein contains two bacterial sh domains at the n-terminus. these domains act as general binding domains and are likely to aid the interaction of the nlpc/p domain with substrate components. combined with biochemical and enzymatic characterization, the structure of this nlpc/p protein will help to elucidate the molecular origin of its hydrolase activity and to decipher its putative role in parasite infection.
novel dna polymerases from red sea brine-pools: new potential polymerases for pcr application masateru takahashi , etsuko kimura , mohamed salem , ulrich stingl , samir hamdan protein biotechnology the polymerase chain reaction (pcr) is a key tool in medical and biological research. the most common pcr reaction relies on the thermal cycling method that consists of repeated cycles of heating and cooling steps for dna melting and extension by the dna polymerase, respectively. the introduction of new dna polymerases to the market is a major area of development that has helped tremendously in improving the performance and quality of pcr. nonetheless, pcr still requires optimization of salt and metal ion concentrations, leaving room in the market for introducing new dna polymerases that are more robust in their salt and metal ion concentration dependence. in this study, we will present the characterization of a novel archaeal dna polymerase from the red sea brine-pool (termed br ) and demonstrate how its enzymatic activity reflects every aspect of the environment of the brine-pool: high tolerance to concentrations and types of salts and metal ions, including utilization of zn ions in its active site.
these results suggest that the brine-pool microorganisms are likely to contain novel chemical pathways to deal with their harsh external conditions. we will further show the mechanism by which the br polymerase is adapted to remain active in harsh conditions.
structural basis for the identification of the n-terminal domain of coronavirus nucleocapsid protein as an antiviral target ming-hon hou , shing-yen lin , chia-ling liu , yu-ming chang , jincun zhao , stanley perlman institute of genomics and bioinformatics, national chung hsing university., institute of biological chemistry, academia sinica., department of microbiology, the university of iowa drug discovery coronaviruses (covs) cause numerous diseases, including middle east respiratory syndrome and severe acute respiratory syndrome, generating significant health-related and economic consequences. covs encode the nucleocapsid (n) protein, a major structural protein that plays multiple roles in the virus replication cycle and forms a ribonucleoprotein complex with the viral rna through the n protein's n-terminal domain (n-ntd). using human cov-oc (hcov-oc ) as a model for cov, we present the d structure of hcov-oc n-ntd complexed with ribonucleoside '-monophosphates to identify a distinct ribonucleotide-binding pocket. by targeting this pocket, we identified and developed a new coronavirus n protein inhibitor, n-( -oxo- , -dihydrophenanthridin- -yl)(n,n-dimethylamino)acetamide hydrochloride (pj ), using virtual screening; this inhibitor reduced the n protein's rna-binding affinity and hindered viral replication. we also determined the crystal structure of the n-ntd-pj complex. on the basis of these findings, we propose guidelines for developing new n protein-based antiviral agents that target covs.
thermal and structural stability of β-glucosidases gh maira artischeff frutuoso departamento de bioquímica do instituto de química da universidade de são paulo enzymology we compared the stability of thermophilic β-glucosidases gh with that of mesophilic ones in the presence of denaturants such as urea and at high temperature, by following the transitions between the native and unfolded states by tryptophan fluorescence, enzymatic activity and differential scanning fluorimetry (dsf). the bacterial β-glucosidases (bgla) and (bglb) of the mesophile paenibacillus polymyxa and the β-glucosidase (bglthm) of the thermophile thermotoga maritima were expressed as recombinant proteins in novablue (de ) and purified by affinity chromatography (ni-nta resin). these recombinant enzymes share a very similar (β/α) barrel fold, as shown by their crystal structures, and exhibited a characteristic peak between and nm in the tryptophan fluorescence spectra, indicating that the proteins are folded. circular dichroism analysis in the far-uv region ( nm to nm) also showed spectra typical of folded proteins, with a secondary structure composition of % α-helix and % β-sheet for bgla, % α-helix and . % β-sheet for bglb and % α-helix and % β-sheet for bglthm. the average accessibility of the exposed tryptophan residues in the native enzyme to increasing concentrations of the quencher acrylamide (stern-volmer constant, ksv) is greater for bgla ( . ), but similar for bglb ( . ) and bglthm ( . ). the thermal stability determined by dsf was higher for bglb (tm . c) than for bgla (tm . c). the bglthm was stable at c and remained stable for up to h at c. in addition, the thermal inactivation kinetics at c, evaluated by the relative remaining activity, showed that bgla denaturation (kinactivation of . s- ) is faster than that of bglb (kinactivation of . s- ). on the other hand, bglthm inactivation at c was a two-step process, which exhibited an initial fast step (kinactivation of .
s ) followed by a slow step (kinactivation of . s- ). the chemical denaturation by urea followed using tryptophan fluorescence showed a transition pl- covalent structure of single-stranded fibrinogen and fibrin oligomers cross-linked by fxiiia. the influence of free radical oxidation anna bychkova, vera leonova, alexander shchegolikhin, marina biryukova, elizaveta kostanova, mark rosenfeld n. m. emanuel institute of biochemical physics, russian academy of sciences protein structure and function native fibrinogen is a key blood plasma protein whose main function is to maintain hemostasis by producing cross-linked fibrin clots under the action of thrombin and fibrin-stabilizing factor (fxiiia). fxiiia-mediated isopeptide γ-γ bonds are known to be produced between the γ polypeptide chains of adjacent fibrinogen or fibrin molecules, but there are apparently conflicting ideas regarding the orientation of the γ-γ bonds. in this study, several peculiarities of the self-assembly of fibrin(ogen) and of the induced oxidation of the proteins have been studied with the aid of elastic and dynamic light scattering and of uv, ftir and raman spectroscopy. in the presence of fxiiia, both the non-oxidized and oxidized fibrinogen molecules have been shown to bind to each other in an "end-to-end" fashion to form flexible, covalently cross-linked fibrinogen homopolymers. to identify the orientation of γ-γ bonds in fibrin protofibrils, a novel approach based on the self-assembly of soluble cross-linked fibrin protofibrils and their dissociation in urea solutions of moderate concentration has been applied. the results of elastic and dynamic light scattering coupled with analytical ultracentrifugation indicated that the protofibrils dissociate under increasing urea concentration to yield single-stranded structures held together entirely by γ-γ bonds.
the results of this study provide evidence to support the model of longitudinal γ-γ bonds that form between the γ chains end-to-end within the same strand of a protofibril. since fibrinogen is known to be sensitive to ros, the mechanisms of fibrinogen and fibrin self-assembly under induced oxidation have been investigated. in both cases the polypeptide chains of the oxidized fibrin(ogen) proved to be involved in the enzymatic cross-linking more readily than those of unaffected molecules. the enhancing role of the d:d interaction under oxidation could be considered a compensatory mechanism in the assembly of fibrin when the d:e interaction is impaired. the experimental data on fibrinogen and fibrin oxidation acquired in the present study, combined with our earlier findings, make it reasonable to suppose that the spatial structure of fibrinogen could be evolutionarily adapted to some ros actions detrimental to the protein function. the study was supported by the rfbr, research projects - - mol_a and - - a. structural and thermodynamic analysis of co-stimulation receptor cd phosphopeptide interactions with grb , gads, and pi -kinase sh domains in addition to the signaling produced by the binding of antigen-major histocompatibility complex to t-cell receptors, co-stimulatory signals from other receptor-ligand interactions are required for full activation of t-cells. the cd receptor on the t-cell surface has been well characterized, and the binding of ligand to cd is critical for producing co-stimulatory signals. cd has no enzymatic activity and its cytoplasmic region consists of amino acids that contain the sequence ymnm, in which the tyrosine residue is phosphorylated by kinase. the phosphorylated sequence, pymnm, is recognized by src homology (sh ) adaptor proteins, such as growth factor receptor binding protein (grb ), grb related adaptor downstream (gads), and the phosphatidylinositol -kinase (pi -kinase) regulatory subunit, p .
the consensus sequence for the binding of grb sh and gads sh is pyxnx, and that of p n-terminus sh (nsh ) and c-terminus sh (csh ) is pyxxm. we reported the high-resolution crystal structure of grb sh in complex with the cd phosphopeptide [higo et al., plos one , e , ], and recently determined those of gads sh , p nsh , and p csh . these data, along with the results of binding thermodynamics analyzed using isothermal titration calorimetry, helped to elucidate the molecular recognition mechanisms of cd by adaptor proteins. the sh proteins were overexpressed in escherichia coli and purified using affinity and gel-filtration chromatography. the cd phosphopeptides, -residue (octp) and -residue (ddcp ), were synthesized using the solid-phase technique and purified using reversed-phase chromatography. the crystals were obtained by the hanging-drop vapor diffusion method. x-ray diffraction data were collected at synchrotron radiation facilities, and the structures were determined by the molecular replacement method. the models of grb sh , gads sh , p nsh , and p csh in complex with octp were refined at . , . , . , and . Å resolutions, respectively. the crystal structures showed that the phosphotyrosine phosphate moiety interacts directly with the side-chain of arginine in sh , which is common to all complex structures. in the grb sh and gads sh complexes, the side-chain of asparagine at the py position forms a pair of hydrogen bonds with the main-chain amide and carbonyl groups of lysine in sh . in contrast, in the p nsh and csh complexes, the side-chain of methionine at the py position is located in hydrophobic pockets of nsh and csh , in which the hydrophobic interactions of csh would be stronger than those of nsh . this idea is supported by the observed binding thermodynamics. the binding affinity of csh to ddcp , because of a favorable enthalpy change, is about -fold higher than that of nsh .
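the fold differences in affinity reported here translate directly into differences in binding free energy through the standard relation ΔG = -RT ln(Ka). the python sketch below illustrates this conversion; the function name, the default temperature of 298.15 k and the 10-fold example value are assumptions chosen for illustration and are not taken from the itc measurements.

```python
from math import log

# Molar gas constant in kJ mol^-1 K^-1.
GAS_CONSTANT_KJ = 8.314462618e-3


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free-energy difference (kJ/mol) for a fold change in association constant.

    Hypothetical helper for illustration: a negative result means the
    tighter binder is favoured by that amount of free energy.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -GAS_CONSTANT_KJ * temperature_k * log(fold_change)


# Illustrative value only (assumed, not from the abstract):
# a 10-fold higher affinity at 298.15 K.
example_ddg = ddg_from_fold_change(10.0)
```

under these assumptions, each 10-fold gain in affinity corresponds to roughly 5.7 kj/mol of additional binding free energy, a figure that can be set against a measured enthalpy change to judge how much of the difference is enthalpic.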
the binding affinity of grb sh to ddcp is similar to that of gads sh to ddcp , and is about -fold lower than that of nsh to ddcp . these results indicate that the contributions of the hydrophobic interactions of nsh and csh at the py position are stronger than those of the hydrogen bonds of grb sh and gads sh at the py position. novel kinetochore protein complex from silkworm holocentric chromosomes takahiro kusakabe, hiroaki mon, jaeman lee the kinetochore, which consists of centromere dna and a multilayered protein complex, plays important roles in chromosome organization and segregation. interactions between chromosomes and spindle microtubules allow chromosomes to congress to the middle of the cell and the sister chromatids to segregate into daughter cells in mitosis, which is followed by cytokinesis. in contrast to monocentric chromosomes, in which the centromere is normally present at a single region on each chromosome, holocentric chromosomes have centromeric activity along the entire length of the chromosome. the silkworm, bombyx mori, has been known to have holocentric chromosomes since the s; however, no silkworm kinetochore proteins have been identified so far. here we report the identification of a novel set of genes for outer kinetochore proteins in silkworm by using bioinformatics and rna interference-based screening. under the hypothesis that depletion of essential kinetochore genes causes cell cycle arrest in mitosis, we performed rnai in the silkworm cell line bmn -sid , targeting a set of candidate genes. knockdown of five genes caused significant cell cycle arrest at the g /m phase. we also found that these five proteins form a complex, and that all of them are localized along the chromosome arms, indicating that the silkworm kinetochore extends along the chromosome.
inactivation of betaine aldehyde dehydrogenase from spinach by its physiological substrate betaine aldehyde to contend with osmotic stress caused by drought, salinity, or low temperatures, some plants synthesize the osmoprotectant glycine betaine (gb) from betaine aldehyde (bal). the last step, the irreversible nad dependent oxidation of bal, is catalyzed by aldh enzymes that exhibit betaine aldehyde dehydrogenase (badh) activity. we here report that the spinacia oleracea badh (sobadh) is reversibly inactivated by bal in the absence of nad in a time- and concentration-dependent mode to approximately % of the original activity. inactivation kinetics are consistent with a partially reversible, two-step mechanism that involves the formation of an active non-covalent enzyme•bal complex before formation of the inactive enzyme-bal complex. crystallographic evidence indicates that in the enzyme previously inactivated by bal the aldehyde forms a thiohemiacetal with the nonessential cys (sobadh numbering) located at the aldehyde-entrance tunnel, thus totally blocking access to the catalytic cysteine. accordingly, bal does not inactivate the c s sobadh mutant. two crystal structures of the inactivating enzyme-bal complex showed that the trimethylammonium group of bal is inside the active-site aromatic box, as in the productive mode of binding. this explains why the inactivation of the a i mutant, where the binding of the trimethylammonium group is hindered, requires non-physiologically high bal concentrations, while the a c mutant, where the binding is allowed, is inactivated similarly to the wild-type enzyme. cys- is conserved in most plant aldh enzymes of known sequence, and in all those with proven or predicted badh activity. inactivation by bal therefore appears to be a common feature of plant badhs.
this short-term regulation may be of great physiological importance since the irreversibility of the badh-catalyzed reaction would unbalance the nad /nadh ratio if the aldehyde concentrations are high, the nad concentrations low and the reaction is not slowed down. plant badhs are prone to this situation since they work under osmotic stress conditions, when high bal concentrations are required for the synthesis of high levels of the osmoprotectant gb. the partial nature of the decamer possesses a donut-shaped structure with calcium ions on the surface available for interactions with carbohydrate molecules. binding specificity was evaluated for carbohydrates using differential scanning fluorimetry (dsf), which showed that bjcul interacts with galactose and lactose but less with glucose and sucrose. surprisingly, high levels of thermostabilization of bjcul were achieved with the aminoglycoside antibiotics geneticin (g ) and gentamicin in a calcium concentration-dependent manner, but not with kanamycin. intriguingly, while lactose and galactose inhibited erythrocyte agglutination by bjcul, g and gentamicin did not affect hemagglutination, implying a second binding site. dsf analysis also suggested the presence of a second binding site for the antibiotics, and crystallization of the complexes is in progress in order to fully understand this new binding mechanism of a c-type lectin with antibiotics. ab initio modelling of structurally uncharacterised antimicrobial peptides mara kozic institute of integrative biology, biosciences building, university of liverpool, crown street, liverpool l zb, united kingdom mara.kozic@liverpool.ac.uk antimicrobial resistance within a wide range of infectious agents is a severe and growing public health threat.
antimicrobial peptides (amps) are among the leading alternatives to current antibiotics, exhibiting broad spectrum activity. an understanding of the structure of a protein can lead us to a much improved picture of its molecular function. furthermore, an improved understanding of structure-function relationships facilitates protein design efforts to enhance activity. currently, the d structures of many known amps are unknown. to improve our understanding of the amp structural universe we have carried out large-scale ab initio d modelling of structurally uncharacterised amps. such ab initio modelling is facilitated by the typically small size of amps as well as their tendency to contain disulphide bonds, which provide valuable additional information to simulations. preliminary results reveal unexpected similarities between the predicted folds of the modelled sequences and structures of well-characterised amps. for example, lacticin q was revealed to contain a helical bundle fold that bears a striking resemblance to enterocin a. we also found a remarkable similarity between the predicted structure of silkworm peptide and β-hairpin amps such as tachyplesin i. our results improve the understanding of the structure-function relationship of amps. surface aggregation-propensity as a constraint on globular proteins evolution susanna navarro, marta diaz, pablo gallego, david reverter, salvador ventura institut de biotecnologia i biomedicina and departament de bioquimica i biologia, institut de biotecnologia i biomedicina, universitat autònoma de barcelona in living cells, functional protein-protein interactions compete with a much larger number of nonfunctional interactions. theoretical studies suggest that the three-dimensional structures of present proteins have evolved under selective pressure to avoid the presence of aggregation-prone patches at the surface that may drive the establishment of anomalous protein contacts.
however, no experimental evidence for this hypothesis exists so far. the α-spectrin sh domain (spc-sh ) has been used as a protein model to decipher the sequence determinants of protein aggregation. here we use it to address the structural determinants of protein aggregation and their link to protein evolution. to this aim we exploit aggrescan d (a d), a novel algorithm developed by our group, which takes into account both protein structure and experimental data to project aggregation propensities on protein surfaces. we used a d to design a series of spc-sh variants with progressively stronger aggregation-prone surfaces and characterized their thermodynamic, structural and functional properties. our data support evolution acting to constrain the aggregation propensities of globular protein surfaces in order to decrease their potential cytotoxicity, and the protein quality control machinery acting to buffer this negative selective pressure. utilizing d structure for the annotation of structural motifs in the conserved domain database narmada thanki-cunningham, noreen gonzales, gabriele marchler, myra derbyshire, james song, roxanne yamashita, christina zheng, stephen bryant, aron marchler-bauer, farideh chitsaz conserved domain database, structure group cbb/ncbi/nlm/nih the conserved domain database (cdd) is a protein classification and annotation resource comprised of multiple sequence alignments representing ancient conserved domains. cdd protein domain models are curated by ncbi and use d protein structure explicitly to define domain extent and the location of conserved core structures, and to provide accurate alignments between diverse family members via structure superposition. cdd also imports external collections such as pfam and tigrfam. recently, a novel class of annotation labeled as "structural motifs" has been introduced to supplement current capabilities.
these annotations define compositionally-biased and/or short repetitive regions in proteins, which are difficult to model as functional domains conserved in molecular evolution. structural motifs include transmembrane regions, coiled coils, and short repeats with variable copy numbers. for many types of short tandem repeats, a few position-specific score matrices (pssms) suffice to annotate more than % of the known instances of that structural motif. unfortunately, a lack of sequence similarity within coiled-coil regions precludes the use of only a few generic models; therefore, models for coiled-coil regions in the context of specific families have been developed using the spiricoil database as a reference. increased coverage of coiled-coil regions in cdd, specific site annotations of these structural motifs, and their representation on the webpages will be discussed. specific in vivo ultrasound imaging of e-selectin expression in tumors using a microbubble contrast agent covalently attached to the peptide ligand iellqar, known to bind to e-selectin [ ]. however, it was observed that this probe has a limitation in the imaging of cardiovascular diseases, where higher shear stresses prevent microbubbles from remaining attached to the target. therefore, peptides with higher e-selectin affinity are needed to design probes capable of imaging these diseases. in this context, automated docking and molecular dynamics methodologies were combined and applied to different e-selectin binding peptides. these studies predicted the energetically more favorable binding mode as well as the key interactions between the peptide ligands and the e-selectin receptor. some of these peptides were prepared by solid-phase peptide synthesis and their interactions with e-selectin were analyzed by surface plasmon resonance. the results showed that these peptides have different affinities for e-selectin.
these data were correlated with the computational studies and evaluated to obtain crucial information on the key recognition elements needed for higher e-selectin affinity. these recent results will be presented. burkholderia pseudomallei is the causative agent of melioidosis, a serious invasive disease of animals and humans in tropical and subtropical areas. sedoheptulose- -phosphate isomerase from b. pseudomallei (bpgmha) is an antibiotic adjuvant target for melioidosis. in general, bpgmha converts d-sedoheptulose- -phosphate to d-glycero-α-d-manno-heptopyranose- -phosphate (m p). this is the first step of the biosynthesis pathway of ndp-heptose responsible for a pleiotropic phenotype. therefore, this biosynthesis pathway is a target in the search for novel antibiotics that increase the membrane permeability of gram-negative pathogens or for adjuvants that work synergistically with known antibiotics. the crystal structure of this enzyme has been solved at . Å resolution. there is an active site pocket where a putative metal binding site is located. to find inhibitors of bpgmha, in-silico virtual screening with zinc, a free database of commercially available compounds, has been performed. tens of thousands of chemical compounds were docked into the active site of bpgmha. a number of putative bpgmha-binding compounds better than m p were found using surflex-dock, included in the sybyl software package. characteristics of these compounds were surveyed and classified to identify common binding properties with bpgmha. mapping the structure of laminin using cross-linking and mass spectrometry gad armony, toot moran, yishai levin, deborah fass weizmann institute of science, department of structural biology, weizmann institute of science, israel center for personalized medicine laminin, a kda heterotrimer, is a major element in the extracellular matrix (ecm). within the ecm, laminin contributes to the adhesion and migration of cells, both in health and disease.
the laminin trimer was observed by rotary shadowing electron microscopy to be cross shaped: the three short arms of the cross are formed by the amino-terminal halves of the three subunits, whereas the long arm of the cross holds the three chains together in a long coiled coil. the narrow and flexible arms of the laminin cross complicate studying its structure at high resolution by crystallography or single-particle electron microscopy reconstruction. to advance our understanding of this remarkable quaternary structural assembly, we have used cross-linking and mass spectrometry to analyze the organization of the laminin trimer. this technique was validated against known crystal structures of isolated laminin domains. in all cases the crystal structure distances agree with the cross-linker length. the identified cross-links were particularly helpful in assigning the register and the subunit order of the long coiled coil, owing to the high content of cross-linkable residues in this region. using known x-ray crystal structures, homology modeling, and distance restraints provided by two cross-linker chemistries, a clearer picture of the laminin quaternary structure is obtained. non-sequential protein structure alignment program mican and its applications shintaro minami, george chikenji, motonori ota dept. of info. sci., nagoya univ., dept. of comp. sci. & eng., nagoya univ. in some proteins, secondary structure elements are arranged spatially in the same manner but are connected in alternative ways. analysis of such non-sequential structural similarity in proteins is important because it provides a deeper understanding of the structural geometry of proteins. such similarity can also be observed even among homologous proteins, indicating that non-sequential structural similarity is significant in protein evolution. however, non-sequential structural similarity in proteins has been little investigated.
we developed a novel non-sequential structural alignment program, mican, which can handle multiple chains, inverse direction of chains, cα models, alternative alignments, and non-sequential alignments. we performed a comprehensive non-sequential structural comparison among homologous proteins in the same scop superfamily using the mican program. based on the result, we found that approximately % of superfamilies include at least one protein pair showing non-sequential structural similarity. % of the non-sequential structurally similar pairs are aligned in a simple way, e.g. circular permutation or β-strand flip/swap, but % are complicated. interestingly, most of such complicated non-sequential similarities can be explained by a combination of - simple non-sequential relationships. this result indicates that accumulation of simple structural changes in the course of protein evolution produces homologs with completely different folds. as early as , ritter surmised that the cell's molecules cooperate to form a "special apparatus and an organised laboratory". despite supporting evidence from srere, mcconkey and others, efforts to understand molecular organisation in vivo are still in their infancy. however, important aspects of the cell interior have already been revealed. for example, weak molecular interactions structure the cytoplasm into time-evolving, functional zones. weak interactions are difficult to capture and can preclude protein detection in cells by many biophysical techniques, including nmr spectroscopy. we explored the effects of cell-like milieus on the cytochrome c (cyt c)-flavodoxin (fld) interaction. these oppositely charged proteins interact weakly with a number of cognate partners. neither cyt c nor fld is detectable by nmr in escherichia coli, confirming their "sticky" nature (figure a). the cyt c-fld interaction was assessed in buffer, % polyacrylamide gels and in solutions containing g/l of macromolecular crowders (figure b).
h, n hsqc nmr revealed that the interaction was transient in buffer, proceeding via the known binding site for both proteins. substantial line broadening was observed in crowded and confined solutions, suggesting that the cyt c-fld complex is stabilised under native-like conditions. the stabilising effect of macromolecular crowders was also observed by native gel electrophoresis and crystallization. these findings coincide with spitzer and poolman's model for cytoplasmic structuring, emphasising the role of charge-charge interactions and crowding in the formation of macromolecular "clusters". the implications for cytoplasmic structuring will be discussed alongside related investigations of cationic protein interactions in e. coli extracts. , detergent:protein ratio. the transmembrane β-barrel of bama is folded in micelles, bicelles and nanodiscs; however, an n-terminally attached single potra domain is flexibly unfolded, due to the absence of stabilizing contacts with other protein domains. measurements of backbone dynamics show distinct time scales of dynamic behavior for the bama β-barrel and parts of its extracellular loop l , revealing high local flexibility within the lid loop. this work presents the first high-resolution d solution nmr spectra of the bama barrel and establishes improved biochemical preparation schemes, which will serve as a platform for structural and functional studies of bama and its role within the bam complex. protein arginine methylation is a widespread and important posttranslational modification in eukaryotic cells, shown to be involved in the activation or repression of transcription, modification of the splicing machinery, signaling, and dna repair.
mammalian protein arginine methyltransferases include a family of nine sequence-related enzymes that transfer one or two methyl groups onto the terminal guanidino groups of arginine residues, producing monomethylarginine only (mma, type iii), symmetric dimethylarginine (sdma) and mma (type ii), or asymmetric dimethylarginine (adma) and mma (type i). while prmt , , , , , and have been characterized as type i enzymes, and prmt as a type ii enzyme, the role and activity types of the two final members of this family of enzymes, prmt and prmt , had been unclear due to conflicting results in the literature, and the substrates for these enzymes had been elusive. both prmt and prmt are distinct members of the family, with two methyltransferase or methyltransferase-like domains and acidic residues in the otherwise well-conserved substrate-binding double e motif, features not seen in the other prmt enzymes. recent work in our laboratory confirmed prmt as the only type iii mma-forming enzyme in the group, with an unusually low temperature optimum for activity and a previously unseen preference for a basic stretch of residues in an r-x-r sequence for methylation. mutation of the acidic residues in the substrate-binding motif results in a loss of the specific r-x-r activity and the appearance of a g-r-g specificity typical of many of the other prmts. the physiological substrate of prmt has yet to be confirmed, although histone h b is an effective in vitro substrate. prmt , on the other hand, had no reported activity until immunoprecipitation from hela cells showed that it pulled down two splicing factors, sf b and sf b , in a complex. amino acid analysis showed that prmt methylates sf b to produce both mma and sdma, thus making it the second type ii enzyme in mammals. prmt knockdown results in modulation of alternative splicing events.
this enzyme appears to be relatively specific for the sf b protein; a peptide containing the methylatable arginine residue was not found to be a substrate, and typical substrates of other prmts are not recognized by prmt . we found that the position of the methylated arginine residue in sf b is important, and the acidic residues in the substrate-binding motif also play an important role in substrate recognition. thus, prmt and prmt represent unique members of the mammalian prmt family. hydrogen peroxide levels, endogenous hormones (cytokinins, salicylic acid, as well as jasmonic acid and its conjugates), polyphenolics and terpenoids in a model system of a. alba in vitro with inhibition of rooting and stimulation of callusogenesis by means of individual and combined cytokinin and cytokinin/auxin treatments. results: it was established that the inhibition of rooting and stimulation of callusogenesis caused by benzyl adenine (ba) or combinations of ba and indole- -butyric acid (iba) in vitro were related to an elevation of sesquiterpenoids in the essential oils, as well as of polyphenolics content, accompanied by a drop of stress hormones and bioactive cytokinins and by preservation of oxidative stress and lipid peroxidation levels, as compared with the non-treated control. individual treatments with either iba or ba also increased the sesquiterpenoid content in the essential oil of the plant in a concentration-dependent manner, this effect being more pronounced after ba treatment. in addition, ba-treated plants exhibited a drop in the protein levels of the aerial samples, as well as marked differences in enzymatic activity in the callus tissues, as compared with the callus of plants treated with different combinations of ba and iba.
conclusion: the results of the present work indicate that alterations of endogenous phytohormonal levels, caused by treatment with exogenous plant growth regulators, might be the mediator between primary and secondary metabolism by affecting protein levels and the activity of key enzymes in vitro. three different additives ( . % (v/v); formic acid, acetic acid, ammonium formate with formic acid) have been investigated for their effect on the esi-ms ion intensity of individual hnp - in saliva. kinetex® column separation efficiency was evaluated using two different column dimensions ( x . mm and x mm) and two different stationary phases (c and c ). kinetex® column (homogeneous porous shell) performance was also compared with that of the new ultra ace® (encapsulated bonded phase) column. sample optimisation revealed that the spe method removes interference from salivary glycoproteins and consequently yields a larger peak area ( - %) for all hnps. hnps were extracted by spe with a recovery of - %. the meoh: h o: acetic acid ( . %) provided enhanced (p> . ) hnp - ion intensities. the kinetex® c ( x . mm, . mm) column provided better separation efficiency for the four hnps than the ultra core super c ace® ( x . mm, mm) column, the kinetex® c ( x . mm, . mm) column and the kinetex® c ( x . mm, mm) column. the relative levels of the hnps were determined in healthy volunteers before and after a rigorous exercise regime: it is possible that prolonged strenuous exercise will affect oral innate immunity and therefore also the level of salivary defensins. hnp - are traditionally detected in an enzyme-linked immunosorbent assay (elisa), which does not discriminate between the different hnps due to their structural similarities. there has therefore been a need to develop a mass spectrometry method that will discriminate between the defensins. as part of the method validation, the hnp - level was determined by elisa and the data were compared with the lc-ms data.
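one common way to quantify agreement between paired measurements from two assays, such as elisa and lc-ms readings on the same samples, is the pearson correlation coefficient. the python sketch below shows the calculation; whether this is the statistic the authors used is not stated here, and the function name and the paired values are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(ss_x * ss_y)


# Placeholder readings (invented for illustration only).
elisa_levels = [1.0, 2.0, 3.0, 4.0]
lcms_levels = [1.1, 1.9, 3.2, 3.9]
agreement = pearson_r(elisa_levels, lcms_levels)
```

a coefficient close to 1 indicates that the two methods rank and scale the samples similarly; it does not by itself show the absence of a systematic offset between them.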
here we present this cross-validation; the data revealed no significant difference between the two methods (r . ), which confirms that the developed lc-ms method is an equally sensitive method for the detection of these potential antimicrobial markers. this method can easily be adapted to peptides of similar molecular weight to the hnps and to other biological matrices.
moonlighting proteins: relevance for biotechnology and biomedicine luis franco serrano , sergio hernández , alejandra calvo , gabriela ferragut , isaac amela , juan cedano , enrique querol institut de biotecnologia i biomedicina. universitat autònoma de barcelona, laboratorio de inmunología, universidad de la república regional norte-salto
multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. the identification of moonlighting proteins could be useful for researchers in the functional annotation of new genomes. moreover, it might improve the interpretation of knockout experiments in which knocking out a gene does not produce the expected result. understanding the action of a drug could also be facilitated, because the drug might have an off-target or side effect with somewhat hidden phenotypic traits. it would be helpful if bioinformatics could predict this multifunctionality. in the present work, we analyse and describe several approaches that use protein sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. these approaches include: a) remote homology searches using psi-blast, b) detection of functional motifs and domains, c) analysis of data obtained from protein-protein interaction databases (ppis), d) matches of the sequence of the query protein to d databases (i.e., algorithms like pisite), e) mutation correlation analysis between amino acids using algorithms like mistic.
remote homology searches using psi-blast combined with data obtained from interactomics databases (ppis) have the best performance. structural information and mutation correlation analysis can help us to map the functional sites. mutation correlation analysis can only be used in very specific situations because it requires the existence of a multialigned family of protein sequences, but it can suggest how the evolutionary process of second function acquisition took place. we have designed a database of moonlighting proteins, multitaskprotdb (http:// wallace.uab.es/multitask/). from this database we determine the frequencies of canonical and moonlighting coupled functions (being an enzyme and a transcription factor the highest), the percentage of moonlighting proteins involved in human diseases ( % of the human moonlighting proteins in the database) and the percentage of moonlighting proteins acting as a pathogen virulence factor ( % of the moonlighting proteins in the database). correlation between potential human neutrophil antimicrobial peptides (hnp - ) and stress hormones in human saliva nadia ashrafi , frank pullen , birthe nielse , cris lapthorn , fernando naclario university of greenwich (faculty of engineering and sciene), university of greenwich (centre of sports science and human performance) numerous studies have investigated the effect of exercise on mucosal immunity but the focus has mainly been on salivary immunoglobulins lysozymes and hormones (cortisol, testosterone). this is not surprising given that iga and igg are the predominant immunoglobulins in saliva and there is a relationship between mucosal immunity and upper respiratory illness. it is well known that physical and mental stress provoke the release of cortisol from hypothalamic pituitary adrenal axis, by which stress can modulate various immune responses. in general, cortisol and growth hormones helps to induce the activation of neutrophils. 
to date, this is the first study to investigate the correlation between the human neutrophil alpha-defensin family and cortisol (stress hormone) and testosterone (growth hormone) in human saliva before and after exercise or training. twelve resistance-trained athletes volunteered to participate in the study. participants consumed supplements during exercise and the hnp - , cortisol and testosterone response was investigated pre, post and minutes of the workout. the correlation between salivary antimicrobial peptide (hnp - ) and the hormones (cortisol and testosterone) was investigated using elisa. cortisol showed no significant (p . ) difference for (pre to min post) between cho and pl (cho: . . ng/ml; pl . . ng/ml) conditions but a strong trend (p . ) was observed for (pre to min post) post (cho: . . ng/ml; pl . . ng/ml) condition. testosterone showed no significant (p . ; p . ) difference for (pre to min post) between cho and pl (cho: . . ng/ml; pl . . ng/ml) and for (pre to min post) post (cho: . . ng/ml; pl . . ng/ml) condition. hnp - showed no significant (p . ) difference for (pre to min post) between cho and pl (cho: . . ; pl . . ) conditions but a significant difference (p . ) was observed for (pre to min post) between cho and pl (cho: . . ; pl . . ) condition. the present findings suggest that there is no correlation between salivary hnp - and cortisol (pl: r . and cho: r . ) or between hnp - and testosterone (pl: r . and cho: r . ). it is worth noting that a previous study using a murine skin model suggested that an increase in endogenous glucocorticoids (cortisol) caused by physiological stress reduced mrna levels of the antimicrobial peptide cathelicidin. it is not clear whether the correlation between hormones and antimicrobial peptide has been affected by the time interval of the exercise. both cortisol and antimicrobial peptide demonstrated a transient increase after exercise, but it is surprising that they are not correlated with each other.
one hypothesis arising from the present findings is that the cortisol response is slow, and it would be interesting to do further research with a longer interval. the second hypothesis demands further investigation to determine the synergism between substances.
school of biomolecular and biomedical science, conway institute, ucd., king saud university, sciences, biochemistry department.
the crystal structure of human glucose -phosphate dehydrogenase (g pd) shows that each subunit has two nadp sites; in addition to a catalytic site there is a "structural" site which is distant from the catalytic coenzyme site. mutations causing severe deficiency tend to cluster around or close to the dimer interface and the structural nadp , indicating that the integrity of these areas is important for enzyme stability and therefore for maintenance of activity. in order to understand the molecular basis of g pd deficiency, and to have a clearer indication of the role of some features of the three-dimensional structure, a fuller study of the second, "structural" nadp binding site is needed. human g pd controls the first committed step in the pentose phosphate pathway. it catalyses the oxidation of glucose -phosphate to gluconolactone -phosphate, generating nadph, which is essential, amongst other things, for protection against oxidative stress. the human enzyme can be active in dimer or tetramer forms. human g pd of "structural" nadp per subunit of enzyme. this tightly-bound nadp can be reduced by g p, probably following migration to the catalytic site. the importance of nadp for stability is explained by the structural nadp site, which is not conserved in prokaryotes. after removing the tightly bound "structural" nadp the enzyme is still active but not stable. the effects of different nadp fragments on the stability of human recombinant g pd have been investigated.
nadp is crucial for the long-term stability of human g pd, and only one of the nadp analogues, adenosine diphosphate ribose - '-phosphate, was able to slightly promote the stability of the enzyme.
molecular characterization of specific positively selected sites in mammalian visual pigment evolution miguel a. fernández-sampedro , eva ramon , brandon m. invergo , jaume bertranpetit , pere garriga grup de biotecnologia molecular i industrial.,
visual rhodopsin is a member of the g-protein coupled receptor superfamily. this membrane protein consists of a -cis-retinal chromophore bound to a seven-transmembrane protein, opsin, by means of a protonated schiff base linkage. it has an important role as a dim light photoreceptor in the retina of the eye. using statistical models, in which episodic selection in rhodopsin is tested on one branch of the phylogeny against a background of neutral or purifying selection on the rest of the tree, we have found significant evidence of specific positively selected sites in early mammalian divergence. we have chosen the three amino acid sites identified with the highest posterior probability of having been targets of positive selection to perform experimental studies, i.e. (positively selected from m to f), (positively selected from r to q) and (positively selected from s to a). we have constructed, expressed, immunopurified and functionally characterized the proposed candidates, the f m, q r and a s rhodopsin mutants, located at the n-terminus, the transmembrane domain and the c-terminus region of the protein respectively. from the analysis of the molecular features of the f m mutant, we conclude that position is very important for protein folding and also for proper protein glycosylation, since we could only observe chromophore regeneration after its rescue in the double cysteine (n c/ d c) mutant background that stabilizes the n-terminal extracellular domain of the protein.
our results also show that mutants q r and a s alter the g-protein activation rate, and hydroxylamine susceptibility in the dark-adapted state. in the case of q r, disrupting critical interactions with the neighbouring y of the conserved d/ery motif, critical in gt activation, could cause the lower gt activation ability. the mutant a s would create a potential additional phosphorylation site in the protein which could affect rhodopsin phosphorylation after photoactivation and, in turn, could affect the binding affinity of arrestin, a regulator of rhodopsin deactivation. this extra phosphorylation site could provide an evolutionary explanation for the enhanced response observed in the case of gt activation. in conclusion, these results highlight the importance of molecular investigations of positive selected sites in rhodopsin evolution and the relevance of structural and functional analysis of these sites in unravelling the molecular basis of visual pigment evolution. natural evolution sheds light on modern drug resistance in protein kinases marc hoemberger , christopher wilson , roman agafonov , dorothee kern the anti-cancer drug imatinib exhibits highly specific binding to the human kinase and oncogene abl with a three thousand fold weaker affinity for the structurally and functionally very similar kinase src. it has been shown recently that the major difference in binding of imatinib to abl and src stems from an induced fit after binding of the drug. to further understand the mechanism of imatinib binding to its target we used ancestral sequence reconstruction (asr) and resurrected enzymes along the node from the common ancestor of abl and src up to the extant kinases. we show that imatinib affinity is gained towards the evolution of extant abl while it is lost towards evolving src. 
the combination of asr and crystallographic data of the ancestors in addition to kinetics data allowed us to identify a subset of residues involved in imatinib specificity sufficient to switch from an intermediate binder to a tight binder. preliminary data shows that a network of hydrogen bonds and packing interactions stabilize the kinked p-loop conformation for tight binders thus allowing for more interactions between the kinase and the drug. strikingly, many of these residues were identified in human cancer patients as "hot spots" for the development of resistance mutations. further investigation into the identified subset of residues in combination with these commonly found imatinib resistance mutations will allow us to understand emerging drug resistances better. an evolutionary view of the cold adapted catalysis of enzymes vy nguyen , christopher wilson , dorothee kern the diversity in protein function that we see today arose as a result of life adapting to a cooling earth. how did enzymes, the catalysts of many crucial cellular processes, achieve this cold adaptation? this is a challenging question to answer because ancient sequences of proteins that existed billions of years ago are not available. to address this question we used ancestral sequence reconstruction to create adenylate kinase (adk) enzymes from the divergence of anaerobic and aerobic firmicutes towards modern day thermophilic, mesophilic and psychrophilic organisms. adk is a phosphotransferase that catalyzes the conversion of two adp molecules into atp and amp. we make the following observations. first, all ancestral enzymes are active with optimal catalytic rates linearly corresponding to the temperature of the environments where these proteins would have been found. most strikingly, the catalytic rate of our oldest adk ancestor exhibits a higher enthalpy of activation at low temperatures as compared to the modern thermophilic adk. 
this suggests a large enthalpic penalty had to be paid for reactions to occur at cold temperatures in an ancestor that existed in a hot environment. second, several high resolution crystal structures of extant proteins that we solved ( . Å - . Å), show that the oldest ancestors were more rigid than the modern adks due to an intricate salt-bridge network. this work, thus shows for the first time, the molecular and thermodynamic determinants of cold adaptation in an enzyme over a time period that spans billions of years. induced oxidative modification of plasma and cellular fibrin-stabilizing factor anna bychkova , tatiana danilova , alexander shchegolikhin , vera leonova , marina biryukova , elizaveta kostanova , alexey kononikhin , anna bugrova , evgeny nikolaev , mark rosenfeld n. m. emanuel institute of biochemical physics, russian academy of sciences, institute for energy problems of chemical physics, russian academy of sciences the main function of plasma fibrin-stabilizing factor pfxiii is to catalyze the formation of the intermolecular covalent cross-links between both gand afibrin polypeptide chains. the crosslinking crucially affects mechanical strength of fibrin and its resistance against fibrinolysis. the precise role of cellular fibrin-stabilizing factor cfxiii remains poorly understood. pfxiii is a heterotetramer (fxiii-a b ) consisting of two single-stranded catalytic a subunits (fxiii-a ), and two identical single-stranded inhibitory/ carrier b subunits (fxiii-b ). the subunits are held together by weak non-covalent bonds. contrary to plasma fxiii, cfxiii is a dimer (fxiii-a ) devoid of b subunits. as well as many other proteins circulating in the bloodstream, pfxiii is known to be a target for reactive oxygen species (ros) causing processes of protein oxidative modification. 
since the conversion of pfxiii to the active form of the enzyme (fxiiia) is a multistage process, ozone-induced oxidation of pfxiii has been investigated at different stages of its enzyme activation. the biochemical results point to an inhibition of enzymatic fxiiia activity depending largely on the stage of the pfxiii conversion into fxiiia at which oxidation was carried out. uv-, ftir-and raman spectroscopy demonstrated that chemical transformation of cyclic, nh, sh and s-s groups mainly determines the oxidation of amino acid residues of pfxiii polypeptide chains. conversion of pfxiii to fxiiia proved to increase protein susceptibility to oxidation in the order: pfxiii < pf-xiii activated by thrombin < pfxiii in the presence of calcium ions < fxiiia. with the aid of massspectrometry it has been demonstrated that oxidation leads to decreasing fxiii-a and fxiii-b coverage both in the forms of zymogen and in the presence of calcium ions. a group of amino acid residues involved in oxidation modification of pfxiii is identified in this study. the oxidation of either cfxiii or cfxiiia has revealed an almost complete loss of enzyme activity caused by dramatic changes in the primary and secondary structure of the proteins detected by the ftir data. taking into account these new findings, it seems reasonable to assume that the inhibitory/carrier fxiii-b subunits can serve as scavengers of ros. hypothetically, this mechanism could help to protect the key amino acid residues of the fxiii-a subunits responsible for the enzymatic function of fxiiia. the study was supported by rfbr, research project no. - - - a. mass spectrometry study was supported by the russian scientific foundation grant no. - - . performance and quality. making microcalorimetry simple with microcal peaq-itc natalia markova , ronan o'brien , mark arsenault microcal, malvern instruments ltd. dynamic interactions involving biomolecules drive and regulate all biological processes. 
studies of biomolecular interactions are fundamentally important in all areas of life sciences. data provided by isothermal titration calorimetry (itc) enable scientists in academia and industry to directly and quantitatively characterize these interactions in solution. microcal peaq-itc, the latest generation of microcal itc instrumentation, offers a whole range of solutions for addressing current bottlenecks associated with interaction analysis. among the most recognized challenges are the needs to adequately address a broad range of binding affinities and to reliably interpret binding data complicated by the presence of an inactive protein fraction or inherent uncertainty in the concentration of a ligand. consistently high performance of microcal peaq-itc enables increased confidence and data resolution when measuring low heats at low or uncertain sample concentrations and complex binding modes. the new microcal peaq-itc analysis software allows for automated data analysis, minimizing analysis time and user subjectivity in assessing data quality. data quality is determined and advanced fitting performed in a few seconds per experiment, allowing for analysis of large data sets of or more experiments in a matter of seconds.
glutamine-rich activation domain of transcription factor sp -biochemical activity and structure jun kuwahara , chisana uwatoko , emi hibino , katsumi matsuzaki , masaru hoshino faculty of pharmaceutical sciences, doshisha women's university, graduate school of pharmaceutical sciences, kyoto university
transcription factor sp is ubiquitously expressed in mammalian cells, activates a reasonably large subset of mammalian genes, and is involved in the early development of an organism. the protein comprises two glutamine-rich (q-rich) regions (a and b domains) located in its n-terminal half, while three tandem repeats of the c h zinc finger motif at its c-terminus bind directly to a gc-rich element (gc box) of dna.
in general, the q-rich domain is one of the typical motifs found in the trans-activation domains of transcription factors, together with acidic and proline-rich domains. transcriptional signals of sp are transmitted via interaction between the q-rich domains of sp and different classes of nuclear proteins, such as tata-binding protein (tbp) associated factors (tafs) in components of basic transcription factor complexes (tfii). in addition, self-association of sp via q-rich domains is also important for its regulation of transcriptional activity. it has been considered that an sp molecule bound to a 'distal' gc-box synergistically interacts with another sp molecule at a 'proximal' binding site. although formation of multimers via q-rich domains seems functionally important for sp , little is known about the relationship between the biological activity and the structural nature of q-rich domains. we analyzed the nature of the glutamine-rich domains of sp by biochemical and physicochemical methods. we found that the q-rich domains do not have clear secondary structure, although they can exhibit biochemical activity. detailed analysis of nmr spectra indicated interaction between the domains. the q-rich domains of sp might belong to the intrinsically disordered proteins (idp).
chipping away at the yeast proteome: redesigning an e ubiquitin ligase for targeted protein degradation michael hinrichsen , lynne regan
one of the central goals of synthetic biology is to exploit biological systems in order to produce compounds of therapeutic or industrial value . often, these efforts are complicated by the many natural biochemical pathways in cells that can compete for the same small molecule precursors. currently, the most common solution is to simply delete the genes coding for the competing enzymes . while such an approach has been successful, it is only applicable to nonessential genes and can produce unintended off-target effects such as decreased cell viability .
an alternative strategy is to instead target proteins directly for degradation. using this strategy, scientists would first grow cultures of engineered cells to high densities under permissive conditions (i.e. targeted proteins are stably expressed). then, once sufficient cell density has been reached, enzymes of competing pathways would be rapidly degraded, resulting in the rapid production of high concentrations of the compound of interest. we propose to create such a tool by reengineering the c-terminus of hsp interacting protein (chip), an e ubiquitin ligase. chip recognizes substrate proteins through a short c-terminal peptide tag on target proteins . we have shown that fusing this tag to non-native substrates is sufficient for ubiquitination in vitro (data not published). cellular assays have also been performed in s. cerevisiae, a model organism commonly used in metabolic engineering applications . as a number of native yeast proteins possess c-termini similar to that of chip's native substrates (data not published), it was necessary to develop an orthogonal chip-peptide pair. this was achieved by replacing chip's natural tpr ligand-binding domain with a ligand-binding domain engineered previously in the regan lab . the altered chip construct has been shown to be active both in vitro and in vivo, and produces an altered growth phenotype when targeted against an enzyme involved in uracil biosynthesis. future work will focus on further kinetic characterization of the engineered enzyme, increasing its activity, and introducing the system into a proof of concept synthetic biology application. advances in modern sequencing techniques have resulted in an explosion of genomic data. 
correctly classifying this new wealth of information can be daunting not only because of the sheer volume of sequence data, but also because the propagation of erroneous and less-than-ideal names and functional characterizations in the current databases gets in the way of functional classification by mere sequence similarity. we are investigating the extent to which protein domain architecture can be utilized to define groups of proteins with similarities in molecular function, and whether we can derive corresponding functional "labels", starting with some of the most common domain architectures found in bacteria. to this end, we have developed an in-house procedure called sparcle ('specific architecture labeling engine') that lets us track and examine specific or sub-family domain architectures, resulting from annotating protein sequences with domain footprints provided by the conserved domain database (cdd), which includes hierarchical classifications for many common domain families. we will discuss how the proteins are grouped into specific architectures, our successes in assigning functional labels, and the major limitations we have encountered to date. while we will be able to assign functional labels to a large fraction of protein models derived from genome sequences, this effort has the added benefit of pointing out insufficient coverage and resolution of the current protein domain model collections that constitute cdd. we will also discuss alternative procedures that utilize pre-computed domain annotation for clustering protein sequences at a level that is well suited for functional labeling. we hope that this preliminary study will help to identify approaches that facilitate rapid and accurate annotation of genomes with a minimum of manual intervention. 
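the grouping of proteins by domain architecture described above can be illustrated with a minimal sketch. it assumes that each protein has already been annotated with domain hits given as (start position, domain name) pairs; the function name `group_by_architecture`, the protein identifiers and the example domain names are hypothetical illustrations and are not taken from sparcle or cdd.

```python
from collections import defaultdict


def group_by_architecture(annotations):
    """Group protein ids by their ordered domain architecture.

    annotations: dict mapping protein id -> list of (start, domain_name) hits.
    Returns a dict mapping an architecture (tuple of domain names ordered
    from N- to C-terminus) to the list of protein ids that share it.
    """
    groups = defaultdict(list)
    for protein_id, hits in annotations.items():
        # Sort hits by start position so the architecture reads N- to C-terminus.
        architecture = tuple(name for _, name in sorted(hits))
        groups[architecture].append(protein_id)
    return dict(groups)


# Hypothetical annotations: protA and protB share one architecture.
example = {
    "protA": [(120, "HATPase_c"), (10, "HisKA")],
    "protB": [(5, "HisKA"), (110, "HATPase_c")],
    "protC": [(1, "REC")],
}
groups = group_by_architecture(example)
```

keying the groups on an ordered tuple means that two proteins fall into the same group only if they carry the same domains in the same order, which is one simple way to read "specific architecture"; a functional label could then be attached once per group rather than once per protein.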
this work was supported by the czech science foundation (project p / / ) and czech academy of sciences. acknowledgements: cynthia leyva-argüelles is supported by a personal grant from conacyt, mexico.
this work is supported by conacyt grant ' : and papiit grant in sequence information-based deciphering of biofunctionalities using ism-based techniques has fetched calculation of biological functionalities, designing of biomedical device called computer-aided drug resistance calculator, the understanding of the mechanism of hiv progression to aids [ ], and others. they have compared the efficacies of drugs and vaccines, which formed the basis for the innocentive award (id ) for assessing vaccine potency. conclusions: deciphering biological features without engaging reagents, equipments and animal tissues but biological data such as sequence information is one novel, feasible genotypic hiv-coreceptor tropism prediction with geno pheno [coreceptor]: differences depending on hiv- subtype a reliable phenotype predictor for human immunodeficiency virus type subtype c based on envelope v sequences available: http:// istree.bioprotection.org signal processing-based bioinformatics methods for characterization and identification of bio-functionalities of proteins an empirical framework for binary interactome mapping estimating the size of the human interactome coming to peace with protein complexes? th capri evaluation meeting pj- cabs-dock web server for protein-peptide docking with significant conformational changes and without prior knowledge of the binding site while other docking algorithms require pre-defined localization of the binding site, cabs-dock doesn't require such knowledge. 
given a protein receptor structure and a peptide sequence (and starting from random conformations and positions of the peptide), cabs-dock performs a simulation search for the binding site, allowing for full flexibility of the peptide and small fluctuations of the receptor backbone.
cabs-flex: server for fast simulation of protein structure fluctuations; cabs-fold: server for the de novo and consensus-based prediction of protein structure; cabs-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site; mechanism of folding and binding of an intrinsically disordered protein as revealed by ab initio simulations; modeling of protein-peptide interactions using the cabs-dock web server for binding site search and flexible docking; aggrescan d (a d): server for prediction of aggregation properties of protein structures
staphylococcal pathogenicity island dna packaging system involving cos-site packaging and phage-encoded hnh endonucleases
the etiology of asd is unknown, but it is believed to involve genetic and environmental components. the purpose of this work is to assess the possible involvement of food contaminants, such as mycotoxins, in the etiology of asd. the hypothesis is that mycotoxins ingested with the diet could bind to proteins and expose the entire organism, including the cns, to the negative effects of xenobiotics in genetically predisposed patients. in this study some possible protein targets for the mycotoxins were identified to evaluate whether the binding between any protein target and the mycotoxin under examination could play a role in asd.
twelve mycotoxins were selected (ochratoxin a, gliotoxin, aflatoxin b , aflatoxin b , aflatoxin m , aflatoxin m , aflatoxicol, a-zearalanol, b-zearalanol, zearalenone, deoxynivalenol, patulin), which are contaminants of milk and cereals. for each of these molecules, possible protein targets were searched by a reverse docking approach using the idtarget server [ ]. from the results given by idtarget, human protein targets expressed in the brain or involved in brain diseases were selected. subsequently, direct docking was performed using auto-dock . [ ], in order to verify the strength of the interaction between the selected proteins and each mycotoxin, and to identify the mycotoxin binding site on each of the selected proteins. finally, the binding of some mycotoxins to selected protein targets was tested experimentally. for each mycotoxin, idtarget returned thousands of possible protein targets, and only those with the best binding energy were selected and evaluated. among them, human protein targets that are expressed in the brain or that are involved in cerebral diseases were selected; moreover, the protein targets that were not human but that idtarget selected for five or more mycotoxins were replaced with their human counterparts. at the end of the procedure, nineteen protein targets were identified for the subsequent direct docking approach. from the docking results, eight proteins with a predicted binding energy lower than kcal/mol were selected for experimental tests. finally, the interactions of acetylcholinesterase (ache), b-secretase (bace ) and neuroligin- , x-linked (nlg x) with aflatoxin b , aflatoxin b , gliotoxin, ochratoxin a and deoxynivalenol were evaluated using fluorescence spectroscopy and microscale thermophoresis.
these experiments confirmed the presence of an interaction between bace and aflatoxin b idtarget: a web server for identifying protein targets of small chemical molecules with robust scoring functions and a divide-and-conquer docking approach the calculation of spatial structure and "assembling" of the whole protein from the obtained peptide structures were performed by using molecular dynamics of the protein in the fully hydrated -palmitoyl- -oleoyl-sn-glycero- -phosphatidylcholine (popc) [ ]. the obtained structural model may contribute to identification of ul . active sites and elucidation of its mode of action nmr structural studies of membrane proteins acknowledgments: polish national centre for research and development -grant number pl- functional and mechanistic studies of dysferlin, an essential protein in cell membrane repair references moreover the 'm' parameter, which represents the denaturant effect on the protein stability, is cal•mol- for bgla and cal•mol- for bglb albert einstein college of medicine protein structure modeling, protein-protein interaction computational modeling of ini /smarcb and novel insights into its interaction with hiv- integrase savita bhutoria epsteain bar virus, nuclear antigen) . ini /smarcb has no known structural homologues, and its amino-acid sequence yields little insight into its function. a detailed understanding of structure-function relationships is hampered by the lack of structural information for ini . computational methods that model protein/peptide structures with sufficient accuracy to facilitate functional studies have had notable successes. we carried out combination of sequence analysis ab initio structure modeling and dynamics studies of integrase binding domain of ini and found it to be similar to that of phospholipase a activating protein, plaa. structural similarity with this distant protein suggests divergent evolution of the two proteins. 
the modeled structure sheds light on various protein-protein interactions of ini . by integrating the experimental studies about the binding, we have shown through docking, how a fragment of ini binds to the hiv- in. molecular docking and experimental studies indicated that two proteins bind tightly through charged/polar residues surrounding a hydrophobic cleft. these studies provide first modeled structure of ini /smarcb or any component of the swi/snf complex, and provide structural basis for in-ini interactions. this molecular interpretation of the intermolecular interactions is expected to facilitate design of inhibitors as novel class of anti-hiv- therapeutic agents ): e . with their catalytic activity towards rna substrates, other biological properties have been reported and evolution studies suggest an ancestral host-defence function in vertebrates. indeed, genetic studies confirmed a rapid molecular evolution within the family, a distinctive trait for host defence proteins exposed to a changing pathogen environment. previous studies from our laboratory characterized the wide spectra antimicrobial activity of two highly cationic human rnases: the eosinophil rnase and the skin derived rnase ribonucleases and have antimicrobial function in the human and murine urinary tract structural determinants of the eosinophil cationic protein antimicrobial activity two human host defense ribonucleases against mycobacteria, the eosinophil cationic protein (rnase ) and rnase the regulatory mechanism that we are reporting will contribute to prevent both nad exhaustion and accumulation of the toxic bal. 
to the best of our knowledge, this is the first report of a novel reversible covalent modification of an aldh enzyme involving its own substrate anna lewandrowska , aldona jeli nska , agnieszka wi sniewska acknowledgement: this research was supported by the inactivation of the fxr gene reduces aqp expression and impairs urine concentrating ability, which leads to a polyuria or urine dilution phenotype. we have previously found that pon -/-mice exhibit a polyuria phenotype and produce twice as much -h urine as their wild type pon / littermates (borowczyk k et al. metabolism and neurotoxicity of homocysteine thiolactone in mice: evidence for a protective role of paraoxonase development and application of novel non-ewald methods for calculating electrostatic interactions in molecular simulations ikuo fukuda , narutoshi kamiya the most time-consuming part of molecular simulation is the calculation of long-range interactions of the particles. in particular, appropriate treatment of the electrostatic interaction is critical, since the simple truncation cannot be used due to the slow decay of the coulombic function. thus, it is highly demanded to calculate the electrostatic interactions with high accuracy and low computational cost. for this purpose we have developed the zero-multipole (zm) summation method [ ]. in this method the artificial periodic boundary conditions are not necessary and the fourier part evaluations are not needed, in contrast to the conventional ewald-based methods. instead, a pairwise function that is suitably redefined from the coulombic function is used with a cutoff scheme. 
the underlying physical idea is simple: (a) in a biological system, a particle conformation for which the electrostatic interactions are well cancelled is more stable than other conformations [ ]; (b) since such well-cancelled conformations are essentially physical, we should clip a subset of such a conformation out of the conformation within an ad-hoc given cutoff sphere and calculate the interactions only from this subset. this idea is realized by a rigorous mathematical consideration that leads to the deformation of the coulombic function. the efficiency of the zm method has been validated in applications to fundamental systems. sema a is a protein originally described as an axonal chemorepellent cue involved in many physiological processes ranging from embryonic development to bone homeostasis or immune responses. sema a signal transduction requires the formation of a heteromeric complex with neuropilin- (nrp ) and plexina [ ]. in addition, sema a interaction with nrp is modulated by furin protease cleavage at its c-terminal basic domain. this c-terminal basic domain has also been suggested to mediate the binding to glycosaminoglycans (gags), an association that locates sema a to perineuronal nets and enhances its function in restricting neuronal plasticity and inhibiting axonal regeneration in the central nervous system. two peptides corresponding to the highly positively charged regions of the domain were shown to bind to immobilized heparin by surface plasmon resonance (spr), and the affinity increased dramatically when the complete domain was assayed. the binding was confirmed by nuclear magnetic resonance (nmr) and circular dichroism (cd). the conserved cysteine within this motif, necessary for the dimerization of sema a [ ], is also critical for the helix formation. in addition, fluorescence spectroscopy studies showed that the n-terminal region also contributes to the binding to gags.
we acknowledge the financial support from the european union seventh framework programme (fp / - ) under the project vision semaphorin a: a new player in bone remodeling neuropilins lock secreted semaphorins onto plexins in a ternary signaling complex furin processing of semaphorin f determines its anti-angiogenic activity by regulating direct binding and competition for neuropilin semaphorin a displays a punctate distribution on the surface of neuronal cells and interacts with proteoglycans in the extracellular matrix semaphorin a binds to the perineuronal nets via chondroitin sulfate type e motifs in rodent brains mechanistic basis for the potent anti-angiogenic activity of semaphorin f. biochemistry collapsin- covalently dimerizes, and dimerization is necessary for collapsing activity prior attempts to create functionally relevant groupings of proteins in the crotonase superfamily suggest that this superfamily is difficult to cluster functionally due in part to the functionally diverse nature of the protein superfamily. we have developed two novel procedures to combat this difficulty: tulip (two-level iterative clustering process), a process that utilizes structural information from active sites to cluster protein structures into hypothesized functional groupings, and misst (multi-level iterative sequence searching technique), a process that uses the protein groupings created in tulip as a starting point for iterative genbank searches and further clustering after each search. through these two methods, the total coverage of the crotonase superfamily has increased, and the generated groups contain proteins from subgroups and families that did not have a structural representative. 
novel hypothesized functional protein groupings have been created, most notably for a large number of proteins that lack annotation data at the subgroup or family level, and for proteins of the enoyl-coa hydratase family fernandes , teresa sorbo , ivan duka , lia christina appold , marianne ilbert , fabian kiessling cnrs, umr , ucibio-requimte, faculdade de ciências e tecnologia e-selectin is a cell-adhesion molecule induced on the surface of endothelial cells in response to cytokines. its upregulation has been reported in many disorders, including inflammatory and cardiovascular diseases, tumor angiogenesis and metastasis [ ]. this profile suggests e-selectin as a promising target to develop molecular imaging probes for the detection of these diseases cyt c with equivalent of fld in media mimicking the cytoplasm. these include % polyacrylamide gel, g/l bovine serum albumin or polyvinylpyrrolidone , and buffer alone for comparison. electrostatic surface representations of the proteins are shown with their in-cell spectrum pl- a search for anti-melioidosis drug candidates targeted to d-glycero-d-manno-heptose- , -bisphosphate phosphatase from burkholderia pseudomallei bpgmhb converts dglycero-d-manno-heptose- b, -bisphosphate to d-glycero-d-manno-heptose- b-phosphate. this is the third step of the biosynthesis pathway of ndp-heptose responsible for a pleiotropic phenotype. therefore, this biosynthesis pathway is the target for inhibitors increasing the membrane permeability of gram-negative pathogens or adjuvants synergistically working with known antibiotics. to find inhibitors of bpgmhb, we performed homology modeling of bpgmhb and in-silico virtual screening with zinc, a free database of commerciallyavailable compounds. tens of thousands of chemical compounds were docked into the active site of bpgmhb. 
a number of putative bpgmhb binding compounds better than d-glycero-d-manno-heptose- b, -bisphosphate were found using surflex-dock included in the sybyl software package crystal structure of dimeric d-glycero-d-manno-heptose- , -bisphosphate phosphatase from burkholderia thailandensis ewha womans university we have solved the crystal structures of d-glycero-d-manno-heptose- , -bisphosphate phosphatase from burkholderia thailandensis (btgmhb) catalyzing the removal of the phosphate at the position of d-glycero-d-manno-heptose- , -bisphosphate. it belongs to the haloacid dehalogenase (had) superfamily with an a/b rossman fold composed of six parallel b-strands sandwiched between two sets of three a-helices it reveals a conventional rossman-like a-b-a sandwich fold with a novel b-sheet topology. its c-terminus is longer than its closest relatives and forms an additional b-strand whereas the shorter c-terminus is random coils in the relatives. interestingly, its core structure is similar to that of enzyme iib(cellobiose) from e. coli (eciib(cel)) transferring a phosphate moiety. in the active site of the closest eceiib(fruc) homologues, a unique motif cxxgxaht comprising a p-loop like architecture including a histidine residue is found. the conserved cysteine on this loop may be thiolated to act as a nucleophile similar to that of eciib(cel). the conserved histidine residue is presumed to accommodate negatively charged phosphate during enzymatic catalysis leonor morgado , kornelius zeth , , , bj€ orn m. burmann , timm maier bama is a b-barrel membrane protein with five periplasmic n-terminal polypeptide transport associated (potra) domains. the bama structure has been determined recently by x-ray crystallography ( , ), however its functional mechanism is not well understood. 
this mechanism comprises the insertion of substrates from a dynamic, chaperone-bound state into the bacterial outer membrane, and nmr spectroscopy is thus a method of choice for its elucidation. we demonstrated that suppressing autophagy by shrna (shatg , shbecn , and shatg ) and chloroquine (cq) enhanced high-dose uvb-induced cell death in odc-overexpressing hela and mcf- cells. here, we also observed that knocking down odc in odc-overexpressing hela and mcf- cells inhibited autophagy and enhanced high-dose uvb radiation-induced cell death. because atg can regulate both cell apoptosis and autophagy, site-directed mutagenesis was used to mutate the amino acids on atg that regulate apoptosis and autophagy, respectively, in these two odc-overexpressing cells. according to the results fish ß-parvalbumin acquires allergenic properties by amyloid assembly. using atlantic cod b-parvalbumin (rgad m ) displaying high ige cross-reactivity, we have found that formation of amyloid fibers under simulated gastrointestinal conditions accounts for the resistance to acid and neutral proteases, for the presence of membrane-active species under gastrointestinally relevant conditions and for the ige recognition in allergic patient sera. incorporation of the anti-amyloid compound epigallocatechin gallate prevents rgad m fibrillation, facilitates its protease digestion and impairs its recognition by ige. conclusions: rgad m amyloid formation explains its degradation resistance, its facilitated passage across the intestinal epithelial barrier and the epitope architecture as allergen. autophagy could degrade the citrullinated and unfolded proteins. herein, padi could enhance autophagy in jurkat t cells and lead to the degradation of p and the accumulation of lc -ii. autophagy and apoptosis are two critical mechanisms that participate in the response to cellular stress, cell activation, survival and homeostasis.
pad -overexpressed jurkat t cells caused the activation of th cells to increase mrna expression of cytokines, such as il- , il- , il- and tnfa. cytokines provoked caspase expression and led to caspase-mediated cleavage of beclin- which was an important factor of apoptotic signaling. knockdown of bcen rescued cell survival due to the increase of bcl-xl and the decrease of caspase- . we suggested that padi participated in the activated t cell-induced autonomous death through triggering er stress pathway studies on secondary metabolites production and proteins and enzymes of in vitro cultivated artemisia alba turra and relations with some endogenous phytohormones yuliana raynova , krassimira idakieva , vaclav motyka , petre dobrev , yuliana markovska , milka todorova , antoaneta trendafilova , ljuba evstatieva switzerland aim: artemisia alba turra is an essential oil bearing shrub, characterized with great variability of the essential oil profile of wild grown plants, related to genetic, geographic and environmental factors. it was previously established that inhibition of rooting in vitro caused by cytokinin/auxin treatment affected the essential oil profile of the plant and these changes were also related to bioactive endogenous cytokinin levels in vitro ( , ) cytokinin and auxin effect on the terpenoid profile of the essential oil and morphological characteristics of shoot cultures of artemisia alba terpenoid profile of artemisia alba is related to endogenous cytokinins in vitro salivary hnp - are conventionally measured using an enzyme-linked immunosorbent assay (elisa) which does not discriminate between individual hnps due to their structural similarities. considering the biological importance of salivary human neutrophil a-defensin (hnps), there is therefore, a need to develop an analytical method that will discriminate between the defensins. an lc-ms method has been established for the separation and detection of hnp - . 
the method has been optimised, validated and applied to examine the relative level of hnp - in participants undertaking a circuit resistance training workout. to date, no studies have systematically investigated the effect of acute (min to hours) and chronic (days to weeks) change in salivary adefensins family before and after exercise by lc-esi-ms systems and models calorimetry showed no difference in dissociation constants at these ph values, while the binding stoichiometry is increased . fold. furthermore, the binding stoichiometry varied fold among the two alginates corresponding to their difference in average molecular weight and in addition fold higher binding affinity was found with the high as compared to the low molecular weight alginate. in conclusion, the binding stoichiometry of b-lactoglobulin with alginate increases by a factor that correlates to the average molecular weight of the alginate and also a much higher affinity was found for the high molecular weight alginate. acknowledgements: this work is supported by the danish council for the presence of mucins and other high molecular weight glycoproteins in saliva makes the direct analysis of defensins difficult. the lc-ms method was linear for concentrations of hnp- between . and ng/ml (r . ) with a lod of . ng/ml. inter and intra assay precision was . - %, respectively. saliva sample were clean up by solid phase extraction (spe) and without-solid phase extraction (wspe) . mm internal diameter column in relation to the method transfer . during lc-ms optimisation genome-wide docking database (gwidd) provides the most extensive data repository of structures and models of ppi on a genomic scale. currently, we are expanding the gwidd dataset to , ppi in , organisms, up from , ppi in organisms in the previous release. the ppi data were imported from intact and biogrid databases and were subjected to in-house modeling pipeline. 
gwidd current implementation contains , experimentally determined complexes, and , sequence homology and , structure homology models of complexes. the user-friendly interface offers flexible organism-specific search with advanced functions for a refined search for one or both proteins. the new gwidd version includes also a new interactive visualization screen that allows to view search results in different residue representations with the emphasis on the ppi interface refolding and activation of recombinant trypsin i from sardine fish (sardinops sagax caerulea) amyloid is detectable in human dental plaque and is produced by both clinical and laboratory strains of s. mutans, further supporting a functional role. s. mutans lacking p demonstrates residual amyloid forming properties, however, a mutant lacking sortase, the transpeptidase which covalently links p and several other proteins to the peptidoglycan cell wall, is defective in cell-associated amyloid-like properties. the objectives of this study were to identify additional amyloid forming proteins of s. mutans and to evaluate the effects of buffering conditions and ph on the ability of the identified proteins to form amyloids. a p -deficient mutant strain was grown to stationary-phase in defined minimal media, and secreted proteins from spent culture supernatants were fractionated by ion exchange chromatography. partially purified protein fractions were tested for binding of the amyloidophilic dyes congo red (cr) and thioflavin t (tht), and for characteristic birefringent properties following staining with cr and visualization under crossed polarizing filters. proteins from fractions that tested positive for amyloid-like material were separated by sds page, and identified by lc/ms. these included wapa, gbpa, gbpb, smu_ c and smu_ c. recombinant proteins were expressed in escherichia coli, and purified for confirmation and characterization of individual amyloidogenic properties in vitro. 
recombinant wapa and smu_ c displayed all the biophysical characteristics of amyloid, including visualization of fibrillar aggregates when viewed by transmission electron microscopy. in contrast, gbpa and smu_ c produced amorphous aggregates. wapa and smu_ c form amyloid at different ph, smu_ c under acidic conditions and wapa under neutral to basic conditions. this suggests that the prevailing environmental ph may represent different in vivo triggers for amyloid fibrillization of different s like other small gtpases, the activity of rheb is dictated by its guanine nucleotide binding states: it is active in its guanosine -triphosphate (gtp) bound form and inactive in the guanosine diphosphate (gdp)-bound form. rheb proteins play critical roles in regulating growth and cell cycle, and this effect is due to its role in regulating the insulin/tor/s k signaling pathway rheb interacts directly with fkbp and prevents its association with mtor in a gtp-dependent manner. moreover, fkbp bound to gtp-g-s, a nonhydrolyzable gtp analogon, has a much higher binding affinity for rheb than the gdp-bound form the second study contradicted both studies, since they could not detect any interaction between rheb and fkbp [ ]. to clarify whether there is an interaction and if it is nucleotide dependent, nmr monitored interaction studies were performed employing a c-terminal truncated construct of human rheb ( - rhebdct) that cannot be farnesylated and the biochemically defined binding region on fkpb (fkbp -like fkbp -bd). based on our data rhebdct -gdp does not significantly interact with fkbp -bp. n-fkbp -bd titrated , we observed a weak interaction between rhebdct bound to a gtpanalogon (gppnhp) and fkbp -bd. mapping of the observed spectral changes on the structure of rheb-gtp suggests that fkbp targets the switch region, loop - and the neighboring b-sheet region. 
we further analyzed the backbone dynamics of rhebdct -gdp and -gppnhp using n relaxation data (t , t and heteronuclear noe). based on these data, the phosphorylation loop, the switch regions and the loop around residues - show increased backbone dynamics that are modulated by nucleotide binding. international centre for genetic engineering and biotechnology tdp- is an rna processing protein that can form inclusions of debatable nature implicated in neurodegenerative diseases. within the putative aggregation domain, repeats of residues - can recruit endogenous tdp- into aggregates inside cells. recently, we showed that a coil to b-hairpin transition in a short peptide corresponding to tdp- residues - enables oligomerization. we have used a broad battery of biophysical experiments, including chromophore and antibody binding, electron microscopy (em), circular dichroism (cd), solution and solid-state nmr, and x-ray to shed light on the nature of these aggregates. based on these findings, structural models for tdp- ( - ) oligomers have been constructed, refined, verified, and analyzed using computational methods, ranging from docking and molecular dynamics simulations to semiempirical quantum mechanics calculations. interestingly, tdp- ( - ) b-hairpins assemble into a novel parallel b-turn configuration showing a cross-b spine, cooperative h-bonding and tight side chain packing. cellular model of tar dna-binding protein (tdp- ) aggregation based on its c-terminal gln/asn-rich region. structural characterization of the minimal segment of tdp- competent for aggregation. structural evidence of amyloid fibril formation in the putative aggregation domain of tdp- . kyungpook national university scolopendin , aglqfpvgrigrllrk, is a -mer peptide derived from the centipede scolopendra subspinipes mutilans. to investigate its activity against fungal and bacterial pathogens, antimicrobial tests were performed.
we observed that this peptide exhibited antimicrobial activity in a salt-dependent manner and showed no hemolysis. circular dichroism (cd) analysis indicated a-helical structural properties. we determined the mechanism(s) of action using flow cytometry and investigated the release of potassium. the results showed that the microbial membrane in escherichia coli o and candida albicans was permeabilized with loss of potassium ions. additionally, the bis-( , -dibutylbarbituric acid) trimethine oxonol [dibac ( )] and , '-dipropylthiacarbocyanine iodide [disc ( )] assays showed membrane depolarization. using calcein-encapsulating giant unilamellar vesicles (guvs) and fitc-dextran-containing large unilamellar vesicles (luvs), we found that scolopendin disrupted the membrane, with a damage size between . and . nm, in vesicles mimicking the plasma membrane composition of e. coli and c. albicans. thus, we demonstrated that a cationic antimicrobial peptide, scolopendin , possesses broad-spectrum antimicrobial effects by forming pores in the cell membrane. structural and functional investigation of the far c-terminal domain (ctd) of the bifunctional enzyme trai using nmr spectroscopy b. krishna chaitanya, evelyne schrank and klaus zangger institute of chemistry/organic and bioorganic chemistry university of graz, austria corresponding author email id: krishna.bhattiprolu@uni-graz.at bacterial conjugation is a complex process for the horizontal transfer of single-stranded dna from one cell to another. this mechanism also leads, for example, to the spread of antibiotic resistance genes and virulence factors among bacterial species. multi-protein complexes formed at the origin of transfer (orit) region of dna and at the cytoplasmic membrane of the bacterial cell initiate this process.
inside the membrane, the relaxosome identifies the single strand for transfer in a plasmid dna, relaxes and unwinds it, whereas the transferosome is involved in pilus formation (type iv secretion system) and transferring the gene through the cytoplasmic membrane. these events take place in the donor bacterial cell along with several other auxiliary proteins [ ] the bifunctional enzyme trai of plasmid r plays a crucial role in the relaxosome activity, as it contains both a relaxase and helicase domain. to exert its functions on dna, trai works in close co-ordination with other relaxosome proteins like tray, tram and the integration host factor. trai is a residual protein and contains major domains: n-terminal relaxase domain, a central helicase domain and a c terminal domain (ctd). the structure of the c-terminal domain until residue has been solved by crystallography, while the structure and function of the remaining residues remained undetermined [ ]. there are saxs models and crystallographic structures for different parts of trai and also for the full length protein. prediction of cleavage specificity in hcv ns / a serine protease and adv cysteine protease systems by biased sequence search threading gonca ozdemir isik , a.nevra ozer department of bioengineering,faculty of engineering,marmara university proteases are enzymes which recognize specific substrate sequences and catalyze the hydrolysis of designated peptide bonds to activate or degrade them. due to the biological importance of proteases, it is particularly important to identify the recognition and binding mechanisms of protease-substrate complex structures in drug development studies. the assessment of substrate specificity in protease systems is crucial, where interpreting the adaptability of substrate residue positions can be useful in understanding how inhibitors might best fit within the substrate binding sites and aid in the design of potent selective inhibitors. 
substrate specificity is generally determined by the amino acid profile, structural features and distinct molecular interactions. besides experimental methods, computational tools for prediction of natural substrate cleavage sites, such as threading, have emerged as useful alternative approaches to provide valuable insights into complex enzyme-substrate interactions. in this work, the substrate variability and substrate specificity of the hepatitis c virus (hcv) ns / a serine protease and the adenovirus (adv ) cysteine protease were investigated by the biased sequence search threading (bsst) methodology. using available crystal structures of the proteases, the template structures for the substrate-bound proteases were created in silico by performing various peptide building and docking procedures followed by energy minimization and molecular dynamics (md) simulations. bsst was performed starting with known binding, nonbinding and some random peptide sequences that were threaded onto the template complex structures, and low-energy sequences were searched using low-resolution knowledge-based potentials. then, target sequences of yet unidentified potential substrates were predicted by statistical probability approaches applied to the low-energy sequences generated. the results show that the majority of the predicted substrate positions correspond to the natural substrate sequences with conserved amino acid preferences, while some positions exhibit variability. for ns / a serine protease cleavage, the significant selection for pro at p and cys at p positions is zearalenone is a mycotoxin produced by fusarium graminearum and related fusarium species. f. graminearum is a powerful plant pathogen and infects major crop plants around the world. the acute toxicity of zearalenone is low, but due to its structural similarity to b-estradiol it has binding affinity for the estrogen receptor, which results in interference with hormonal balance.
typical effects seen in animals include symptoms like hyperestrogenism and reproductive disorders (reduced fertility, reduced litter size or swelling of uterus and vulva). to reduce the risk to human and animal health posed by the ingestion of contaminated food or feed, different decontamination strategies have been studied, including biotransformation. today many microorganisms are known to degrade zearalenone, but for most of them the degradation pathway and the metabolites formed remain unknown; hence it is unknown whether this degradation also means detoxification. only for the fungal strains trichosporon mycotoxinivorans and gliocladium roseum has zearalenone (zen) degradation been studied in detail and loss of estrogenicity of the reaction products been confirmed. we screened for and isolated zearalenone-degrading bacteria from soil samples. the most promising new bacterial isolate was taxonomically assigned to the species rhodococcus erythropolis and designated pfa d - . the zearalenone catabolism pathway of pfa d - was found to be identical to that known from g. roseum. the primary reaction product, hydrolysed zearalenone, has so far only been postulated in g. roseum. we prepared hydrolysed zearalenone by preparative hplc and showed loss of estrogenicity in assays with the breast cancer cell line mcf and the estrogen reporter yeast strain yzhb . a genomic library was prepared and screened in the zearalenone degradation-deficient r. erythropolis pr . the gene encoding zearalenone hydrolase was found and named zena. the hydrolase was identified as a member of the a/b-hydrolase family. it was cloned, recombinantly expressed in e. coli and purified by x his-tag mediated immobilised metal affinity chromatography. the activity of his-tagged and untagged zena was compared in cleared lysate, and zena was purified for enzyme characterisation. the influence of ph and temperature on enzyme activity and stability was evaluated and kinetic parameters were determined.
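as an illustration of how kinetic parameters of a hydrolase such as zena might be estimated, the following is a minimal sketch that assumes simple michaelis-menten kinetics and a lineweaver-burk (double-reciprocal) linear fit. the abstract does not state which kinetic model or fitting procedure was used, and the substrate concentrations and the vmax and km values below are hypothetical numbers chosen only to exercise the code, not measured data.

```python
# Minimal sketch, assuming Michaelis-Menten kinetics (an assumption; the
# abstract does not name the model). All numbers are hypothetical.

def michaelis_menten(s, vmax, km):
    """Initial rate v for substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate, rates):
    """Estimate (vmax, km) from the linear relation
    1/v = (km/vmax) * (1/s) + 1/vmax using ordinary least squares."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals km / vmax
    intercept = mean_y - slope * mean_x    # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Hypothetical, noise-free rates generated from known parameters,
# so the fit should recover them.
concs = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, vmax=10.0, km=2.0) for s in concs]
vmax_est, km_est = fit_lineweaver_burk(concs, rates)
print(f"vmax = {vmax_est:.3f}, km = {km_est:.3f}")
```

with real, noisy measurements a direct nonlinear fit of the rate equation is usually preferred, because the double-reciprocal transform amplifies the error of the low-concentration points.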
a new binding site for snake venom c-type lectins? maria cristina nonato costa , ricardo augusto pereira de padua , marco aurelio sartim , suely vilela sampaio university of são paulo, fcfrp c-type lectins are proteins that bind different glycan molecules through interactions with a calcium atom present in a carbohydrate recognition domain (crd). many organisms (plants, bacteria, viruses and animals) use these proteins in various biological events such as lymphocyte adhesion, erythrocyte agglutination and extracellular matrix organization. the c-type lectin fold is plastic and possible for about different sequences, which promoted its adaptation to diverse functions, similar to what is observed for the immunoglobulin fold ( - sequences). it comprises about - amino acid residues that fold into two four-stranded b sheets sandwiched by two alpha helices. interestingly, c-type lectins present in snake venoms are possible anti-cancer agents since they are toxic to cancer cells and inhibit the adhesion and proliferation of various cancer cell lines. therefore, we have purified a lactose-binding c-type lectin from the venom of bothrops jararacussu (bjcul) to study its structure and its binding properties to different sugars. bjcul crystals were obtained by vapor diffusion and the structure was solved by x-ray crystallography to . Å resolution. the bjcul structure is a decamer formed by a pseudo-fivefold rotation of a dimer held together by a disulfide bond. each monomer binds a calcium atom and possibly another metal at a second, opposite binding site.