key: cord-0910624-k4eetalp authors: Cármenes, R. S.; Freije, J. P.; Molina, M. M.; Martín, J. M. title: Predict7, a program for protein structure prediction date: 1989-03-15 journal: Biochemical and Biophysical Research Communications DOI: 10.1016/0006-291x(89)90049-1 sha: 0fa2750b5605f80e81064c6442a823ca3bf65f85 doc_id: 910624 cord_uid: k4eetalp

Abstract

We describe a program for protein sequence analysis which runs on IBM PC computers. Protein sequences are loaded from files in Mount-Conrad and Lipman-Pearson formats. Seven features are analyzed: hydrophilicity, hydropathy, surface probability, side chain flexibility, antigenicity, secondary structure and N-glycosylation sites. Numeric results can be shown, printed or stored in files exportable to other programs. Graphics of up to four predictions can be displayed on the screen, printed out or plotted, with several definable options. This program has been designed to be fast, user-friendly and to be shared with the scientific community.

Once the program is called, and a short description of the available predictions has been displayed, the user is requested to enter the sequence file name. The sequence is then loaded and displayed on the screen together with its total length. Up to 1800 amino acid residues are accepted for analysis, which is sufficient for most protein sequences. The user is then asked to enter the window size for the hydrophilicity, hydropathy and antigenicity calculations and the Garnier decision constant for α-helix and β-sheet prediction. A default window size of 6 is suggested, according to the recommendations of Hopp (14). To make the program easier to use, the sequence file name and window size can optionally be specified as parameters following the program name when calling it. In the example we shall examine later, this would be done by typing PREDICT7 IMP.AA 6.
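The window-based calculation described above can be illustrated with a minimal sketch. It is not the PREDICT7 source code: the function names `window_profile` and `parse_args` are hypothetical, and the use of the Hopp-Woods hydrophilicity scale is an assumption, since the text only states that the default window size follows the recommendations of Hopp. Each value in the profile is the mean of the scale over one window of consecutive residues.

```python
# Hopp-Woods hydrophilicity values per residue (assumed scale; the text
# does not list the tables the program actually uses).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_profile(sequence, window=6, scale=HOPP_WOODS):
    """Return the mean scale value of every full window along the sequence."""
    values = [scale[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        return []  # no full window fits
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def parse_args(argv):
    """Mimic `PREDICT7 IMP.AA 6`: file name, then an optional window size."""
    filename = argv[1]
    window = int(argv[2]) if len(argv) > 2 else 6  # default suggested in the text
    return filename, window
```

A sequence of n residues yields n - w + 1 values for a window of size w, so a 250-residue protein analyzed with the default window of 6 gives 245 averaged points to plot.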
Once the sequence has been loaded and the constants defined, calculation of the seven predictions starts, taking about 3 seconds for a typical 250-residue sequence on an IBM AT computer. After the calculations have finished, the main menu is displayed. From it and its submenus, many options are available, covering several aspects:

- Kind of output. Both numeric results and graphics can be obtained in various ways. They can be stored in a file for later use, displayed on the screen, printed out, or plotted using HP-compatible plotters.
- Graphical options. The four predictions to be shown, their order, and the region of the sequence (the total length by default) can be defined. The user can also choose whether to draw the scales, ticks, zero lines or axis labels. All are shown by default, but if a different set of definitions is frequently used, it can be saved in a file called PREDICT7.DAT.
- Standard serial port settings. The plotter is connected to the computer through an RS232 interface. Port number, baud rate, data bits, stop bits, and parity checking bit have to be set to the appropriate values. This is easily done through a straightforward submenu, and the new settings can be saved as before.

Single keystrokes allow shifting to submenus or options. The previous menu can be reached at any time by simply pressing the escape key (Esc). When data have been saved in a file, the user is reminded of this on exiting the program. Much care has been taken in designing the user interface in order to make the program as friendly as possible, so that it can be used without reference to external instructions. Simple on-line help screens are available from any menu by pressing the F1 key.

(b) Using the program

As an example, we have run the program using the sequence of one of the proteins we are currently investigating. This is the matrix protein of

There are a number of protein structure prediction programs available.
Some are integrated in complex commercial DNA/protein analysis packages, while others are freely available to the scientific community. The latter programs have proven useful, but have three major shortcomings: they usually lack any graphical capability, which is essential for a proper understanding of the predicted structures; they cannot analyze more than one protein structure feature simultaneously; and they are not very fast. Although some of the commercial programs do overcome these deficiencies, they are not freely available to the academic community, are usually expensive, and sometimes require mainframe computers. From these considerations, we believe PREDICT7 can be useful to other investigators in this field.

(d) Availability of the program

PREDICT7 is available to anyone for non-commercial use upon request by sending a formatted blank 5¼ inch diskette to the authors. A copy of the program can also be obtained via EARN/BITNET by requesting it from CMSMD11@Eowov11.

We would like to express our gratitude to F.Parra and C.L.Ct.in for the interest shown during the development of this work, and to J.Riera for his technical advice.