key: cord-0961911-dz5ttt74
authors: Mahtarin, Rumana; Islam, Shafiqul; Islam, Md. Jahirul; Ullah, M Obayed; Ali, Md Ackas; Halim, Mohammad A.
title: Structure and dynamics of membrane protein in SARS-CoV-2
date: 2020-12-22
journal: Journal of biomolecular structure & dynamics
DOI: 10.1080/07391102.2020.1861983
sha: 3e0b8182f936d6917fbe5244b0a2bf53ce43e542
doc_id: 961911
cord_uid: dz5ttt74

The SARS-CoV-2 membrane (M) protein performs a variety of critical functions in the virus infection cycle. However, the expression and purification of membrane proteins for structure determination remain difficult despite tremendous progress. In this study, the 3D structure is modeled, followed by intensive validation and molecular dynamics simulation. The lack of suitable homologous templates (>30% sequence identity) led us to construct the membrane protein models using a template-free (de novo or ab initio) modeling approach with the Robetta and trRosetta servers. Comparison with other model structures shows that trRosetta (TM-score: 0.64; TM region RMSD: 2 Å) provides a better model than Robetta (TM-score: 0.61; TM region RMSD: 3.3 Å) and I-TASSER (TM-score: 0.45; TM region RMSD: 6.5 Å). Molecular dynamics simulations of 100 ns are performed on the model structures embedded in a membrane environment. Secondary structure element analysis and principal component analysis (PCA) are also performed on the MD simulation data. Finally, the trRosetta model is used for interpretation and visualization of interacting residues in protein-protein interactions. Common interacting residues, including Phe103, Arg107, Met109, Trp110, Arg131, and Glu135 in the C-terminal domain of the M protein, are identified in the membrane-spike and membrane-nucleocapsid protein complexes. The active site residues are also predicted for potential drug and peptide binding. Overall, this study might help in designing drugs and peptides against the modeled membrane protein of SARS-CoV-2 and accelerate further investigation. Communicated by Ramaswamy H. Sarma

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with its unpredictable and fast-spreading nature, has had the most devastating global impact in recent times, and the pandemic has been catastrophic for human life. Hence, targeting crucial viral proteins and exploring their structural features are ongoing strategies for designing effective vaccines and therapeutics. The three-dimensional structure of a protein provides detailed insights into its structure-function relationship, which aids structure-based drug and vaccine design (Gromiha, 2010). The interactions among structural proteins in SARS-CoV-2 may play a critical role in the association of viral particles during viral replication and assembly (Li et al., 2020).

The membrane (M) protein is an important functional component that plays a significant role in maintaining virion size and shape. It helps assemble the other structural proteins, including spike (S), envelope (E), and nucleocapsid (N), and participates in the budding process (Neuman et al., 2011; Schoeman & Fielding, 2019). Coronaviruses form virus-like particles (VLPs) via the interaction of M and E or M and N proteins, and the combined presence of M, N, and E is required for well-organized VLP production as well as trafficking and release (Siu et al., 2008). In addition, the M-S interaction assists incorporation of the S protein into the virion. The M protein also cooperates with the S protein during cell attachment and entry (Naskalska et al., 2019), and these interactions may facilitate viral transmission. Moreover, the M protein, like other viral proteins, exhibits self-association as well as interaction with accessory and non-structural proteins. These protein-protein interactions may play a significant role in viral structural protein processing, modification, and trafficking for viral particle assembly and egress (Li et al., 2020). Thus, the network of interactions between the SARS-CoV-2 M protein and other viral proteins forms the basis for considering the M protein as a target for structure-based drug design.

Even though the M protein is an indispensable part of the SARS-CoV-2 virion, the main bottleneck in structure-based drug design is the lack of its three-dimensional structure. So far, there is neither an experimentally resolved structure nor a suitable template for homology modeling of the M protein. Therefore, a template-free (de novo or ab initio) modeling approach appears to be the most suitable, as no known structural homolog is available. This approach mostly applies physics-based principles and energy terms to model proteins (Dhingra et al., 2020; Khor et al., 2015). Template-free modeling has improved drastically in recent years thanks to more accurate prediction of residue-residue contacts and distances. Accurate prediction of inter-residue contacts and distances is a major intermediate step in predicting protein three-dimensional (3D) structure from sequence (Hou et al., 2019).

In this study, we have used different modeling protocols, including I-TASSER, Robetta, trRosetta, and SWISS-MODEL, to assess and compare model structures of the SARS-CoV-2 M protein. Among these protocols, I-TASSER applies multiple threading alignments to build full-length protein model structures (Yang et al., 2015). The Robetta protocol runs automated tools: sequences submitted to the server are parsed into putative domains, and structural models are assembled through either comparative modeling or de novo structure prediction (Kim et al., 2004). trRosetta predicts inter-residue orientations and distances from co-evolutionary data using deep learning, which significantly improves protein structure prediction. SWISS-MODEL searches its template library (SMTL) using BLAST and HHblits, and the model is then built with ProMod3 based on the target-template alignment. The QMEAN scoring function assesses global and per-residue model quality to quantify modeling errors (Waterhouse et al., 2018). The predicted structures are verified via the ERRAT, RAMPAGE, PROCHECK, ProSA-web (Wiederstein & Sippl, 2007), and QMEANBrane web servers (Studer et al., 2014; Zobayer & Hossain, 2018), as validation and quality assessment are crucial tasks for three-dimensional structures (Pražnikar et al., 2019). In addition, all model structures are subjected to molecular dynamics simulation in a membrane environment.

The SARS-CoV-2 membrane (M) protein sequence (YP_009724393.1) was retrieved from NCBI Reference Sequence (NC_045512.2) (Pruitt, Tatusova, & Maglott, 2007) and compared with SARS-CoV M protein sequence (UniProtKB-P59596) via BioEdit ClustalW application. The domain orientation of SARS-CoV-2 M protein was visualized based on UniProtKB-P0DTC5.

To analyze physicochemical parameters, ExPASy's ProtParam tool (Gasteiger et al., 2005) was employed to calculate the theoretical isoelectric point (pI), instability index (II), aliphatic index (AI), and grand average of hydropathicity of the SARS-CoV-2 M protein. Furthermore, the secondary structural properties of the protein were evaluated via the self-optimized prediction method with alignment (SOPMA) (Dash et al., 2016).
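As a rough illustration of two of these descriptors, the sketch below computes the grand average of hydropathicity as the mean Kyte-Doolittle value per residue and the aliphatic index from the mole percentages of Ala, Val, Ile, and Leu. It is not ProtParam's own code; the function names and the toy peptide are assumptions made for illustration, and the toy peptide is not the M protein sequence.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    n = len(seq)

    def pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def charged_counts(seq: str) -> tuple:
    """Counts of positively (Arg + Lys) and negatively (Asp + Glu) charged residues."""
    return seq.count("R") + seq.count("K"), seq.count("D") + seq.count("E")


if __name__ == "__main__":
    toy_peptide = "MALLVKRDELLIG"  # hypothetical sequence, for illustration only
    print(round(gravy(toy_peptide), 3))
    print(round(aliphatic_index(toy_peptide), 2))
    print(charged_counts(toy_peptide))
```

A positive grand average of hydropathicity and a high aliphatic index both point toward a hydrophobic, membrane-compatible composition, which is how such values are usually read for a protein like M.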

BLAST (blastp) and SWISS-MODEL were searched to find the suitable template for SARS-CoV-2 M protein. The template library of SWISS-MODEL (SMTL) applied BLAST and HHBlits against the primary amino acid sequence in the library (Waterhouse et al., 2018) . The 20 distant homologs were identified as probable template structures (Dilly et al., 2020) .

The SARS-CoV-2 membrane (M) protein reference sequence (YP_009724393.1) was used for template-free (de novo or ab initio) prediction of 3D structures with the Robetta and trRosetta servers. These model structures were also compared with the model generated by the I-TASSER server. To assess the quality of the predicted models, several validation servers were used, including PROCHECK (Laskowski et al., 1993), RAMPAGE (Begum et al., 2019; Lovell et al., 2003), ERRAT (Colovos & Yeates, 1993), ProSA-web (Wiederstein & Sippl, 2007), and QMEANBrane (Studer et al., 2014). The TM-align algorithm was then employed to identify the best model structures based on TM-score (Zhang & Skolnick, 2005).
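For orientation, the TM-score used in this comparison can be sketched as follows for a fixed superposition. The function names are illustrative assumptions, and the per-residue distances are assumed to come from an existing alignment and superposition. The full TM-align program additionally searches for the superposition that maximizes the score, which this sketch does not attempt.

```python
def d0_scale(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if target_length <= 15 or d0 <= 0:
        # The formula is only meaningful for sufficiently long chains.
        raise ValueError("target_length is too short for the d0 formula")
    return d0


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    target_length: number of residues in the target (normalizing) protein.
    """
    d0 = d0_scale(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Hypothetical distances for a 36-residue aligned segment.
    example = [1.5] * 36
    print(round(tm_score(example, 36), 3))
```

Because every term is normalized by the target length, unaligned residues lower the score, and values range from 0 to 1 with higher values indicating closer structural agreement.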

The membrane protein models were refined and energy minimized with the YASARA program (Land & Humble, 2018). For that purpose, a membrane was added to each model structure. YASARA scanned the secondary structure elements of the protein for hydrophobic residues that could be part of a probable transmembrane region. YASARA then displayed the suggested membrane embedding and built a membrane of the required size (69.2 Å × 7.3 Å) with a phosphatidyl-ethanolamine lipid composition. An equilibration simulation was run for 250 ps. During the equilibration phase, the membrane was stabilized so that it adapted to the protein and maintained the right density.

2.6. Molecular dynamics (MD) simulation

YASARA Dynamics (Krieger et al., 2004) was used to perform the molecular dynamics simulations, with the AMBER14 force field (Dickson et al., 2014) applied in all calculations. During the simulation, the Berendsen thermostat regulated the temperature, and the particle mesh Ewald algorithm was used for long-range electrostatic interactions. Periodic boundary conditions were applied during the simulation of the membrane-embedded protein. The system was equilibrated in water with 0.9% NaCl at a temperature of 298 K. A time step of 1.25 fs was used to carry out the 100 ns MD simulation, and 1000 snapshots were collected at 100 ps intervals. After the MD simulation, data including root mean square deviation (RMSD), root mean square fluctuation (RMSF), solvent-accessible surface area (SASA), radius of gyration, total number of hydrogen bonds, and helix, sheet, turn, and coil content were collected according to previously published data analysis protocols (M. J. Islam et al., 2019; Junaid et al., 2019; Khan et al., 2017; Shahinozzaman et al., 2020).
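The quantities listed above are produced by YASARA in the study. The sketch below only illustrates what two of them measure, using NumPy: the RMSD after optimal rigid-body superposition (Kabsch algorithm) and an unweighted radius of gyration. The function names are assumptions, coordinates are taken to be N×3 arrays of Cα positions in Å, and the step-count helper simply restates the 100 ns / 1.25 fs arithmetic.

```python
import numpy as np


def n_md_steps(total_ns: float, timestep_fs: float) -> int:
    """Number of integration steps for a given simulation length and time step."""
    return round(total_ns * 1e6 / timestep_fs)  # 1 ns = 1e6 fs


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two N x 3 coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def radius_of_gyration(X: np.ndarray) -> float:
    """Unweighted radius of gyration (all atoms treated as equal mass)."""
    centered = X - X.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centered ** 2, axis=1))))


if __name__ == "__main__":
    print(n_md_steps(100.0, 1.25))  # steps needed for 100 ns at 1.25 fs
    rng = np.random.default_rng(0)
    frame0 = rng.normal(size=(30, 3))  # synthetic coordinates, not M protein data
    frame1 = frame0 + rng.normal(scale=0.5, size=(30, 3))
    print(kabsch_rmsd(frame0, frame1), radius_of_gyration(frame1))
```

Applying `kabsch_rmsd` between the starting structure and each saved snapshot gives an RMSD-versus-time curve, and applying `radius_of_gyration` to each snapshot gives the corresponding compactness profile.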

MD simulation data were used for principal component analysis (PCA) to explore structural and energy fluctuations among the model M protein structures. The variability in the MD trajectories was examined through different multivariate energy factors in a low-dimensional space (De Jong, 1990; Wold et al., 1987). Centering and scaling were applied for data pre-processing (Ahmed, Mahtarin, et al., 2020; Chowdhury et al., 2020). In the analysis, the final 90 ns of the MD trajectories were used to reveal the variations among the model structures. The PCA model is given by the following equation:

$$X = T_k P_k^{T} + E$$

where the matrix $X$ of multivariate factors is expressed as the product of two new matrices, $T_k$ and $P_k$: $T_k$ is the matrix of scores, which relates the samples; $P_k$ is the matrix of loadings, which relates the variables; $k$ is the number of factors in the model; and $E$ is the matrix of residuals. The trajectories were explored using R (Peng, 2015), RStudio (Rstudio Team, 2019), and in-house code. The PCA plots were generated using the R package ggplot2 (Wickham, 2009). [...] (Kozakov et al., 2017). The best poses were selected and visualized as the protein-protein complexes. The interacting residues in the protein complexes were displayed with PDBsum's interaction plots (Laskowski et al., 2018). The active site residues were also predicted as the probable drug binding site with the CASTp web server (Wei Tian et al., 2018). We also retrieved protein-protein interactions and interactors for the SARS-CoV-2 M protein (UniProtKB-P0DTC5) from the IntAct Molecular Interaction Database (Aranda et al., 2010). The network was then visualized using Cytoscape (version 3.8.0) (Cline et al., 2007).
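The score and loading decomposition of the PCA model can be sketched with a singular value decomposition. The study itself used R; the Python/NumPy version below is an illustrative assumption, as are the function name and the synthetic input. It assumes that rows are MD snapshots, columns are the energy and geometry variables, and no column is constant (a constant column would make the scaling step divide by zero).

```python
import numpy as np


def pca_decompose(X, k: int):
    """Return scores T_k, loadings P_k, residuals E, and explained variance ratios.

    X is centered and scaled to unit variance column by column,
    then decomposed so that X_scaled = T_k @ P_k.T + E.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centering and scaling
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :k] * S[:k]  # scores: one row per sample
    P = Vt[:k].T  # loadings: one row per variable
    E = Xs - T @ P.T  # residual matrix
    explained = (S ** 2 / np.sum(S ** 2))[:k]
    return T, P, E, explained


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fake_energies = rng.normal(size=(900, 6))  # synthetic data, not simulation output
    T, P, E, explained = pca_decompose(fake_energies, k=2)
    print(T.shape, P.shape, explained)
```

Plotting the first two columns of `T` gives a score plot, and the rows of `P` indicate how strongly each variable contributes to each component.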

The membrane protein sequence (YP_009724393.1) of SARS-CoV-2 shows 90.5% sequence identity and about 96.40% sequence similarity with the SARS-CoV M protein sequence (UniProtKB-P59596). The alignment is shown in Figure 1 
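To make the identity figure concrete, the sketch below counts matches, mismatches, and gap columns in a pairwise alignment and reports identity as matches divided by the number of alignment columns. The function name and the toy alignment are assumptions, and alignment programs differ in which denominator they use, so the convention here is only one possibility.

```python
def alignment_stats(aligned_a: str, aligned_b: str):
    """Return (percent identity, mismatches, gap columns) for two aligned sequences.

    Both strings must have the same length and use '-' for gaps.
    Identity is computed over all alignment columns (one common convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return 100.0 * matches / len(aligned_a), mismatches, gaps


if __name__ == "__main__":
    # Toy alignment, not the actual M protein alignment.
    identity, mismatches, gaps = alignment_stats("MADSNGTITV", "MAD-NGTITV")
    print(round(identity, 1), mismatches, gaps)
```

Under this convention, and assuming a 222-column alignment, the 20 mismatches and 1 gap reported for this pair later in the text would correspond to about 90.5% identity.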

The analysis of physicochemical parameters with ExPASy's ProtParam reveals that the M protein of SARS-CoV-2 has an isoelectric point of 9.51, an instability index of 39.14, an aliphatic index of 120.86, and a grand average of hydropathicity of 0.446, and that it has more positively charged residues (21) than negatively charged residues (13). The number and percentage of each amino acid in the M protein sequence are shown in Table 1, where Leu has the highest number and percentage (15.8%) among all residues. Annotated plots of the amino acid types are shown in Figure 1b. The secondary structure properties of the protein, with the number and percentage of residues, predicted by the self-optimized prediction method with alignment (SOPMA), are given in Table 2.

We searched for a suitable template using the blastp suite against the Protein Data Bank and SWISS-MODEL against its template library (SMTL); however, only two top templates (<30% sequence identity) were found: PDB ID: 5CTG (bidirectional sugar transporter SWEET2b) and PDB ID: 6XDC (SARS-CoV-2 protein 3a). SWEET2b shows 14.29% sequence identity and covers mostly the transmembrane region (residues 74-109), whereas SARS-CoV-2 protein 3a shows 15.63% sequence identity and covers the C-terminal region (residues 104-200) (Figure S1a and S1b). The alignments between the target and model sequences are displayed in Figure S2a and S2b.

Due to the unavailability of experimentally determined close homologs (>30% sequence identity), template-based modeling was not feasible for the membrane protein with the algorithms of MEDELLER, i-membrane, Memoir, and MODELLER. Therefore, SARS-CoV-2 M protein full-length model structure has been predicted through template-free modeling (de novo or ab initio) approach from Robetta and trRosetta servers. Each server has provided five model structures. In this study, the model structures are compared with the model generated by I-TASSER server (Figure 2b-d) .

Apparently, the models from Robetta and trRosetta are better than the I-TASSER model in terms of the construction of their domain regions. The accuracy of the models is assessed with ERRAT, ProSA-web, QMEANBrane, RAMPAGE, and the PROCHECK Ramachandran plot. In Table 3, the validation scores suggest that model 4 from Robetta and model 5 from trRosetta are the best models among the M protein structures. The ERRAT server identifies incorrect regions of protein structures by distinguishing random distributions of atoms from correct distributions. It gives scores ranging from 49.296% to 96.040% for the M protein models from I-TASSER, Robetta, and trRosetta (Figure S3). RAMPAGE validates 3D models according to geometry and deviation. It gives scores between 47.7% and 100%, with 99.5% and 98.2% for the best Robetta and trRosetta models, respectively (Figure S4). Further, the PROCHECK server assesses the stereochemical quality of protein structures, considering residue-by-residue and overall structural geometry. It gives results ranging from 35% to 96%, with the best models from Robetta and trRosetta showing 94% and 96% of residues in the most favored regions, respectively (Figure S5). I-TASSER has the lowest scores among the server models. The structural analysis from ProSA-web gives z-scores of -5.21, -4.2, and 4.11 for I-TASSER, Robetta, and trRosetta, respectively (Figure S6). The protein models are close to experimentally determined native conformers (NMR spectroscopy: dark blue). However, it is difficult to identify a better model from this analysis, as there is no previously resolved membrane protein structure for CoVs. Finally, membrane protein model assessment via QMEANBrane reveals that the trRosetta model is properly embedded in the membrane compared with the Robetta and I-TASSER models (Figure S7); thus, the trRosetta model fulfills the criteria for a membrane protein. 
This assessment played an important role in deciding on the better model of the M protein. The local quality estimation by QMEANBrane usually gives scores in the range [0, 1] for good models. Here, only trRosetta shows a score of 1 for the properly membrane-embedded scenario, whereas the Robetta model has a score near 1 and I-TASSER shows the lowest score among the models. The full model structures are then aligned with the top models from SWISS-MODEL using the TM-align server, where the TM region (residues 74-109) aligns relatively better with the trRosetta model (TM-score: 0.64; RMSD 2 Å) than with the Robetta (TM-score: 0.61; RMSD 3.3 Å) and I-TASSER (TM-score: 0.45; RMSD 6.5 Å) models (Figure 2e-g).

Currently, Rosetta is claimed to be the most successful template-free method in the CASP experiments (Lee et al., 2017; Kelm et al., 2014; Das & Baker, 2008) . The deep learning-based prediction of inter-residue orientations, distances, and the improvement of a constrained optimization by Rosetta, can generate more accurate models for some template-free targets (Hou et al., 2019; Yang et al., 2020) . Hence, considering all the perspectives, trRosetta model is the best model M protein structure in comparison with Robetta and I-TASSER model.

We performed 100 ns MD simulations to evaluate probable conformational changes within each model structure of the M protein from I-TASSER, Robetta, and trRosetta. The RMSD values of the α-carbon atoms are investigated (Figure 3a). The highest average Cα RMSD is found for the trRosetta model (~13.34 Å), followed by the I-TASSER model (~7.68 Å) and the Robetta model (~3.98 Å). No large fluctuations are observed for the Robetta and I-TASSER models over the simulation time, which suggests that these models are likely to be stable in an aqueous environment. On the other hand, the trRosetta model deviates significantly between 5 ns and 66 ns, after which the fluctuation remains stable until the end of the simulation. High RMSD values are commonly observed in multidomain proteins where hinge motions produce relative movements of domains as rigid bodies (Lesk & Chothia, 1984; Monzon et al., 2017). When the MD snapshots are analyzed, such changes are clearly observed in the C-terminal domain (Figure 4c). Moreover, the radius of gyration (Rg) of all trajectories is investigated to assess the degree of protein compactness. The Robetta and I-TASSER models show a similar pattern with lower Rg values than the trRosetta model, indicating more compact structures, as shown in Figure 3(b). Although the trRosetta model exhibits a distinct pattern with higher Rg values, its fluctuation shows a stable trend over time. The SASA is calculated for all model structures and is depicted in Figure 3(c). The most prominent downtrends in SASA are observed for the Robetta and I-TASSER models compared with the trRosetta model, indicating that the expansion of their protein volume is lower than that of the trRosetta model. The number of intramolecular hydrogen bonds is evaluated to assess the structural stability of the protein and plotted against time (Figure 3d). 
The trRosetta (~344) and I-TASSER (~321) models show distinct patterns in the total number of H-bonds over the simulation period, while the Robetta model shows the lowest number of H-bonds. The RMSF calculation shows higher fluctuation for the trRosetta model than for the I-TASSER and Robetta models, as presented in Figure 3(e). For trRosetta and I-TASSER model, N-terminal regions (1-24) and 25-55 residues of TM regions fluctuation 

The secondary structure elements are identified for MD simulated models of SARS-CoV-2 M protein, as shown in 148-150, 155-158, 168-172, 176-180, 185-187, and 191-198, respectively. However, the dynamics of the protein secondary structures are observed for the selected models over the simulation period, as depicted in Figure S8. For α-helix content, trRosetta shows the highest average (42.34%) as well as good stability compared with the Robetta (40.86%) and I-TASSER (13.3%) models (Figure S8a), which might give overall stability to the protein tertiary structure and make it more likely to be functional (Jochim & Arora, 2009). The highest average β-sheet content (22.57%) is observed for the Robetta model, while irregular fluctuation is observed for the trRosetta model (Figure S8b) in the C-terminal domain (Figure 5a). For turn and coil content, the I-TASSER model exhibits higher values than the Robetta and trRosetta models (Figure S8c-d).

PCA is used to capture structural and energy changes in the models of SARS-CoV-2 M protein during MD simulation. Bond angles, bond distances, dihedral angles, planarity, van der Waals energies, and electrostatic energies are included as variables.

Here, PC1 and PC2 together explain 99.9% of the variance, with PC1 accounting for 83.3% and PC2 for 16.6%. As shown in Figure 6(a), the score plot of PC1 and PC2 demonstrates a major rightward shift of the trRosetta model relative to the Robetta and I-TASSER models along PC1. This clustering pattern indicates that the majority of the variables, including planarity, dihedral, angle, bond distance, and vdW energies, have a higher influence on the variance along PC1 (Figure 6b). On the other hand, the Robetta and I-TASSER models show a similar pattern: their clusters lie at the far left, signifying the highest change in their Coulomb energy profiles.

Previously, it was observed that the M and S proteins of SARS-CoV are co-expressed and that the first 134 amino acids of the M protein are crucial for their interaction (Voss et al., 2009). Hence, the corresponding interacting residues (1-135) in SARS-CoV-2 M are interpreted as the N-terminal domain. These 135 amino acids, including three transmembrane domains, are necessary to facilitate the accumulation of SARS-CoV M in the Golgi complex and to direct the recruitment of the viral spike protein (S) to the sites of virus assembly and budding in the ERGIC (Satarker & Nampoothiri, 2020). By comparison with SARS-CoV, the other structural proteins E and N of SARS-CoV-2 possibly interact with the C-terminal domain (residues 100-222) of the M protein (Fehr & Perlman, 2015; Schoeman & Fielding, 2019). This C-terminal domain has been recognized as a functional domain of the M protein that remains in the cytosol. In addition, the C-terminal polar tail within the endodomain interacts with the S protein, which suggests that the large M endodomain (ME) possibly plays crucial roles in SARS-CoV assembly (Luo et al., 2006). Considering the location of the C-terminal region, the trRosetta model structure is more appropriate, with its cytosolic domain, while the other models exhibit unusual patterns. The domain regions of the M protein that interact with the other structural proteins S, N, and E are highlighted in Figure 7(a)-(c).

In this study, interaction patterns of the proteins are visualized through docking approaches employing PatchDock, FireDock, and Cluspro2.0. Docking scores among trRosetta model M protein and other full-length structural proteins (S, N, and E) from I-TASSER are provided in Table S1 . The expected best modes for protein complexes are pictured in Figure 8(a)-(c) .

Later, the close view of interacting residues among the protein complexes is displayed through PDBsum's interaction Moreover, in SARS-CoV, both E and N proteins are required to be co-expressed with M protein for the efficient assembly and release of VLPs. When these proteins are coexpressed, the native trimeric S glycoprotein is integrated into VLPs (Siu et al., 2008) . Thus, the structural proteins' role in VLP formation and infectivity is also predictable for SARS-CoV-2. Hence, we also aim to explore active site residues in trRosetta model M protein using CASTp web server. The prediction of active site residues will facilitate probable drug and peptide binding in the pocket. The residues and binding pocket are presented in Table S2 and Figure 10 (a) accordingly. Besides, we have presented some common interacting residues, Phe103, Arg107, Met109, Trp110, Arg131, and Glu135 in C-terminal region of SARS-CoV-2 M protein. These residues are involved in the interaction with S and N structural proteins (Figure 10b) .

Moreover, in recent times, it has been also observed that SARS-CoV-2 M protein and other structural proteins interact with accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF9b, and ORF10) as well as non-structural proteins (nsp2, nsp4, nsp5, nsp8, and nsp16). Therefore, these PPIs may play a significant role in viral structural protein processing, modification, and trafficking (Li et al., 2020) . M protein also suppresses type I interferon (IFN) association by hindering the development of efficient TRAF3-involving complex (Siu et al., 2014) .

Later, the 224 protein-protein interactions with M protein and 217 proteins have been retrieved from IntAct Molecular Interaction Database and interactions' network is visualized through Cytoscape (version 3.8.0) in Figure S9 . This network represents the abundant interaction of M protein with its intra viral and host interactome. These outcomes might expedite designing therapeutic strategies to disrupt the interaction among SARS-CoV-2 structural proteins as well as diverse interactome with other proteins.

Accurate and reliable modeling of membrane proteins has been a great challenge, since most of these protein structures have low sequence identities to entries in the PDB database (Berman et al., 2002). However, computational modeling and prediction of three-dimensional protein structures hold promise where experimental structures are not available (Schwede, 2013). In our study, we employed a template-free modeling strategy to model the probable 3D structure of the full-length M protein and compared the model structure with the template-based models.

The SARS-CoV-2 M protein sequence was retrieved from the NCBI database, and the protein ID details used for further analysis are provided in the results section. In the SARS-CoV-2 M protein sequence, 20 mismatches and 1 gap are observed relative to the SARS-CoV M protein sequence. These mutations in the M protein could play a key role in viral infectivity and host cell interaction. The primary structure was investigated, and various parameters were calculated using the ExPASy ProtParam tool. The results suggest that the SARS-CoV-2 M protein is basic, with an isoelectric point (pI) of 9.51. The amino acid composition shows the maximum presence of Leu (15.8%) and the minimum presence of Cys, Gln, and Met (1.8%). Since the crystal or cryo-EM structure of the SARS-CoV-2 M protein has not been solved yet, we obtained 3D models of the M protein from the I-TASSER, Robetta, and trRosetta servers. The accuracy and quality of the structures are validated using various servers, including ERRAT, RAMPAGE, PROCHECK, ProSA-web, and QMEANBrane. The ERRAT, PROCHECK, and RAMPAGE servers indicate good model quality, with most residues in favored regions. Moreover, ProSA-web calculates an overall quality score for a protein structure by comparison with experimentally determined (X-ray, NMR) protein chains in the PDB database. QMEANBrane evaluates the local quality of alpha-helical transmembrane protein models. It applies specifically trained potentials to three different segments of a transmembrane protein model (membrane, interface, and soluble).

The validation analysis reveals that the trRosetta model is better than the others in terms of its proper orientation in the membrane environment. We also compared the model structures with the top two SWISS-MODEL models, based on PDB ID: 5CTG (bidirectional sugar transporter SWEET2b) and PDB ID: 6XDC (SARS-CoV-2 protein 3a). The model (residues 74-109) based on PDB ID: 5CTG (14.29% sequence identity) aligns better with trRosetta than the model based on PDB ID: 6XDC.

Then, 100 ns MD simulations were performed on the trRosetta, I-TASSER, and Robetta models. The trRosetta model shows larger changes in RMSD, with an average value of 13.34 Å, than the Robetta model (3.98 Å) and the I-TASSER model (7.68 Å). Monzon et al. reported that higher RMSD is very common in multidomain proteins because hinge motions produce relative movements of domains as rigid bodies (Monzon et al., 2017). However, RMSD is not a suitable measure for model quality assessment (Wallner & Elofsson, 2003), since a comparatively good protein model with one bad region might yield a very high RMSD (Moult et al., 2005). The Rg curves for trRosetta are much higher than those for the Robetta and I-TASSER models. It is important to mention that all models show a stable pattern, which indicates stable protein folding. Higher Rg values indicate loose packing of the protein conformation, meaning a more flexible structure (Dash et al., 2019). In the SASA profile, the trRosetta model shows a distinct pattern with higher SASA values over time, whereas the Robetta and I-TASSER models show lower SASA values. A decrease in SASA indicates shrinkage of the protein (Dash et al., 2019; Kamaraj & Purohit, 2013). We also observed a notable difference in the H-bond pattern during the simulation period: the trRosetta model forms a greater number of H-bonds, while the I-TASSER and Robetta models form fewer. Pace et al. reported that the contribution of H-bonds to protein stability is strong (Pace et al., 2014). As can be seen in Figure 3(e), the RMSF plots for the trRosetta model show higher fluctuation, with an average RMSF value of 4.33 Å, compared with the I-TASSER (2.12 Å) and Robetta (1.16 Å) models. Figure 5(a) shows that, for the trRosetta model, the C-terminal domain (104-222) contains most of the loop regions. 
The high fluctuation occurs in the loop regions (residues 146-149, 160, 164-176, 180-195, and 201-209) of the C-terminal domain for trRosetta. This is not unexpected, because loop regions lack any definite geometry (Chowdhury et al., 2020). Taken together, the RMSD, Rg, SASA, RMSF, and H-bond results for the trRosetta, I-TASSER, and Robetta models suggest that the trRosetta conformation is more flexible yet remains stable compared with I-TASSER and Robetta. This conclusion is further supported by the PCA.

Further, the interacting residues and interactors of the M protein were explored. The first 135 amino acids are interpreted as crucial for M and S protein interactions. This region is sufficient to mediate the accumulation of M in the Golgi complex, thereby directing the recruitment of the viral S protein to the sites of virus assembly and budding in the ERGIC (Voss et al., 2009). In addition, the C-terminal region is the functional domain for interaction with the E and N structural proteins. Moreover, the M-N interaction stabilizes the nucleocapsid (N protein-RNA complex) and the internal core of virions and eventually promotes completion of viral assembly (Escors et al., 2001; Fehr & Perlman, 2015). This indicates that the C-terminal domain of the M protein structure must be cytosolic. The molecular docking study of the M protein with the other structural proteins (S, E, and N) supports our interpretation of previous studies. Moreover, the common interacting residues Phe103, Arg107, Met109, Trp110, Arg131, and Glu135 in the C-terminal domain of the M protein are identified and visualized for interaction with S and N. The active site predicted by CASTp is expected to be important for targeting the SARS-CoV-2 M protein. We have also visualized the network of interactions of the M protein (UniProtKB-P0DTC5) with other cellular proteins using Cytoscape (version 3.8.0). Interactions between ORF3a and the M protein have also been shown to have structural functions in SARS-CoV (Huang et al., 2006), which is relevant to SARS-CoV-2 in our study. Consequently, insights into the atomic detail of three-dimensional protein structures are crucial for a better understanding of biological processes. Only accurate structures can be used reliably to answer biological questions (Pražnikar et al., 2019).

In this study, considering the structural pattern of the TM region and the cytosolic C-terminal region, trRosetta (model 5; TM-score: 0.64; TM region RMSD: 2 Å) provided a relatively better model than Robetta (model 4; TM-score: 0.61; TM region RMSD: 3.3 Å) and I-TASSER (TM-score: 0.45; TM region RMSD: 6.5 Å). Recently, a research group reported that the trRosetta and AlphaFold M protein models display very similar structural patterns. In contrast to these models, the I-TASSER model from the Zhang group showed poor local geometries, poor side-chain conformations, bad backbone dihedral angles, and numerous atomic clashes, which together suggest poor stereochemistry (Heo & Feig, 2020).
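For readers less familiar with the TM-score values quoted above, the following sketch shows how the score is assembled from per-residue distances for one fixed superposition, using the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. Tools such as TM-align additionally search over superpositions to maximize the score, which this sketch does not do. The function names and the chain length of 222 residues are assumptions made for illustration.

```python
import numpy as np

def tm_d0(l_target):
    """Length-dependent distance scale d0 in Angstrom (intended for chains well above 15 residues)."""
    return 1.24 * np.cbrt(l_target - 15) - 1.8

def tm_score_fixed(distances, l_target):
    """TM-score for one given superposition.

    distances: C-alpha distances (Angstrom) of the aligned residue pairs.
    l_target:  length of the target (reference) chain used for normalization.
    """
    d0 = tm_d0(l_target)
    d = np.asarray(distances, dtype=float)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)

L_TARGET = 222  # assumed chain length, for illustration only

# A perfect match scores 1; every pair exactly d0 apart scores 0.5.
perfect = tm_score_fixed(np.zeros(L_TARGET), L_TARGET)
half = tm_score_fixed(np.full(L_TARGET, tm_d0(L_TARGET)), L_TARGET)
print(perfect, half)
```

Because each term is bounded by 1 and normalized by the target length, residues that are unaligned or far apart contribute little, which is why the score is less sensitive to a few badly placed loops than a global RMSD.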

Furthermore, models of good quality, particularly in the TM region, can be produced even when the target-template sequence identity lies in the 20-40% range (Nikolaev et al., 2018). Another study found that more accurate alignments for proteins with low sequence identity to their templates can be achieved using structure-based profile alignment methods. That study relates to ours in that it considers a membrane protein model at least acceptable when its Cα RMSD to the native structure is 2 Å or less in the transmembrane regions (Forrest et al., 2006). However, common sources of error include alignment errors, backbone distortions, misplaced side chains, and the selection of a template with an incorrect fold, low sequence identity, and high structural divergence (Al-Khayyat & Al-Dabbagh, 2016). Melo et al. (1997) noted that typical model errors lie in or close to the regions that join the central parts of secondary structure elements and are of high energy. Conversely, models with a slightly higher (worse) RMSD but a nearly correct overall fold may be used for predicting function from the global fold (Kihara & Skolnick, 2004), categorizing local functional sites (Weidong Tian et al., 2004; Li et al., 2008), or analyzing low-resolution structures (Shin et al., 2017).
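The 2 Å Cα RMSD criterion mentioned above presupposes an optimal rigid-body superposition of the two coordinate sets. A minimal sketch of that calculation using the Kabsch algorithm is given below; it assumes the two structures are already matched residue by residue, and the function name and synthetic coordinates are illustrative rather than part of the workflow used in this study.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between paired coordinate sets after optimal superposition.

    P, Q: arrays of shape (n_atoms, 3) with the same atom ordering.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)               # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T                       # rotate P onto Q
    return float(np.sqrt(((P_rot - Qc) ** 2).sum(axis=1).mean()))

# Synthetic check (assumption): a rotated and shifted copy should give RMSD ~ 0.
rng = np.random.default_rng(1)
P = rng.normal(size=(30, 3)) * 8.0
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])

rmsd = kabsch_rmsd(P, Q)
acceptable = rmsd <= 2.0                   # 2 Angstrom criterion for the TM region
print(round(rmsd, 6), acceptable)
```

In practice, P and Q would hold the Cα atoms of the transmembrane residues only, so that flexible loops outside the membrane do not dominate the value.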

The membrane protein of SARS-CoV-2 is one of the vital viral proteins, and advances in determining its 3D structure might speed up the drug discovery process. Computational prediction of protein structure can now play a central role in structural elucidation (Muhammed & Aki-Yalcin, 2019). We have used reliable and widely employed computational methods to explore and evaluate the probable structure of the SARS-CoV-2 M protein for further application.

This study elucidates the structural and dynamic features of the SARS-CoV-2 M protein. An in-depth understanding of structure is indispensable for exploring biological consequences. In this study, we employed in silico approaches to model the M protein. The models were extensively evaluated with the ERRAT, RAMPAGE, PROCHECK, ProSA-web, and QMEANBrane servers. The best models from Robetta and trRosetta were then subjected to MD simulation and compared with the I-TASSER server model. Our results show that the M protein model generated by trRosetta is comparatively better than the models generated by the Robetta and I-TASSER servers. Moreover, the utility of the trRosetta model is illustrated through visualization of the interacting residues in protein-protein interactions. This study provides detailed structural and dynamic insights into the SARS-CoV-2 M protein, which may help in designing potent and selective inhibitors targeting the membrane protein.

Virtual screening, molecular dynamics, density functional theory and quantitative structure activity relationship studies to design peroxisome proliferator-activated receptor-γ agonists as anti-diabetic drugs

Investigating the binding affinity, interaction, and structure-activity-relationship of 76 prescription antiviral drugs targeting RdRp and Mpro of SARS-CoV-2

In silico prediction and docking of tertiary structure of LuxI, an inducer synthase of Vibrio fischeri

The IntAct molecular interaction database in 2010

Mutation spectrum in TPO gene of Bangladeshi patients with thyroid dyshormonogenesis and analysis of the effects of different mutations on the structural features

Research papers: The Protein Data Bank

Antiviral peptides as promising therapeutics against SARS-CoV-2

Integration of biological networks and gene expression data using cytoscape

Verification of protein structures: Patterns of nonbonded atomic interactions

Macromolecular modeling with rosetta

Structural and dynamic characterizations highlight the deleterious role of SULT1A1 R213H polymorphism in substrate binding

Computational analysis and binding site identification of type III secretion system ATPase from Pseudomonas aeruginosa

Multivariate calibration

A glance into the evolution of template-free protein structure prediction methodologies

Lipid14: The Amber lipid force field

In silico identification of a key residue for substrate recognition of the riboflavin membrane transporter RFVT3

The membrane M protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability

Coronaviruses: An overview of their replication and pathogenesis

On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins

The proteomics protocols handbook

Protein bioinformatics from sequence to function

Modeling of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins by machine learning and physics-based refinement

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

Severe acute respiratory syndrome coronavirus 3a protein is released in membranous structures from 3a protein-expressing cells and infected cells

Prediction of deleterious non-synonymous SNPs of human STK11 gene by combining algorithms, molecular docking, and molecular dynamics simulation

A molecular modeling approach to identify effective antiviral phytochemicals against the main protease of SARS-CoV-2

Assessment of helical interfaces in protein-protein interactions

Metal based donepezil analogues designed to inhibit human acetylcholinesterase for Alzheimer's disease

In silico screening and molecular dynamics simulation of disease-associated nsSNP in TYRP1 gene and its structural consequences in OCA3

Protein Modeling and Structural Prediction

Multiple receptor conformers based molecular docking study of fluorine enhanced ethionamide with mycobacterium enoyl ACP reductase (InhA)

General overview on structure prediction of twilight-zone proteins

Microbial genomes have over 72% structure assignment by the threading algorithm PROSPECTOR_Q

Protein structure prediction and analysis using the Robetta server

The ClusPro web server for protein-protein docking

Making optimal use of empirical energy functions: Force-field parameterization in crystal space

YASARA: A Tool to Obtain Structural Guidance in Biocatalytic Investigations

PDBsum: Structural summaries of PDB entries

PROCHECK: A program to check the stereochemical quality of protein structures

Ab initio protein structure prediction

Mechanisms of domain closure in proteins

Characterization of local geometry of protein surfaces with the visibility criterion

Virus-host interactome and proteomic survey of PBMCs from COVID-19 patients reveal potential virulence factors influencing SARS-CoV-2 pathogenesis

Structure validation by C alpha geometry: Phi, psi and C beta deviation

Severe acute respiratory syndrome coronavirus membrane protein interacts with nucleocapsid protein mostly through their carboxyl termini by electrostatic attraction

FiberDock: A web server for flexible induced-fit backbone refinement in molecular docking

ANOLEA: A www server to assess protein structures

Conformational diversity analysis reveals three functional mechanisms in proteins

Critical assessment of methods of protein structure prediction (CASP) -Round 6

Homology modeling in drug discovery: Overview, current applications, and future perspectives

Membrane protein of human coronavirus NL63 is responsible for interaction with the adhesion receptor

A structural analysis of M protein in coronavirus assembly and morphology

A comparative study of modern homology modeling algorithms for rhodopsin structure prediction

Contribution of hydrogen bonds to protein stability

R programming for data science

Validation and quality assessment of macromolecular structures using complex network analysis

NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins

RStudio: Integrated development for R. RStudio

Structural Proteins in Severe Acute Respiratory Syndrome Coronavirus-2

PatchDock and SymmDock: Servers for rigid and symmetric docking

Coronavirus envelope protein: Current knowledge

Protein modeling: What happened to the "protein structure gap"?

A computational approach to explore and identify potential herbal inhibitors for the p21-activated kinase 1 (PAK1)

Prediction of local quality of protein structure models considering spatial neighbors in graphical models

Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane)

Suppression of innate antiviral response by severe acute respiratory syndrome coronavirus M protein is mediated through the first transmembrane domain

The M, E, and N structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles

EFICAz: A comprehensive approach for accurate genome-scale enzyme function inference

CASTp 3.0: Computed atlas of surface topography of proteins

Studies on membrane topology, N-glycosylation and functionality of SARS-CoV membrane protein

Can correct protein models be identified?

SWISS-MODEL: Homology modelling of protein structures and complexes

Ggplot2: Elegant graphics for data analysis

ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins

Principal component analysis

Improved protein structure prediction using predicted interresidue orientations

The I-TASSER suite: Protein structure and function prediction

TM-align: A protein structure alignment algorithm based on the TM-score

In silico Characterization and Homology Modeling of Histamine Receptors

We are grateful to our donors (http://grc-bd.org/donate/), who supported the building of a computational platform. The authors would like to acknowledge The World Academy of Sciences (TWAS) for supporting the purchase of the high-performance computers used for molecular dynamics simulation. We would also like to give special thanks to Mst. Noorjahan Begum 

The authors declare no competing financial interest.

Mohammad A. Halim http://orcid.org/0000-0002-1698-7044