key: cord-0685585-sxjx2wuu authors: Jeyasudha, J.; Usha, G. title: An Intelligent Centrality Measures for Influential Node Detection in COVID-19 Environment date: 2021-05-13 journal: Wirel Pers Commun DOI: 10.1007/s11277-021-08577-y sha: 890db389d4f85e06a5733d7d07dbef9908ce1e82 doc_id: 685585 cord_uid: sxjx2wuu

With the advent of social networks, spamming has become one of the most serious issues among users. Influential users spread spam messages in the community, creating a social and psychological impact on other users. Identifying such influential nodes has therefore become an important research challenge. This paper proposes a method to (1) detect a community around a popular hashtag using community detection algorithms with the Laplacian transition matrix, (2) find the influential nodes or users in the community using intelligent centrality measures, and (3) classify the intensity of users with machine learning algorithms. Extensive experimentation has been carried out on COVID-19 datasets with different machine learning algorithms. SVM and PCA provide an accuracy of 98.6%, higher than linear regression, when the new centrality measures are used, and other scores such as NMI and RMS are reported for the methods. As a result, finding the influential nodes helps us to identify spammy and genuine accounts easily.

Twitter is both a networking channel and a way to create community, in addition to being a micro-blogging platform for individuals. Imagine a tool that lets you send micro-messages at any time of the day or night to current and potential customers, with little or no objection to why and when you send them. Twitter has a feature called Promoted Products that strategically promotes your Twitter profile or individual messages to compatible Twitter users. Twitter also gives you the ability to add your voice to other on-site discussions, perhaps by answering a question that no one else knows the answer to, thereby establishing you and your business as experts in your field. COVID-19's impact on Twitter has already surpassed anything seen before, and as the pandemic progresses, more stress will be placed on its operation. Beyond Twitter itself, COVID-19 has had a far-reaching effect on supply chain partners as well. While they would normally have months of lead time to expand hardware capacity for growth, in this case the supply chain was compromised by production issues in China, causing delays in deliveries to the data centres. Twitter's Data Center, SiteOps, Supply Chain, Hardware Engineering and Mission Critical departments handle the service's physical infrastructure and create new potential in the existing supply. Twitter also deleted more than 1100 tweets containing deceptive and potentially dangerous information by enforcing its policies on March 18. In addition, its automated systems have challenged more than 1.5 million accounts engaged in spammy or manipulative behaviour in discussions around COVID-19. Twitter will continue to use both technology and human teams to recognize and prevent spammy activity and accounts, as seen in Fig. 1. Twitter will require people to remove tweets of the kinds described in Fig. 2, for example:
• Unverified claims made by individuals impersonating a government or health official or agency, such as a fake account of an Italian health official claiming that the country's quarantine is over.
• Disseminating inaccurate or misleading information about diagnostic criteria or procedures, such as "if you can hold your breath for 10 s, you do not have coronavirus."
• False or deceptive statements on how to differentiate COVID-19 from another disease, where that information tries to diagnose someone, such as "if you have a wet cough, it's not coronavirus, but a dry cough is" or "you'll feel like you're drowning in snot if you have coronavirus; it's not a normal runny nose."
• Claims that particular classes or nationalities are more vulnerable to COVID-19.

Twitter continues to review its rules in the context of COVID-19 and to consider how new behaviours may need to be taken into account; analysing influential behaviour leads to finding out who is dominating the COVID-19 conversation and which accounts are spammy or genuine (Fig. 3).

Topic modelling is applied to Twitter data to find what people are tweeting about in relation to COVID-19. From an example dataset we clean the text, investigate which hashtags are popular and who is being tweeted at and retweeted, and finally apply two unsupervised machine learning algorithms, latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), to explore the topics of the tweets in full. Twitter is an excellent source of data for a social researcher, with more than 8000 tweets sent per second. The tweets that millions of users send can be downloaded and analysed to explore mass opinion on particular issues. This can be as basic as searching for keywords and phrases like 'marmite is bad' or 'marmite is good', or more advanced, aiming to discover general topics (not just marmite-related ones) contained in a dataset. Next we discover who is tweeted at the most, who is retweeted the most, and what the most common hashtags are. We extract the following from each tweet: (a) who is being retweeted (if any), (b) who is being tweeted at/mentioned (if any), (c) what hashtags are being used (if any), as illustrated in the sketch after this paragraph.

Graph community detection (GCD) and data clustering are part of machine learning. GCD methods are specialized in clustering (or clique detection) in graphs, which are a special representation of data: objects are nodes (vertices), and the edges (links) of the graph represent some kind of quantitative or qualitative relation between objects. Data clustering in general can be applied to data that has nothing to do with graphs. For instance (Fig. 3 shows the proposed architecture to find the influential nodes using centrality measures), if you have data consisting of a set of persons, each labeled with weight and height, and you want to cluster them, e.g. to deliver a t-shirt of the correct size, your data is not a graph; there is no property that a priori relates each person to any other. GCD algorithms will almost certainly do better than general methods when the problem is finding cliques/clusters in data represented as graphs, for example a graph whose nodes are users and whose edges are friendship ties.
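As a minimal illustration of this extraction and topic modelling step, the sketch below assumes a pandas DataFrame named tweets_df with a free-text column "tweet"; the column name, the regular expressions and the NMF hyper-parameters are our own illustrative choices, not taken from the paper. LDA could be substituted via sklearn.decomposition.LatentDirichletAllocation.

    import re
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    def extract_entities(text):
        """Return (retweeted users, mentioned users, hashtags) for one tweet."""
        retweeted = re.findall(r'RT @(\w+)', text)          # who is being retweeted
        mentioned = re.findall(r'(?<!RT )@(\w+)', text)     # who is tweeted at / mentioned
        hashtags = re.findall(r'#(\w+)', text)              # hashtags used
        return retweeted, mentioned, hashtags

    def topic_model(tweets, n_topics=5, n_top_words=10):
        """Fit NMF on TF-IDF features of the tweets and print the top words per topic."""
        vectorizer = TfidfVectorizer(stop_words='english')
        X = vectorizer.fit_transform(tweets)
        nmf = NMF(n_components=n_topics, random_state=0).fit(X)
        terms = vectorizer.get_feature_names_out()
        for k, weights in enumerate(nmf.components_):
            top = [terms[i] for i in weights.argsort()[-n_top_words:][::-1]]
            print(f"Topic {k}: {', '.join(top)}")

    # Example usage with a tiny hypothetical DataFrame of collected tweets.
    tweets_df = pd.DataFrame({'tweet': ['RT @who: wash your hands #covid19',
                                        '@user stay home #coronavirus #covid19']})
    tweets_df[['retweeted', 'mentioned', 'hashtags']] = tweets_df['tweet'].apply(
        lambda t: pd.Series(extract_entities(t)))
    topic_model(tweets_df['tweet'], n_topics=2, n_top_words=5)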
Community detection methods can be grouped into four categories. Node-centric community detection requires each node in a group to satisfy constraints such as cliques, k-cliques and k-clubs. The second category is group-centric community detection, which considers the connections inside a group as a whole, as in quasi-cliques; the group must satisfy certain properties without zooming into node level. The third category is network-centric community detection, which partitions the entire network into several disjoint sets, for example clustering on node similarity, latent space models, block models, spectral clustering and modularity maximization. The last category is hierarchy-centric community detection, which builds a hierarchical structure of communities, as in divisive and agglomerative clustering.

Different disciplines, for example mathematics, computer science, sociology and management science, deal with measuring centrality in complex networks. Various measures have been proposed, including [1-4], alpha centrality for asymmetric networks [5], information centrality [6], and so on. In general, the techniques referred to above describe only limited aspects of what it means for an actor to be "central" to the network. As argued by [7], centrality measures, or rather the well-known interpretations of these measures, make certain assumptions about how traffic flows through the network. Other approaches, such as flow betweenness, do not assume shortest paths, but do assume proper paths in which no node is visited more than once. Google's PageRank algorithm [8] relies on the assumption that the probability of a surfer visiting heterogeneous sites is equal, which does not correspond to reality. It is therefore evident that centrality measures are designed with particular types of flow in mind, implying that a particular centrality is ideal for one application but routinely flawed for another. Since many nodes are excluded from the shortest paths between other node pairs, the betweenness centrality of those nodes will be zero. Since Katz centrality [2] takes all paths between pairs of nodes into account when computing influence, its higher computational complexity makes it difficult to apply to large-scale networks. Moreover, what is not often recognized by the aforementioned neighbourhood-based and path-based centrality measures is that structural complexity and uncertainty play a significant role in the analysis of network centrality.

The value of graph entropy measures can be obtained from different graph quantities, for example the number of vertices [9], the vertex degree sequence [10] and the extended degree sequences (e.g. second neighbour, third neighbour, etc.) [11]. The author in [12] proposed that the structure of a given network could be treated as the outcome of an arbitrary function. Inspired by this insight, Shannon's information entropy is used to measure the structural information content of a given network and to quantify its uncertainty. Since then, graph entropy based on Shannon's theory has played a key role in the analysis of social networks.
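To make the graph-entropy idea concrete, here is a minimal sketch (our own illustration, not the exact formulation of [9-12]) that computes the Shannon entropy of the normalized degree sequence of a NetworkX graph; a more heterogeneous degree distribution gives a lower entropy than a uniform one.

    import math
    import networkx as nx

    def degree_entropy(G):
        """Shannon entropy of the degree distribution: H = -sum(p_i * log2 p_i), p_i = d_i / sum(d)."""
        degrees = [d for _, d in G.degree()]
        total = sum(degrees)
        probs = [d / total for d in degrees if d > 0]
        return -sum(p * math.log2(p) for p in probs)

    # A star graph (one hub) concentrates the degree mass; a cycle spreads it uniformly.
    print(degree_entropy(nx.star_graph(10)))   # hub-dominated, lower entropy
    print(degree_entropy(nx.cycle_graph(11)))  # uniform degrees, maximal entropy log2(11)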
However, relatively little work [16, 17] has been done to show the effectiveness of Shannon's theory for network centrality computation. The article is organized as follows. The related works section discusses work on finding communities, evaluating metrics for the detected communities, and centrality measures. The third section provides an overview of the proposed methodology for deriving the new features and applying it to data collected from Twitter for a general topic, to WhatsApp data and to a COVID-19 Twitter dataset, compared with the existing features on the LFR benchmark dataset. The fourth section presents the result analysis used to quantify the influence or impact of nodes on networks; the experiments are conducted on real-world datasets with varying numbers of nodes and social edges to evaluate the effectiveness of the proposed methodology against different models, on the WhatsApp, Twitter and COVID-19 datasets along with the LFR benchmarks. The last section presents the conclusion and future work.

Discovering communities within an arbitrary network can be a difficult task. The number of communities within the network, if any, is typically unknown, and the communities are often of unequal size and/or density. Despite these problems, several community detection methods have been developed and used with varying degrees of success. In the minimum-cut method, the network is divided into a predetermined number of parts, usually of roughly equal size, chosen so that the number of edges between groups is minimized. In hierarchical clustering, a similarity measure is defined which quantifies some type of (usually topological) similarity between node pairs; many community measures help to assess this similarity, such as cosine similarity, the Jaccard index and the Hamming distance between rows of the adjacency matrix, and according to this measure similar nodes are grouped into communities. In the Girvan-Newman algorithm [13, 9], the network edges between communities are identified and removed, leaving only the communities behind; detection is carried out using a graph-theoretic measure of edge betweenness centrality, assigning to each edge a value that is larger if the edge lies between communities (see the sketch after this paragraph). The modularity maximization technique detects communities by searching over possible divisions of a network for one or more that have particularly high modularity. In statistical inference, a generative model is fitted to the network data and encodes the community structure. In clique-based methods, since a node can be a member of more than one clique, a node can be a member of more than one community, giving an overlapping community structure.

To evaluate a community, the following can be used: goodness scores such as separability, density, cohesiveness and clustering coefficient; edge/link prediction; modularity; F1 score; omega index; normalized mutual information; internal density; average degree; triangle participation ratio (TPR); fraction over median degree (FOMD); expansion (external degree); cut ratio (external density); conductance; maximum out-degree fraction (maximum ODF); normalized cut; average ODF; and edges inside.
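As an illustration of the edge-betweenness approach described above, the sketch below is a generic NetworkX example, not the paper's implementation: it runs Girvan-Newman on a small stand-in graph and scores the resulting split with modularity, one of the goodness measures listed above.

    import networkx as nx
    from networkx.algorithms.community import girvan_newman, modularity

    # Karate-club graph as a stand-in for a small social network.
    G = nx.karate_club_graph()

    # girvan_newman yields successive splits obtained by repeatedly removing
    # the edge with the highest betweenness centrality.
    splits = girvan_newman(G)
    first_split = next(splits)                 # a tuple of node sets
    communities = [set(c) for c in first_split]

    print([sorted(c) for c in communities])
    print("modularity:", modularity(G, communities))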
In this section, popular methods to identify influential nodes in different network topologies are presented, including classical centrality measures and many other approaches. [1] stated that the importance of a node can be determined by three different structural attributes: degree, betweenness and closeness. [2] presented a measure of centrality known as Katz centrality, which computes influence by taking into account the number of walks between a pair of nodes. As argued by Katz, the attenuation factor α can be interpreted as the probability that an edge is effectively traversed; the parameter α also indicates the relative importance of endogenous versus exogenous factors in the determination of centrality. Eigenvector centrality, originally proposed by [4, 14], has become one of the standard measures of symmetric network centrality and recognizes the centrality of a node based on the idea that connections to high-scoring nodes contribute more to the node's score than equal connections to low-scoring nodes. To handle asymmetric networks, [15] noted that the eigenvectors of asymmetric networks are not orthogonal, so the conditions are somewhat different, and conceptualized the alpha centrality approach; Google's PageRank [16], among others, is an instance of alpha centrality. The authors in [17] characterized the information measure of centrality using the information contained in all possible paths between pairs of points. The authors of [8] presented subgraph centrality, obtainable mathematically from the spectrum of the network's adjacency matrix, which describes the participation of each actor in all subgraphs of the given network. A novel approach to measuring centrality based on game-theoretic concepts is presented in [18], which extended the traditional betweenness centrality measure by modelling it as a network flow optimization problem. [9] extended the ordinary conception of betweenness centrality, which implicitly assumes that information propagates only along shortest paths, and proposed a betweenness measure that relaxes this assumption, including contributions from paths that are not the shortest. More recently, [10] first presented TOPSIS as a new measure of centrality. [11] improved the original evidential centrality by taking node degree distributions and global structure information into consideration. A novel measure of node influence based on the comprehensive use of the degree method, H-index and coreness metrics was proposed in [12, 19]. The authors of [20] provided a semi-local version of these measures. Additionally, Chen et al. presented a so-called ClusterRank technique, which takes the influence of neighbouring nodes and the clustering coefficient into consideration. [21] enhanced the k-shell technique by re-evaluating the significant associations among remaining and removed nodes and proposed a mixed degree decomposition strategy. [22] confirmed that influential nodes are located in the k-core across different social platforms. The LeaderRank algorithm was adapted by introducing a variant based on assigning degree-dependent weights onto links created by ground nodes [23]. The authors of [24] found the best spreaders by using a colouring-based method; thus, the network is divided into several independent sets with different colours.
[25] studied human behaviour and inferred that people who play a critical role in connecting different communities are expected to be powerful spreaders of influence. The authors in [26] conducted a close evaluation of the adjacency matrix and explained the measure from the point of view of link predictability. The authors of [27] showed that the minimal set of influential nodes can be elegantly mapped onto optimal percolation in networks. [10] characterized several aspects of the structural information content of a graph and analysed its mathematical properties. Since then, entropy measures have been used to investigate the structural complexity of networks and play a fundamental role in a variety of application fields, including science, biology and sociology. Everett introduced the concept of role similarity derived from structural equivalence and presented another measure of structural complexity based on the entropy measure developed in [28]. To obtain a continuous quantitative measure of robot group diversity, the authors of [29] outlined how to characterize the Shannon entropy of a network ensemble and how it is connected to the Gibbs and von Neumann entropies of network ensembles. The authors of [12] gave a more general overview of strategies for estimating the entropy of graphs and showed the wide applicability of entropy measures. The authors of [30] addressed the issue of identifying powerful nodes in complex networks by using relative entropy and the TOPSIS strategy, which combines the benefits of existing centrality measures, and demonstrated the effectiveness of the proposed technique on experimental outcomes. Complex time-series data is not used here: only the data downloaded from Twitter and WhatsApp during the collection period is taken, not future or earlier data, and identification of powerful hubs in complex systems is not needed for the proposed method. The authors of [21] described the features of mobile social networks and introduced an assessment model to measure influence by analysing and computing the friend entropy and communication frequency entropy among users, capturing the uncertainty and complexity of social influence.

The proposed methodology in Fig. 4 starts with data collection from social media websites: Twitter data using the twint tool [31], WhatsApp data using Bluefire, and LFR benchmark datasets (Fig. 4a: collection of tweets using the twint tool; Fig. 4b: Twitter datasets with segments; Fig. 4c: collection of WhatsApp data). After the data collection phase, the data are preprocessed and sentiment analysis is carried out to find the sentiment scores. The sentiment scores and the dataset are provided to a community detection method that uses the Laplacian matrix of the nodes, defined in the next phase, to detect the community. The sentiment analysis also adds a new column, "Segment", which categorizes the data as positive or negative. Various centrality measures are then taken into account, and new centrality measures, user centrality and time series centrality, are defined along with the influence impact centrality. The existing centrality features are computed with the default packages available in NetworkX; forming the graph with its nodes and edges is done as a preparatory step. The new centrality measures are computed analytically so that they can be used for dominant node detection. The prepared datasets are divided into training and test sets and fed into ML algorithms such as SVM and PCA.
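Since the existing centrality features come from the default NetworkX routines, a minimal sketch of that preparatory step might look as follows; the follower edge list and the user names are our own illustrative assumptions.

    import networkx as nx
    import pandas as pd

    # Hypothetical follower edge list: an edge u -> v means u follows v.
    edges = [("alice", "bob"), ("carol", "bob"), ("bob", "dave"), ("carol", "dave")]
    G = nx.DiGraph(edges)

    # Existing centrality measures computed with the default NetworkX routines.
    features = pd.DataFrame({
        "in_degree":   dict(G.in_degree()),          # ~ followers
        "out_degree":  dict(G.out_degree()),         # ~ following
        "betweenness": nx.betweenness_centrality(G),
        "closeness":   nx.closeness_centrality(G),
        "katz":        nx.katz_centrality(G, alpha=0.05),
        "pagerank":    nx.pagerank(G),
        "eigenvector": nx.eigenvector_centrality(G.to_undirected(), max_iter=1000),
    })
    print(features)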
The test data is fed into the algorithms to predict the dominant and non-dominant nodes. Twitter analysis is the process of recognizing the positively and negatively posting persons in a particular network and then comparing them with different groups to analyse their opinions. We performed sentiment analysis to discover the positive and negative influence of each person. The data is gathered with the help of the twint tool, as shown in Fig. 4a. Here the machine learning community on Twitter is taken, and within it a community of users, for example Machinelearnfx. Their followers, tweets (recent and old), following, location, screen name and verified-user status are gathered with the help of twint and online tools such as followeranalysis.com. The negative and positive percentages are compared, and whichever is higher determines the segment. The date and time are converted to timestamp values. Unwanted rows (especially null and replicated data) are removed from the datasets, and the nodes and edges are determined from the followers and following counts. A number is assigned to every user, as displayed in Fig. 4b.

WhatsApp analysis is the process of recognizing the positively and negatively messaging persons in the groups and then comparing each person with different groups to analyse the person's opinions. We performed this analysis with chats from two groups of real-time data. To obtain the chat details, make sure WhatsApp (Fig. 4c) is installed on your mobile: App → WhatsApp → Select the Group → FAQ button → More → Export Chat, or the download can be automated using Bluefire, as represented in Fig. 4c. In the WhatsApp data the community has already been formed around the popular topics found in the Twitter analysis; based on these, a group is formed, members are added, and the data is retrieved. From this, the chats are exported as a .txt file following the steps above. The NLTK-based classifier is trained with a spam dataset and reaches an accuracy of 81.333%. The dataset taken contains 2000 records, of which 1850 are used for training and the rest for testing. This classifier is used to analyse the messages from the group members. The chats are imported and converted into a .csv file, then categorized into positive and negative segments as in Fig. 5. The data is split into the columns Date, Time, Users and Message as in Fig. 6, and a new column named "Segment" is created as per Fig. 5. Each message is then analysed with the NLTK-based classifier: whenever data is passed in, it segregates the words and checks them against the NLTK training set, and the percentage for the message is calculated. The negative and positive percentages are compared, and whichever is higher determines the segment. The date and time are converted to timestamp values, unwanted rows are removed from the datasets, and a number is assigned to every user, as displayed in Fig. 7.

The Lancichinetti-Fortunato-Radicchi (LFR) benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real networks). They have a priori known communities and are used to compare different community detection methods [14]. The benefit of the benchmark dataset over other techniques is that it accounts for the variations in the distributions of community degree and size [32]. In particular, this program is designed to create binary networks with overlapping nodes.
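A minimal sketch of this preprocessing step is given below. It assumes an exported chat file whatsapp_chat.txt whose lines look like "12/03/2020, 21:15 - Alice: some message" (the exact export format varies by locale), and it uses NLTK's VADER analyser as a stand-in for the paper's custom NLTK classifier trained on a spam dataset.

    import re
    import pandas as pd
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")

    rows = []
    with open("whatsapp_chat.txt", encoding="utf-8") as fh:
        for line in fh:
            m = LINE_RE.match(line.strip())
            if m:                                  # skip system messages / continuation lines
                rows.append(m.groups())

    chat = pd.DataFrame(rows, columns=["Date", "Time", "User", "Message"])

    # Segment each message as Positive or Negative by comparing the two scores.
    def segment(msg):
        scores = sia.polarity_scores(msg)
        return "Positive" if scores["pos"] >= scores["neg"] else "Negative"

    chat["Segment"] = chat["Message"].apply(segment)

    # Timestamp used later as a node attribute for the network graph.
    chat["Timestamp"] = pd.to_datetime(chat["Date"] + " " + chat["Time"],
                                       dayfirst=True, errors="coerce")
    chat = chat.dropna()                           # remove unwanted / unparsed rows
    chat.to_csv("whatsapp_chat.csv", index=False)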
The program generates three files: (1) network.dat contains the edge list (nodes are numbered from 1 to the number of nodes; the edges are ordered and each appears twice, i.e. source-target and target-source); (2) community.dat contains a list of nodes and their community membership (memberships are integer numbers >= 1); (3) statistics.dat contains the degree distribution (in logarithmic bins), the community size distribution and the mixing parameter distribution. Here community.dat is taken for comparison with the real-world datasets.

The number of tweets about COVID-19 and coronavirus is 628 M tweets so far. Perhaps the saddest analysis we have ever had to do is to follow the progress of the new coronavirus COVID-19 on Twitter. We refresh the information regularly and provide as much data as possible. Journalists can use this freely by citing Tweet Binder as a source; the raw data (the tweets) cannot be offered for free. Coronavirus discussion on Twitter is also full of fake news, so caution is required. Since many individuals are tweeting about coronavirus, we ran a search over all the tweets that received at least 1000 retweets. We consider those "significant", which obviously does not mean they are the most important tweets on the coronavirus issue. There are more than 40,000 tweets about COVID-19 that received at least 1000 RTs. Rather than concentrating on the larger network and then finding the influential nodes, focusing on the nodes with the popular tags gives more insight into the nodes that prefer to stay in the social network and the content they use. The data is collected using the twint tool based on the hashtag covid19 and the users who have tweeted about COVID-19. The following features were extracted with the twint tool:

• Parent_id, which describes the twitter handle
• Link_id of the users who are linked to the twitter handle
• Id, the Twitter name of the twitter handle
• Total tweets, consisting of retweets, links and text tweets
• Followers

Community detection using the random walk is done with the following parameters: S is the adjacency matrix, X is the attribute matrix, k is the number of clusters (default 1) and n is the number of walks. H is taken as the Laplacian transition matrix L_t = D^{-1}A. The algorithm to find the community is the simple random walk method, but the matrix used for the nodes is different: since U is the cluster attribute matrix and V is the attribute factor matrix, we use the Laplacian matrix for better detection of communities, as referred to in Fig. 8.

    Input: S, X, k, t, n
    Output: clustering result C
    Preprocess: S, X
    Initialize: U, V, L
    while t' < t do:              # alternately update parameters
        U(t' + 1) ← update(U(t'))
        V(t' + 1) ← update(V(t'))
        L(t' + 1) ← update(L(t'))
    end while
    while n' < n do:              # assign each vertex to a cluster
        c_n' ← argmax_l { u_{n',l} | l = 1, ..., k }
    end while
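A minimal sketch of this community detection step under the definitions above is shown next. The clustering itself is our own illustrative shortcut (k-means on features derived from powers of the random-walk transition matrix L_t = D^{-1}A), not the exact update rules for U, V and L in the algorithm above.

    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans

    def laplacian_transition_matrix(G):
        """L_t = D^{-1} A, the random-walk (Laplacian) transition matrix."""
        A = nx.to_numpy_array(G)
        deg = A.sum(axis=1)
        deg[deg == 0] = 1.0                      # avoid division by zero for isolated nodes
        return A / deg[:, None]

    def random_walk_communities(G, k=2, n_steps=3):
        """Cluster nodes using features built from n_steps of the transition matrix."""
        Lt = laplacian_transition_matrix(G)
        walk = np.linalg.matrix_power(Lt, n_steps)   # landing probabilities after n_steps
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(walk)
        return dict(zip(G.nodes(), labels))

    G = nx.karate_club_graph()                    # stand-in for the hashtag community graph
    print(random_walk_communities(G, k=2, n_steps=3))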
In text databases, the collected data is normally represented by a term-document matrix D, with m the number of documents and n the number of terms (ratio m/n); here D is the Laplacian matrix. The number of clusters is defined by the formula mn/t, where t is the number of non-zero entries in D. Note that in D every row and every column must contain at least one non-zero element. Here m refers to the nodes, n to the edges, and t is mostly the number of nodes with edges, so the computed value is usually close to 1 or less and the maximum value is taken; the other values are not used for evaluation because they do not give the expected outcome.

The community is evaluated with the NMI score. Normalized mutual information (NMI) is a metric whose results lie between zero and one, i.e. between no mutual information and perfect correlation. It is normalized by a generalized mean of the entropies H of the true labels and the predicted labels, computed here with the average method. When the labelings are identical the final score is 1.0; when the clusters are split completely across class members the score is low, as referred to in Fig. 9. The adjacency matrix targets community detection in a very large circle, so the constraints, nodes and edges are very large; when the Laplacian matrix is used, community detection is reduced to the most important nodes and edges, which ultimately gives a better result.

By picking "User" and "Date Time" we build a network graph for the nodes of the two groups. The LFR benchmark has its own parameters: number of nodes = 233 ~ 31,948 (250), average degree = 5, tau1 = 3, tau2 = 1.5 and mu = 0.1, and the nodes are plotted using a NetworkX graph as in Fig. 10.

The analysis is made with two modes of social media, WhatsApp and Twitter. The WhatsApp analysis has two parts, sentiment analysis and network analysis. The segmenting part is used to identify who is messaging positively and negatively. The dataset taken is shown with the user activities in Fig. 11a, and Fig. 11b shows the user activities of the Twitter dataset. First, after exporting the chat, we convert the .txt file into .csv format with the header fields Date, Time, User, Messages, then analyse the messages with their counts and plot them. Here the count shows how many negative messages are present in the data, "unique" shows how many different users are messaging negatively, and "top" shows the person who messages negatively most often.

Among the existing centrality measures taken for analysis, current-flow betweenness centrality uses an electrical flow model for information distribution, and communicability betweenness centrality uses the number of walks. The new centrality measures are:

1. Influence impact centrality. The impact score is the number of followers of the twitter handle n_f divided by the number of total tweets n_tt, i.e. impact score = n_f / n_tt. The centrality is then found by selecting the handles whose impact score is at the 95th percentile or above. Based on the Kendall's tau value and the influential-node constraints, the percentile is fixed at 95; this is more specific when ranking the influential nodes.
2. Followers centrality, found by selecting the handles whose follower count is at the 95th percentile or above.
3. Tweet centrality, found by selecting the handles whose total tweet count is at the 95th percentile or above.
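A minimal sketch of the new influence impact centrality under the definition above follows; the column names are our own illustrative assumptions for the twint export, not the paper's exact schema.

    import pandas as pd

    # Hypothetical columns from the twint export: one row per twitter handle.
    users = pd.DataFrame({
        "id":           ["u1", "u2", "u3", "u4"],
        "followers":    [120_000, 800, 45_000, 50],
        "total_tweets": [3_000, 4_000, 500, 10],
    })

    # Influence impact score: n_f / n_tt (followers divided by total tweets).
    users["impact_score"] = users["followers"] / users["total_tweets"]

    # A handle is marked central when its value is at the 95th percentile or above.
    def percentile_flag(series, q=0.95):
        return (series >= series.quantile(q)).astype(int)

    users["influence_impact_centrality"] = percentile_flag(users["impact_score"])
    users["followers_centrality"] = percentile_flag(users["followers"])
    users["tweet_centrality"] = percentile_flag(users["total_tweets"])
    print(users)

The NMI evaluation of the detected communities mentioned above can be obtained in the same environment with sklearn.metrics.normalized_mutual_info_score(true_labels, predicted_labels).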
The centralities are computed per user handle as in Fig. 12. Influential nodes shape social media; they affect normal users and world politics, and they affect medicine, health, science, agriculture and education. They decide the important topics discussed in public and thus indirectly affect ordinary people. The influence impact score determines the influential nodes and is based on the followers and following counts. For the network analysis of the WhatsApp data, user centrality and time series centrality are the features used to create the NetworkX graph. All the data is converted into integers to build the nodes, and the data is converted from .csv format to .txt format. Using the users and the timestamps we find the nodes of the network for the given Table 1, the WhatsApp group in Table 2 and the LFR datasets in Table 3: Group 1 from Twitter, Group 2 from WhatsApp, Group 3 the LFR benchmark graph, and Group 4 the sentiment analysis to find the popular tag for the COVID datasets.

So far we have extracted who was retweeted, who was mentioned and the hashtags into their own separate columns. Now let us look at these further. We want to know who is highly retweeted, who is highly mentioned and which popular hashtags are circulating. In the following section we perform an analysis on the hashtags only; we leave it to the reader to go back and repeat a similar analysis on the mentioned and retweeted columns. First we select the column of hashtags from the dataframe and take only the rows where there really is a hashtag. The usage counts help to increase the probability of the positive and negative tweets; other parameters like retweets and likes do not have much impact on the probability, so the usage count is taken into consideration. Figure 13a, b confirms that there is a strong correlation between the tags covid19 and coronavirus, and the centrality measures are compared with the new influence impact, followers and tweet centrality measures, as shown in Fig. 14. The probability of each and every tweet is taken into consideration for sentiment analysis.

The data is preprocessed and the prediction probability for each tweet about COVID is computed as follows. The SVM (support vector machine) [33, 34] is a machine learning algorithm that can be used for both classification and regression problems. In the SVM algorithm, every data item is plotted as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate. PCA (principal component analysis), given the feature space, finds a "best fitting" line that minimizes the distances to the nodes; the best-fitting line is compared with the first for the prediction of the social nodes, and the procedure is repeated to find the basis vectors (called principal components) such that the individual components are uncorrelated. Linear regression [35] is used to find a linear relationship between a target and one or more predictors; there are two types, simple and multiple. The confusion matrix in Fig. 15, obtained using linear regression, shows who was impacted and who influenced more by the tweets. Other methods like KNN and random forest have also been applied with the existing and the new centrality measures. The prediction is better with SVM and PCA than with the other methods.
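A hedged sketch of this classification stage is shown below, assuming a feature matrix X of centrality measures and a binary label y marking dominant (influential) nodes; the synthetic data, hyper-parameters and the logistic-regression head after PCA are illustrative choices, not the paper's exact configuration.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix

    rng = np.random.default_rng(0)
    X = rng.random((200, 6))                         # 6 centrality features per node
    y = (X[:, 0] / (X[:, 1] + 0.1) > 2).astype(int)  # stand-in "influential" label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    models = {
        "SVM":    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "PCA+LR": make_pipeline(StandardScaler(), PCA(n_components=3), LogisticRegression()),
    }

    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        print(name, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))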
This is because they identify the true positives and false positives correctly when compared with the other machine learning algorithms: in SVM every data point is taken into consideration, whereas in PCA the data covariance helps produce a correct confusion matrix. The standard deviations of the various centrality measures are analysed for the datasets and the values are displayed in Fig. 16a, b. After computing all the centrality values of the network, all centralities are merged into a single .csv file and the values are correlated. Correlation is a bivariate examination of the strength of association between two variables. In statistics, the value of the correlation coefficient varies between +1 and -1. If the coefficient is plus or minus 1, the variables are said to be perfectly correlated; if the coefficient tends to 0, there is essentially no relation between them, as can be seen in Fig. 17. While parametric statistical approaches are more powerful, as they use more information from the normal distribution, rank correlation is more appropriate for the centrality analysis, because the underlying distribution need not be normal. The estimation of assortativity provides a genuine example showing that rank correlation is better than the Pearson value for a specific network parameter. Assortativity measures mixing patterns, which refer to the tendency of nodes to connect to other similar or dissimilar nodes. We often begin by examining assortativity with respect to degree; that is, degree assortativity is used to answer whether high-degree vertices in a network associate particularly with other high-degree vertices or with low-degree ones. The assortativity coefficient was first introduced by Newman and is, in fact, a Pearson correlation coefficient. However, [32] argued that this measure suffers from a problem: for disassortative networks, the coefficient decreases significantly as the network size increases. They explained this problem mathematically and proposed another strategy using rank correlation measures; for example, Spearman's rho gives good results in this analysis. The clustering of the correlation values using a hierarchy based on closeness centrality is shown in Fig. 18a, and Fig. 18b shows a dendrogram of the values.

The training score and testing score are nearly the same for all the methodologies: SVM, PCA1, PCA2, logistic regression, KNN, GaussianNB and RF. The RMS value varies from method to method, but SVM and PCA1, 2 show higher accuracy (i.e. 98.8%) than the other techniques, as per Table 4, for the new centrality measures. The different machine learning techniques applied to the existing centrality measures provide an accuracy of nearly 87% for all of them, as in Table 5. The values in Tables 4 and 5 are taken for the COVID-19 datasets.

The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson's correlation requires that each dataset be normally distributed, and a correlation of exactly zero can rarely be concluded from it. Like other correlation coefficients, it varies between +1 and -1, with 0 implying no correlation; correlations of +1 or -1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y.
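The correlation and clustering step can be reproduced with a short sketch like the one below; the merged per-node centrality table all_centralities.csv is an illustrative file name, and Spearman rank correlation is used in line with the rank-correlation argument above.

    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram
    from scipy.spatial.distance import squareform

    centralities = pd.read_csv("all_centralities.csv", index_col=0)

    # Rank correlation between every pair of centrality measures.
    corr = centralities.corr(method="spearman")

    # Turn correlation into a distance and cluster the measures hierarchically.
    dist = 1.0 - corr.abs()
    Z = linkage(squareform(dist.values, checks=False), method="average")
    dendrogram(Z, labels=corr.columns.tolist())
    plt.tight_layout()
    plt.show()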
Negative correlations imply that as x increases, y decreases. The p-values are not entirely reliable but are probably reasonable for datasets larger than about 500. Kendall's tau measures the correspondence between two rankings of the computed measures: values close to +1 indicate strong agreement and values close to -1 indicate strong disagreement; tau-B handles ties and tau-A does not. The point biserial correlation measures the association between a binary variable x and a continuous variable y. Like other correlation coefficients, it varies between -1 and +1, with 0 implying no correlation; correlations of -1 or +1 imply a determinative relationship. All three coefficients are compared for the new centralities. Table 6 shows that the influence impact plays the major role in separating influential and non-influential nodes when compared with the followers centrality and the tweet centrality. Across the new centralities, the influence centrality has a value of 0.86, the followers centrality 0.86, and the tweet centrality the same but with a fractional difference, as given in Table 6. The Pearson correlation coefficient of the two datasets and the point biserial correlation, which relates a binary variable x and a continuous variable y, turn out to be the same, because both are based on the same variables. In the case of Kendall's tau, however, the coefficient is based on the ranking values, which predict the influential nodes and the influence score centrality measure. Based on the Kendall's tau value, the percentile is fixed at 95. For the existing centrality features the values also remain the same for Pearson and point biserial, but the Kendall's tau value varies for each measure. Since the influence impact is what matters for the influential nodes, the other centrality measures are not taken into consideration.

The power-law distribution is computed with the in-degree and out-degree from the degree centrality measures and with all centralities, as per the diagrams in Fig. 19a, b, c. When the new centrality measures are added, the values do not deviate much from the power-law line. While preprocessing the data, the nodes and edges are taken into consideration for measuring the centrality measures. When the edges are considered, the degree plays a vital role: the in-degree corresponds to the followers centrality measure and the out-degree is related to the following centrality measure. So, together with the existing in-degree and out-degree centrality measures, the new measures also help in finding the power distribution. The graph, with its diagonal line, clearly shows that the node distribution is concentrated in the influential category. Community detection is done for the popular community among the COVID-19-related hashtags; the COVID19 hashtag is taken, and the novel features help us to find the more influential nodes among the popular tags. The community detection algorithm used is based on the Laplacian transition matrix together with sentiment analysis of the popular tags.
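The comparison of the three coefficients between a centrality measure and the binary influential label can be written as a short sketch; the synthetic scores and the 95th-percentile labelling below are illustrative stand-ins for the paper's data.

    import numpy as np
    from scipy.stats import pearsonr, kendalltau, pointbiserialr

    rng = np.random.default_rng(1)
    impact_score = rng.random(100)                      # influence impact scores
    influential = (impact_score >= np.quantile(impact_score, 0.95)).astype(int)

    # Pearson and point-biserial coincide when one variable is binary;
    # Kendall's tau works on the rankings instead.
    print("pearson:       ", pearsonr(influential, impact_score)[0])
    print("point biserial:", pointbiserialr(influential, impact_score)[0])
    print("kendall tau:   ", kendalltau(influential, impact_score)[0])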
The paper concludes with a method to detect a community and the influential nodes or users in that community using intelligent centrality measures (user centrality and time series centrality for the WhatsApp and general Twitter datasets) and, for the new COVID-19 dataset, further new centrality measures (influence impact, followers and tweet centrality) evaluated with existing machine learning algorithms such as SVM, PCA, GaussianNB, KNN, linear regression and RF; it compares the existing features with the newly derived features. The methodologies SVM and PCA provide an accuracy of 98.6%, higher than the other techniques, when the new centrality measures are used, and the other scores such as RMS and F1 score also show promising values for the methods. When the same centralities are applied to other machine learning algorithms they may also provide good results, along with the benchmark datasets. As a result, finding the influential nodes will help us find the spammy and genuine accounts easily in the future. After the influential nodes are found in the community, sentiment analysis can be done based on hate speech, polarity and ranking coefficients to separate the spammy and non-spammy nodes.

Funding: Not applicable.

Conflict of interest: The authors have no conflict of interest among themselves in submitting and publishing this article in the Wireless Personal Communications journal.

Dr. G. Usha is currently working as an Associate Professor in the software engineering department at SRMIST. She has 11 years of teaching experience. While working at Anna University Chennai she worked on research projects in the Smart and Secure Techniques Research Lab. Her research interests include network security, machine learning and bioinformatics. Dr. G. Usha has published nearly 40 research articles in peer-reviewed journals and international conferences. She is a GATE scorer and was awarded college first rank in UG. She is an editorial board member of the journal Progress of Electrical and Electronic Engineering. She was awarded Outstanding Reviewer (top 10 percentile of reviewers) by Elsevier Pattern Recognition Letters in 2017. She is a reviewer for Elsevier Computers and Electrical Engineering, Elsevier Pattern Recognition Letters, Springer Multimedia Tools and Applications, and IEEE Access. She has coordinated an IET-sponsored workshop on cyber security, a national workshop on the Internet of Things, a national workshop on VANET and its security, and an IET-sponsored national conference on big data, cloud and security. She is an active member of IET, ISTE and the Indian Science Congress.

References

1. Centrality in networks: I. Conceptual clarification
2. A new index derived from sociometric data analysis
3. An input-output approach to clique identification
4. Factoring and weighting approaches to status scores and clique identification
5. Complexity in chemistry: Introduction and fundamentals
6. Ranking spreaders by decomposing complex networks
7. Centrality and network flow
8. Spectral measures of bipartivity in complex networks
9. Scientific collaboration networks II: Shortest paths, weighted networks, and centrality
10. A new method of identifying influential nodes in complex networks based on TOPSIS
11. A modified evidential methodology of identifying influential nodes in weighted networks
12. The H-index of a network node and its relation to degree and coreness
13. Community structure in social and biological networks
14. Optimised with secure approach in detecting and isolation of malicious nodes in MANET. Wireless Personal Communications
15. Low rate DDoS mitigation using real-time multi threshold traffic monitoring system
16. Google's PageRank and beyond: The science of search engine rankings
17. Rethinking centrality: Methods and examples. Social Networks
18. Centrality and power in social networks: A game theoretic approach
19. PCA based dimensional data reduction and segmentation for DICOM images
20. Identifying influential nodes in large-scale directed networks: The role of clustering
21. Social influence modeling using information theory in mobile social networks
22. Searching for superspreaders of information in real-world social media
23. Identifying influential spreaders by weighted LeaderRank
24. Identifying effective multiple spreaders by coloring complex networks
25. Finding influential spreaders from human activity beyond network location
26. Toward link predictability of complex networks
27. Influence maximization in complex networks through optimal percolation
28. Entropy and the complexity of graphs. II. The information content of digraphs and infinite graphs
29. Entropy measures for networks: Toward an information theory of complex topologies
30. A new method to identify influential nodes based on relative entropy
31. Detecting streaming of Twitter spam using hybrid method. Wireless Personal Communications
32. Region centric minutiae propagation measure orient forgery detection with finger print analysis in health care systems
33. Research and application of AdaBoost algorithm based on SVM
34. An ensemble model based on adaptive noise reducer and over-fitting prevention LSTM for multivariate time series forecasting
35. Identifying influential nodes based on network representation learning in complex networks