key: cord-0037227-yogjoo2a
authors: Castillo-Garsow, Carlos W.; Castillo-Chavez, Carlos
title: A Tour of the Basic Reproductive Number and the Next Generation of Researchers
date: 2020-02-18
journal: An Introduction to Undergraduate Research in Computational and Mathematical Biology
DOI: 10.1007/978-3-030-33645-5_2
sha: bcac694596763c47ca26a6c1561a7e139bb307db
doc_id: 37227
cord_uid: yogjoo2a

The Mathematical and Theoretical Biology Institute (MTBI) is a national award winning Research Experience for Undergraduates (REU) that has been running every summer since 1996. Since 1997, students have developed and proposed their own research questions and derived their research projects from them as the keystone of the program. Because MTBI’s mentors have no control over what students are interested in, we need to introduce a suite of flexible techniques that can be applied to a broad variety of interests. In this paper, we walk through examples of some of the most popular techniques at MTBI: epidemiological or contagion modeling and reproductive number analysis. We include an overview of the next generation matrix method of finding the basic reproductive number, sensitivity analysis as a technique for investigating the effect of parameters on the reproductive number, and recommendations for interpreting the results. Lastly, we provide some advice to mentors who are looking to advise student-led research projects. All examples are taken from actual student projects that are generally available through the MTBI website.

The Mathematical and Theoretical Biology Institute (MTBI) at Arizona State University is a summer Research Experience for Undergraduates (REU) that has been running every summer since 1996. From 1996 through its 2018 summer program, MTBI has recruited and enrolled a total of 507 regular first-time undergraduate students and 78 advanced (returning) students. To date MTBI students have completed 211 technical reports, many of which have been converted into publications (including, but not limited to, projects used as examples in this paper [4, 30, 38, 41, 57] ).

The program has received external recognition in the form of multiple national awards. The Director of MTBI was awarded a Presidential Award for Excellence in Science, Mathematics and Engineering Mentoring (PAESMEM) in 1997 and the American Association for the Advancement of Science Mentor Award in 2007. Also in 2007, the AMS recognized MTBI as a Mathematics Program that Makes a Difference, and MTBI was awarded a Presidential Award for Excellence in Science, Mathematics and Engineering Mentoring (PAESMEM) in 2011.

A key feature of MTBI is that students start with their own research topics and associated questions [21, 23] . As a result, students often know more about the topic being investigated than the mentors. This means that the mathematical techniques taught at MTBI need to be extremely flexible and well suited to a large population of research problems. In this paper, we give an overview of some of the mathematical techniques that are most commonly used at MTBI, and how they can be put together to form a complete research project.

The MTBI summer program runs for 8 weeks. In the first 3 weeks, students attend lectures in theoretical ecology and various epidemiological-contact-contagion modeling techniques, as used in broadly understood dynamical systems: nonlinear systems of difference and differential equations, discrete and continuous time Markov chains, partial differential equations and agent based modeling. In the final 5 weeks of the program, students form self-selected groups of three to five undergraduates, and investigate a problem of their own choosing. They research the background of the problem, identify a question, construct a model to address the question, analyze the results, and write a technical report describing their project. While the faculty mentors' experience is primarily in mathematical biology, students' interests can be quite diverse [21] . Past projects have included such diverse topics as epidemiology [1, 4, 5, 12, 27, 38, 39, 47, 48] , eating disorders [40, 41] , party politics [13, 57] , prison education [2, 49] , immigration [25] , the menstrual cycle [37] , education [29, 30, 32, 42] , and ecology [28, 45, 55, 56] . Because of the broad variety of student selected topics, students frequently know a lot more about the modeling application than the mentors do. Students take the lead on the project and provide subject matter knowledge, while mentors provide general expertise in mathematical modeling techniques that can be applied to a broad variety of topics.

The most common modeling method in MTBI is systems of ordinary differential equations, with a focus on equilibrium analysis and the basic reproductive number (R 0 ). By far the most popular techniques for reproductive analysis at MTBI have been the next generation matrix (NGM) and sensitivity analysis. These are extremely powerful, accessible, and flexible techniques that can be applied to a broad variety of situations. Since the founding of the REU in 1996 to the present year of 2018, students have written 211 technical reports. Of these reports, 34 [51] .

In this article, we provide an introduction to the next generation method of finding the basic reproductive number (R 0 ), details of how the reproductive number might be checked for accuracy and meaningfully interpreted, and an introduction to sensitivity analysis of R 0 as a way of evaluating the impact of possible interventions. This introduction is followed by examples of some complexities that can arise in using these techniques and how to handle them. All examples come from actual student projects, the full texts of which can be found on the MTBI website [51] . Lastly, we will provide some notes for mentors with an interest in student-led projects.

All together, this paper provides a nearly complete template for a project in epidemiological modeling. We do not discuss how to formulate a model here, except for some notes to the mentors, so that part of the template is not complete. There are also many other techniques that can be included in an epidemiological/contagion research paper to add further insights into the model. A research project is never really done. However, this paper will provide a guide for turning the model that a student comes up with into a story that includes some results, conclusions, and some recommendations. So while this research effort might just be the first chapter of the story that you tell about your research interests, it nevertheless has a satisfying ending.

Epidemiologists study the spread and control of diseases, 1 and so the first question they are usually interested in is whether or not it is possible to eradicate/eliminate the disease or whether or not it is possible to prevent the disease from invading a new population. The basic reproductive number is related to responses to both questions. In most, but not all epidemic models, an equilibrium state exists where the disease does not exist. This is a state where the population is made up entirely of uninfected people, that is, the infected populations are zero. It is typically called the disease-free equilibrium.

Biologically, the basic reproductive number (R 0 ) is often interpreted as "the expected 2 number of new infections created by a single (typical) infectious individual in a population that is otherwise completely uninfected." The typical units structure for R 0 looks something like:

$$R_0 \sim \frac{\text{infections}}{\text{time} \cdot \text{infected}} \times (\text{duration of infection}) \tag{1}$$

We can interpret the left-hand factor, infections/(time·infected), as the expected number of people that one person will infect per unit time. When we multiply by (duration of infection), we get the expected number of people that one person will infect over the entire time that they have the disease. Now imagine that a disease is invading for the first time. There is one infected person. If R 0 is greater than 1, then, on average, infected people more than replace themselves (increasing the infected population by more than 1) before they recover (reducing the infected population by 1), and the disease grows. If this number is less than 1, then, on average, infected people do not replace themselves before they recover, and the disease dies out.

However, as we will see in our examples, both this structure and its biological interpretation can be subject to considerable variation. In order to really explore R 0 , we need to establish what it means independent of any particular model. This means we need a mathematical definition.

Practically, R 0 is a number calculated from the parameters of the model that can be used to answer fundamental questions about disease invasion, persistence, and control. When R 0 < 1, values close to the disease-free equilibrium tend toward the disease-free equilibrium, making it impossible for a new disease to invade. When R 0 > 1, values close to the disease-free equilibrium tend away from the disease-free equilibrium, making it impossible for a disease to completely die out as long as there is a sufficient supply of susceptibles. Examining the conditions under which R 0 changes from greater than one to less than one can suggest possible interventions to control a disease. This gives us enough information to define the basic reproductive number mathematically.

We formally define R 0 as a function of the parameters of the model such that R 0 < 1 implies that the disease-free equilibrium is locally asymptotically stable, 3 and R 0 > 1 implies that the disease-free equilibrium is unstable. 4

Historically, R 0 was found by directly investigating the stability of the disease-free equilibrium, and this is an approach that is still used today. But for models with a lot of compartments or a lot of non-linearities, a direct analysis of the disease-free equilibrium can be extremely algebraically intensive. In 1990, Diekmann et al. [33] developed a new approach to finding the basic reproductive number that uses a reduced form of the model. This reduces the computational complexity of the problem of stability and makes complex models more accessible. It is a method that has become very popular with our own students, but tutorials can be difficult to find, so we decided to share this method of finding R 0 here, along with some techniques for interpreting the results.

For an introduction to the techniques used in MTBI in general, we recommend the MTBI course book [9] . Alternative books with similar content include [35] or [54] . For more information on formulating a research question, we recommend [58] . For more details on how to derive a model, we recommend [54] . For more detailed treatment of the next generation method, see [33, 34] . For a deeper dive into sensitivity analysis, see [3] . For a full collection of MTBI tech reports, see [51] . For further reading into the design of MTBI's mentorship model, we recommend reading [14, 18, 23, 24] .

Katie Diaz, Cassie Fett, Grizelle Torres-Garcia, and Nicholas M. Crisosto were the authors of the project we will use in our first example [32] . While most students at MTBI take on ecology or biology projects, these students were particularly interested in problems related to the quality of education in the USA. They worried about students feeling discouraged by working in stressful school conditions with little teacher support. As budget cuts started to affect class sizes, they were concerned that problems with student attitudes and dropping out would increase. So they designed a model to investigate the problem of how student-teacher ratio in classes affects class sizes. In this model, Diaz, Fett, Torres-Garcia, and Crisosto modeled a discouraged attitude as something that both students and teachers could have. By social interaction, teachers and students could both pass on (metaphorically infect others with) this negative attitude.

This student project example [32] is from 2003, when the NGM and sensitivity analysis were first becoming popular with MTBI students and mentors. It received a poster award at the AMS/MAA Joint Meetings in 2004. The project uses a relatively simple model to study the development of positive or discouraged attitudes in the interactions between students and teachers, and tells the full story of using the NGM to find R 0 , interpreting the expression of R 0 , and using sensitivity analysis of R 0 to explore the effect of model parameters on the behavior of the system. Because the model is relatively simple, this project serves as an excellent introductory example. We will explore possible complications that you might encounter in your own project with later examples. Like all student project examples used in this paper, the full text of the students' technical report can be found on the MTBI website [51] , and we strongly encourage students to review the original reports and related materials.

The researchers constructed a model with four classes: Teachers with positive attitudes P 1 , discouraged teachers D 1 , students with positive attitudes P 2 , and discouraged students D 2 . They assumed that each group influenced every other group through a random mixing contagion process, resulting in the following system of four differential equations 5 :

where μ i represents the rate of flow of group i into and out of the school, q i is the proportion of new recruits into group i with positive attitudes, λ j is the rate at which a discouraged person is encouraged by an interaction with a positive person from group j , and β j is the rate at which a positive person is discouraged by an interaction with a discouraged person from group j . Because the total population of each group is constant (Ṗ i +Ḋ i = 0; i = 1, 2), the researchers chose to define the classes as proportions rather than populations, and set P i + D i = 1; i = 1, 2.
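One system consistent with these parameter descriptions, and with the exit rates and reproductive number discussed later in this section, can be sketched as follows. This is a reconstruction that assumes mass-action (random mixing) interaction terms; the exact form should be checked against the original technical report [32].

```latex
\begin{aligned}
\dot{P}_1 &= q_1\mu_1 - \beta_1 P_1 D_1 - \beta_2 P_1 D_2 + \lambda_1 P_1 D_1 + \lambda_2 P_2 D_1 - \mu_1 P_1,\\
\dot{D}_1 &= (1-q_1)\mu_1 + \beta_1 P_1 D_1 + \beta_2 P_1 D_2 - \lambda_1 P_1 D_1 - \lambda_2 P_2 D_1 - \mu_1 D_1,\\
\dot{P}_2 &= q_2\mu_2 - \beta_1 P_2 D_1 - \beta_2 P_2 D_2 + \lambda_1 P_1 D_2 + \lambda_2 P_2 D_2 - \mu_2 P_2,\\
\dot{D}_2 &= (1-q_2)\mu_2 + \beta_1 P_2 D_1 + \beta_2 P_2 D_2 - \lambda_1 P_1 D_2 - \lambda_2 P_2 D_2 - \mu_2 D_2.
\end{aligned}
```

Adding each pair of equations gives $\dot{P}_i + \dot{D}_i = \mu_i - \mu_i (P_i + D_i)$, which is zero when $P_i + D_i = 1$, so this form respects the constant-population condition stated above.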

The next generation matrix method for finding R 0 is a relatively recent and useful reformulation for determining the stability of a disease-free or contagion-free equilibrium that was introduced by Diekmann et al. [33] . First we will explain the general strategy that inspired this approach to finding R 0 , and then we will describe precisely how to carry out that strategy mathematically, using the student-teacher model as an example.

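Intuitively, the next generation method is based on the common structure of R 0 described above (Eq. (1)). To aid in the discussion, we have reproduced that structure here.

$$R_0 \sim \frac{\text{infections}}{\text{time} \cdot \text{infected}} \times (\text{duration of infection}) \tag{3}$$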

In most models, the duration of the infection (units of time) is represented as a recovery rate (units of 1/time). For example, in the simple differential equation $\dot{I} = -\mu I$, the expected time that an individual would stay in I is $1/\mu$. If we also observe that the left factor of (3) represents the rate at which individuals infect others, then we can rewrite Eq. (3) as

$$R_0 \sim \frac{\text{infection rate}}{\text{recovery rate}}$$
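The expected-time claim can be checked with a short calculation, assuming that exit from I is governed only by $\dot{I} = -\mu I$. The fraction of an initial cohort still in I at time $t$ is then $e^{-\mu t}$, so the time $T$ spent in I has density $\mu e^{-\mu t}$ (the symbol $T$ is introduced here for illustration):

```latex
\mathbb{E}[T] = \int_0^\infty t \,\mu e^{-\mu t}\, dt
  = \Big[-t e^{-\mu t}\Big]_0^\infty + \int_0^\infty e^{-\mu t}\, dt
  = \frac{1}{\mu}.
```

For example, a recovery rate of $\mu = 0.2$ per day corresponds to an average infectious period of 5 days.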

The next generation method extends this idea to matrices. In situations when there are multiple types of infection to keep track of, the NGM separates the infected system into two matrices of rates. Traditionally, these matrices of rates are called F and V :

$$R = F V^{-1}$$

However, the R in this equation is a matrix, not a number. So it cannot represent the basic reproductive number. Instead it represents a sort of measure of how each class changes multiplicatively to the next generation. If we imagine a vector of infected classes Y , then near the equilibrium

$$Y_{\text{next}} \approx R\,Y$$

In order to explore how the size of Y changes, we use the eigenvector equation. 6 If Y is an eigenvector then:

$$R\,Y = \lambda Y$$

We see that when |λ| < 1, then Y decreases. Since Y represents infected classes, this means the infection dies out. When |λ| > 1, then Y increases and the disease grows.

In order to be sure the disease dies out, we need all eigenvalues |λ| < 1. If any eigenvalue is greater than 1 in absolute value, then there is a path for the disease to grow. This means we only need to check the eigenvalue that is largest in absolute value (called the dominant eigenvalue). If this dominant eigenvalue is in (−1, 1), all eigenvalues are within (−1, 1) and the disease dies out. If the dominant eigenvalue is greater than 1 in absolute value, then the disease can grow. So the dominant eigenvalue has the properties we are looking for in a basic reproductive number.
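The role of the dominant eigenvalue can be illustrated numerically. The sketch below assumes that each generation is obtained by multiplying the vector of infected classes by a fixed 2 × 2 matrix, and it uses two made-up matrices (`R_decay` and `R_grow` are hypothetical and not taken from any MTBI model). It compares the spectral radius of each matrix with what happens to a small initial vector after repeated multiplication.

```python
import math


def mat_vec(m, y):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [m[0][0] * y[0] + m[0][1] * y[1],
            m[1][0] * y[0] + m[1][1] * y[1]]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a real 2x2 matrix, from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))
    # Complex conjugate pair: both eigenvalues have modulus sqrt(det).
    return math.sqrt(det)


def size_after(m, y0, generations):
    """Total size (sum of absolute values) of the vector after repeated multiplication by m."""
    y = list(y0)
    for _ in range(generations):
        y = mat_vec(m, y)
    return abs(y[0]) + abs(y[1])


# Hypothetical generation matrices, chosen only for illustration.
R_decay = [[0.3, 0.4], [0.3, 0.4]]  # dominant eigenvalue below 1
R_grow = [[0.8, 0.6], [0.8, 0.6]]   # dominant eigenvalue above 1

start = [0.01, 0.01]  # a small initial infected vector
for name, matrix in (("R_decay", R_decay), ("R_grow", R_grow)):
    print(name,
          "spectral radius:", spectral_radius_2x2(matrix),
          "size after 20 generations:", size_after(matrix, start, 20))
```

With these matrices, the infected vector shrinks when the spectral radius is below 1 and grows when it is above 1, which is the threshold behavior described above.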

Formally, the next generation method begins by separating the classes of the model into two vectors: a vector X of "uninfected" classes and a vector Y of "infected" classes. In this model, the infection process is bi-directional. From one point of view, positive people are "infecting" discouraged people with positivity. From another perspective, discouraged people are "infecting" positive people with discouragement. Given the choice, the researchers chose to think of discouragement as the infection. In Exercises 1 and 2 you will have the opportunity to show what happens if you make the opposite choice.

The researchers chose discouragement as the infection, so X = [P 1 , P 2 ] T , and Y = [D 1 , D 2 ] T . Now that we have defined the infection, we proceed to find the infection-free equilibrium. In this case, a discouragement-free equilibrium does not exist unless q 1 = q 2 = 1, so this becomes a condition of the model. In the scenario where all people arrive with positive attitudes, the discouragement-free equilibrium is: X * = [1, 1] T , and Y * = [0, 0] T . We will also define a point for the discouragement-free equilibrium that includes all four classes W * = (P * 1 , D * 1 , P * 2 , D * 2 ) = (1, 0, 1, 0). The main advantage of the NGM method is that it allows the researcher to ignore any uninfected classes and focus only on the infected classes. This reduces the complexity of calculations in the model. In this model, X is discarded and the model reduces to the equations for $\dot{Y} = [\dot{D}_1, \dot{D}_2]^T$.

The next step is to separate the Y equation for discouraged classes into two separate rate vectors: $\mathcal{F}$ represents the rates of all flows from X to Y , and $\mathcal{V}$ represents rates of all other flows. We also adjust signs so that

$$\dot{Y} = \mathcal{F} - \mathcal{V}$$

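In the student-teacher model, the separation is

$$\mathcal{F} = \begin{bmatrix} \beta_1 P_1 D_1 + \beta_2 P_1 D_2 \\ \beta_1 P_2 D_1 + \beta_2 P_2 D_2 \end{bmatrix}, \qquad \mathcal{V} = \begin{bmatrix} (\lambda_1 P_1 + \lambda_2 P_2 + \mu_1) D_1 - \mu_1 (1 - q_1) \\ (\lambda_1 P_1 + \lambda_2 P_2 + \mu_2) D_2 - \mu_2 (1 - q_2) \end{bmatrix}$$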

Note that although the terms μ 1 (1 − q 1 ) and μ 2 (1 − q 2 ) represent rates of new infections, they are not rates of flows from X to Y, but instead rates of flows from outside the system into Y. Because of this, these terms are included in $\mathcal{V}$ and not $\mathcal{F}$. However, these terms are also problematic because they prevent a discouragement-free equilibrium from existing. Because q 1 = q 2 = 1 is a necessary condition for the discouragement-free equilibrium to exist, we set q 1 = q 2 = 1 and $\mathcal{V}$ becomes

$$\mathcal{V} = \begin{bmatrix} (\lambda_1 P_1 + \lambda_2 P_2 + \mu_1) D_1 \\ (\lambda_1 P_1 + \lambda_2 P_2 + \mu_2) D_2 \end{bmatrix}$$

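Next we define F and V as the Jacobian matrices of $\mathcal{F}$ and $\mathcal{V}$ with respect to the infected classes Y, evaluated at the discouragement-free equilibrium W * = (P * 1 , D * 1 , P * 2 , D * 2 ) = (1, 0, 1, 0):

$$F = \left[ \frac{\partial \mathcal{F}_i}{\partial Y_j} \right]_{W^*}, \qquad V = \left[ \frac{\partial \mathcal{V}_i}{\partial Y_j} \right]_{W^*}$$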

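So for the student-teacher discouragement model

$$F = \begin{bmatrix} \beta_1 & \beta_2 \\ \beta_1 & \beta_2 \end{bmatrix}, \qquad V = \begin{bmatrix} \lambda_1 + \lambda_2 + \mu_1 & 0 \\ 0 & \lambda_1 + \lambda_2 + \mu_2 \end{bmatrix}$$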

This allows us to calculate our next generation matrix and the basic reproductive number.

The next generation matrix is $F V^{-1}$, where F is the Jacobian of the rates of flows from uninfected to infected classes evaluated at the disease-free equilibrium, and V is the Jacobian of the rates of all other flows to and from infected classes evaluated at the disease-free equilibrium (Eqs. (9) and (14)).

The basic reproductive number (R 0 ) is the spectral radius (the largest absolute value among the eigenvalues) of the next generation matrix: $R_0 = \rho(F V^{-1})$ [33, 34] .

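So in the student-teacher model, the next generation matrix is

$$F V^{-1} = \begin{bmatrix} \dfrac{\beta_1}{\lambda_1 + \lambda_2 + \mu_1} & \dfrac{\beta_2}{\lambda_1 + \lambda_2 + \mu_2} \\[2ex] \dfrac{\beta_1}{\lambda_1 + \lambda_2 + \mu_1} & \dfrac{\beta_2}{\lambda_1 + \lambda_2 + \mu_2} \end{bmatrix}$$

and the basic reproductive number is

$$R_0 = \frac{\beta_1}{\lambda_1 + \lambda_2 + \mu_1} + \frac{\beta_2}{\lambda_1 + \lambda_2 + \mu_2} \tag{18}$$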
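The NGM steps can be checked numerically with a short script. The sketch below assumes mass-action forms for the rate vectors (new discouragement from contact with discouraged teachers and students, and exits through encouragement and turnover with q 1 = q 2 = 1). The parameter values in `PARAMS` are hypothetical and chosen only for illustration, and the Jacobians are approximated by central differences rather than computed by hand.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
PARAMS = {"beta1": 0.3, "beta2": 0.5, "lam1": 0.2, "lam2": 0.3,
          "mu1": 0.1, "mu2": 0.25}


def new_infections(d1, d2, p1, p2, p):
    """Rates of flow from the positive classes X into the discouraged classes Y
    (assumed mass-action form)."""
    return [p["beta1"] * p1 * d1 + p["beta2"] * p1 * d2,
            p["beta1"] * p2 * d1 + p["beta2"] * p2 * d2]


def other_flows(d1, d2, p1, p2, p):
    """All remaining flows with q1 = q2 = 1, signed so that Y' = new_infections - other_flows."""
    return [(p["lam1"] * p1 + p["lam2"] * p2 + p["mu1"]) * d1,
            (p["lam1"] * p1 + p["lam2"] * p2 + p["mu2"]) * d2]


def jacobian_at_dfe(rates, p, h=1e-6):
    """Central-difference Jacobian with respect to (D1, D2) at P = (1, 1), D = (0, 0)."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        step = [0.0, 0.0]
        step[j] = h
        up = rates(step[0], step[1], 1.0, 1.0, p)
        down = rates(-step[0], -step[1], 1.0, 1.0, p)
        for i in range(2):
            jac[i][j] = (up[i] - down[i]) / (2.0 * h)
    return jac


def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a real 2x2 matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))
    return math.sqrt(det)


F = jacobian_at_dfe(new_infections, PARAMS)
V = jacobian_at_dfe(other_flows, PARAMS)
ngm = mat_mul(F, inverse_2x2(V))
r0_numeric = spectral_radius_2x2(ngm)

# Sum of (discouragement rate) / (exit rate) for teachers and students.
r0_closed_form = (
    PARAMS["beta1"] / (PARAMS["lam1"] + PARAMS["lam2"] + PARAMS["mu1"])
    + PARAMS["beta2"] / (PARAMS["lam1"] + PARAMS["lam2"] + PARAMS["mu2"])
)
print("R0 from the next generation matrix:", r0_numeric)
print("R0 from the closed-form expression:", r0_closed_form)
```

Agreement between the two printed values is a useful sanity check on the hand calculation, though it does not by itself confirm that the model was separated into the two rate vectors correctly.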

Exercise 1 If you interpret a positive attitude as the infection instead of discouragement, how does the form of the resulting R 0 change?

Traditionally, the basic reproductive number (R 0 ) is interpreted as "the average 7 number of new infections created by a single (typical) infectious individual in a population that is otherwise completely uninfected." If R 0 < 1, then an infection does not replace itself, and the disease dies out. If R 0 > 1 then the infection more than replaces itself and the disease spreads. An important feature of this interpretation is that by describing R 0 as people per person or infections per infection, R 0 is a dimensionless quantity. We can use this feature as a way to check our result in Eq. (18) .

Since P 1 , P 2 , D 1 , and D 2 are proportions, they are also dimensionless. Examining Eq. (2), we see that in order for units to match on both sides of the equation, λ i , β i , and μ i must all have units of 1/time. Then in Eq. (18), we see that R 0 has units of (1/time)/(1/time) and so is dimensionless. This does not mean that we necessarily calculated R 0 correctly, but it is a good safety check.

The basic reproductive number is interpreted in a population that is mostly susceptible, meaning that it must be interpreted in situations very close to the disease-free equilibrium. When we compute the Jacobian around the disease-free equilibrium, we are approximating the original model with an easier to interpret linear model. In short the analysis is local and mathematically speaking we are talking about local asymptotic stability.
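The link between the threshold R 0 = 1 and local stability can be checked directly on the linearized system. The sketch below assumes that the linearization of the discouraged classes near the discouragement-free equilibrium has Jacobian J = F − V, with F built from β 1 and β 2 and V from the exit rates λ 1 + λ 2 + μ i . The two parameter sets `LOW` and `HIGH` are hypothetical.

```python
import math


def r0(p):
    """Sum of (discouragement rate) / (exit rate) for teachers and students."""
    a1 = p["lam1"] + p["lam2"] + p["mu1"]
    a2 = p["lam1"] + p["lam2"] + p["mu2"]
    return p["beta1"] / a1 + p["beta2"] / a2


def max_real_part(p):
    """Largest real part among the eigenvalues of J = F - V for the (D1, D2) system."""
    a = p["beta1"] - (p["lam1"] + p["lam2"] + p["mu1"])
    b = p["beta2"]
    c = p["beta1"]
    d = p["beta2"] - (p["lam1"] + p["lam2"] + p["mu2"])
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        return (tr + math.sqrt(disc)) / 2.0
    return tr / 2.0  # complex pair: the real part is half the trace


# Hypothetical parameter sets: identical except for the discouragement rates.
LOW = {"beta1": 0.1, "beta2": 0.2, "lam1": 0.2, "lam2": 0.3, "mu1": 0.1, "mu2": 0.25}
HIGH = {"beta1": 0.3, "beta2": 0.5, "lam1": 0.2, "lam2": 0.3, "mu1": 0.1, "mu2": 0.25}

for name, params in (("LOW", LOW), ("HIGH", HIGH)):
    print(name, "R0:", r0(params), "largest real part:", max_real_part(params))
```

In both cases the sign of the largest real part agrees with whether R 0 is below or above 1, as the mathematical definition requires. Because the check uses only the linearization, it says nothing about behavior far from the equilibrium.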

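In the case of the student-teacher discouragement model, this gives us

$$\begin{aligned} \dot{D}_1 &\approx \beta_1 D_1 + \beta_2 D_2 - (\lambda_1 + \lambda_2 + \mu_1) D_1, \\ \dot{D}_2 &\approx \beta_1 D_1 + \beta_2 D_2 - (\lambda_1 + \lambda_2 + \mu_2) D_2. \end{aligned}$$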

Linear models are relatively simple to interpret in the context of populations. In the contexts of proportions the interpretation is a little bit more complex. So for now let us imagine that the classes D 1 and D 2 represent populations, and we will correct for proportions afterwards.

In population models, $\lambda_1 + \lambda_2 + \mu_1$ can be interpreted as a general "death rate" for the $D_1$ (discouraged teacher) class. Since average lifespan is the inverse of the exit rate in a linear model, $1/(\lambda_1 + \lambda_2 + \mu_1)$ is the average length of a single episode of teacher discouragement. Similarly, $1/(\lambda_1 + \lambda_2 + \mu_2)$ is the average length of a single episode of student discouragement. Likewise, $\beta_1$ and $\beta_2$ can be interpreted as birth rates. But $\beta_1$ and $\beta_2$ are both used as inputs into both equations. So $1/\beta_1$ is the average length of time it takes for a discouraged teacher to create two new discouraged people: a new discouraged student and a new discouraged teacher, and $1/\beta_2$ is the average length of time for a discouraged student to create a new discouraged student and a new discouraged teacher. 8

Putting these interpretations together, we have that

$$\frac{1/(\lambda_1 + \lambda_2 + \mu_1)}{1/\beta_1} = \frac{\beta_1}{\lambda_1 + \lambda_2 + \mu_1}$$

is the average time that a teacher stays discouraged divided by the average time it takes a teacher to discourage a new student-teacher pair. This is the average number of student-teacher pairs that a single teacher discourages. Similarly, $\beta_2/(\lambda_1 + \lambda_2 + \mu_2)$ is the average number of student-teacher pairs that a single student discourages. Put together, we have that $R_0$ is the average number of new discouraged student-teacher pairs created by a discouraged teacher and a discouraged student.

There are two problems with this interpretation, and both are tied to the fact that $D_1$ and $D_2$ are proportions, not populations. Our interpretation of $R_0$ is based on the idea of a single discouraged teacher and a single discouraged student. But $D_1 = D_2 = 1$ is not the case of a single teacher and a single student. Instead, it is the case of the entire populations being discouraged. The first problem is that $D_1 = 1$ and $D_2 = 1$ represent different numbers of teachers and students, because the total populations are different. The second problem is that in order for us to interpret $R_0$, we need the population to be mostly uninfected, which means that $D_1$ and $D_2$ need to be small. So we need an interpretation that is meaningful for $D_1$ and $D_2$ as proportions, with $D_1 = D_2 \ll 1$. An overall better interpretation of this basic reproductive number is that it measures the combined percentage of discouraged teachers and percentage of discouraged students produced by a small percentage of discouraged teachers and an equivalent small percentage of discouraged students throughout the duration of time that those small percentages are discouraged. For example, if $R_0$ is 2, then we could interpret this as saying that when the percentage of discouraged people is small, the percentage of the next generation of discouraged people will be approximately twice as big.

This structure of calculating the number of new infections per infection as (infection rate)/(output rates) is a very common structure that you will see in a lot of basic reproductive numbers. So it is important to remember both in interpreting your R 0 and in using that interpretation to check your calculations.

Exercise 2 Continuing from Exercise 1: If you interpret a positive attitude as the infection instead of discouragement, how does the interpretation of the resulting R 0 change?

Since the basic reproductive number models whether an initial infection spreads or dies out, researchers typically want to know the effect of the parameters of the model on R 0 . The problem is that the model typically has a significant number of parameters. This makes it difficult to do a full exploration of R 0 . For example, we typically cannot plot R 0 as a function of every parameter in N-dimensional space.

There are a few solutions to the problem. If there are only one or two parameters that can be controlled by policy, and the other parameters have known values, then a typical approach is to create a graph of R 0 as a function of those controllable parameters, and look at the contour where R 0 = 1. Sensitivity analysis is a different approach that can be used to efficiently examine many parameters at once by focusing on local changes in the parameter values [3] . When parameter values are unknown, sensitivity analysis can sometimes provide some useful general results.

When parameter values can be estimated, sensitivity analysis can be used to suggest an intervention.

The key idea of sensitivity analysis is to look at how a small percentage change in one parameter affects the corresponding percentage change in a quantity of interest (in this case, R 0 ). So for example, a sensitivity index of 2 would mean that for a very small increase in the parameter, R 0 increases by twice that percentage. A sensitivity index of −0.5 would mean that for a very small increase in the parameter, R 0 decreases by half that percentage.

The reason why sensitivity analysis uses percentages rather than bare changes is so that different parameters with different units can be compared on an even playing field. If a parameter is measured in miles, a change of 0.0001 is much more dramatic a difference than if the parameter is measured in feet. But a change of 0.01% would be the same regardless of unit.

We calculate the percentage change in the parameter $\xi$ as $\Delta\xi/\xi$. Similarly, the percentage change in $R_0$ is $\Delta R_0/R_0$. The sensitivity index of $R_0$ with respect to the parameter $\xi$ is then just the quotient of these two percentage changes, as long as the percentage change is sufficiently small. The way we make sure that the change in percentage is small enough is by taking a limit, so that $\Delta\xi$ becomes $\partial\xi$ and $\Delta R_0$ becomes $\partial R_0$.

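The sensitivity index S ξ of R 0 with respect to parameter ξ is given by Eq. (21):

$$S_\xi = \frac{\partial R_0 / R_0}{\partial \xi / \xi} = \frac{\xi}{R_0} \frac{\partial R_0}{\partial \xi} \tag{21}$$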
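When the partial derivative is tedious to compute by hand, the index can be approximated numerically. The sketch below is one way to do this, using a central difference in place of the derivative. The helper `sensitivity_index` and the toy reproductive number `simple_r0` (infection rate `beta` divided by recovery rate `gamma`) are hypothetical and are not part of any MTBI project.

```python
def sensitivity_index(func, params, name, rel_step=1e-6):
    """Approximate S = (xi / R0) * dR0/dxi with a central difference in the parameter `name`."""
    base = func(**params)
    h = params[name] * rel_step
    up = dict(params)
    up[name] += h
    down = dict(params)
    down[name] -= h
    derivative = (func(**up) - func(**down)) / (2.0 * h)
    return params[name] / base * derivative


def simple_r0(beta, gamma):
    """Toy reproductive number: infection rate divided by recovery rate."""
    return beta / gamma


values = {"beta": 0.4, "gamma": 0.1}  # hypothetical parameter values
s_beta = sensitivity_index(simple_r0, values, "beta")
s_gamma = sensitivity_index(simple_r0, values, "gamma")
print("S_beta:", s_beta)
print("S_gamma:", s_gamma)
```

For this toy expression the exact indices are 1 and −1: a 1% increase in `beta` raises the reproductive number by about 1%, and a 1% increase in `gamma` lowers it by about 1%, regardless of the units of either parameter.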

In the student-teacher discouragement model, the researchers calculated the sensitivity of R 0 with respect to each parameter.
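As a worked derivation, assume the reproductive number has the form $R_0 = \beta_1/(\lambda_1+\lambda_2+\mu_1) + \beta_2/(\lambda_1+\lambda_2+\mu_2)$ and write $a_i = \lambda_1 + \lambda_2 + \mu_i$ (a shorthand introduced here). Direct differentiation then gives the indices below; they are derived from that expression and are not quoted from the original report.

```latex
\begin{aligned}
S_{\beta_i} &= \frac{\beta_i}{R_0}\,\frac{\partial R_0}{\partial \beta_i} = \frac{\beta_i}{a_i R_0}, & i &= 1, 2,\\
S_{\lambda_j} &= \frac{\lambda_j}{R_0}\,\frac{\partial R_0}{\partial \lambda_j}
  = -\frac{\lambda_j}{R_0}\left(\frac{\beta_1}{a_1^2} + \frac{\beta_2}{a_2^2}\right), & j &= 1, 2,\\
S_{\mu_i} &= \frac{\mu_i}{R_0}\,\frac{\partial R_0}{\partial \mu_i} = -\frac{\mu_i \beta_i}{a_i^2 R_0}, & i &= 1, 2.
\end{aligned}
```

Note that $S_{\beta_1} + S_{\beta_2} = 1$, because $R_0$ is linear in $\beta_1$ and $\beta_2$ with no constant term.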

Because all parameter values are positive and $R_0$ is positive, the sensitivity indices $S_{\beta_1}$ and $S_{\beta_2}$ are positive, while the indices $S_{\lambda_1}$, $S_{\lambda_2}$, $S_{\mu_1}$, and $S_{\mu_2}$ are negative. This gives us an expected result. Increasing the rates at which discouraged teachers and students successfully convert other students and teachers ($\beta_i$) increases $R_0$, the growth in discouragement. Increasing the rate at which positive teachers and students convert other students and teachers ($\lambda_i$) reduces $R_0$. What might be less expected is that increasing the rate of student and teacher turnover ($\mu_i$) also decreases $R_0$. This is because of the assumption that $q_1 = q_2 = 1$ that was needed to find an $R_0$. Because all new people entering the school have positive attitudes, turnover removes discouraged people and replaces them with positive people.

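Next we can look at which sensitivity indices have the largest impact on the system. Because $\mu_i < \lambda_1 + \lambda_2 + \mu_i$, we have that $|S_{\beta_i}| > |S_{\mu_i}|$ for $i = 1, 2$.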

So changing the rate that discouraged teachers or students recruit always has a stronger impact than the same percentage change in the corresponding turnover rate. Because λ 1 + λ 2 < λ 1 +λ 2 +μ i we have that |S β 1 +S β 2 | > |S λ 1 +S λ 2 |, which suggests that targeting the spread of negative attitudes overall is more effective than targeting positive attitudes overall.
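These comparisons can be spot-checked numerically. The sketch below assumes the reproductive number is the sum of β i /(λ 1 + λ 2 + μ i ) over teachers and students, evaluates the sensitivity indices obtained by differentiating that expression, and uses hypothetical parameter values (the estimates used in the original report are not reproduced here).

```python
def r0(p):
    """Assumed reproductive number: sum of (discouragement rate) / (exit rate)."""
    a1 = p["lam1"] + p["lam2"] + p["mu1"]
    a2 = p["lam1"] + p["lam2"] + p["mu2"]
    return p["beta1"] / a1 + p["beta2"] / a2


def sensitivity_indices(p):
    """Indices S = (xi / R0) * dR0/dxi, from differentiating the assumed R0 by hand."""
    a1 = p["lam1"] + p["lam2"] + p["mu1"]
    a2 = p["lam1"] + p["lam2"] + p["mu2"]
    r = r0(p)
    # Both lambda parameters appear in both denominators, so they share this factor.
    shared = p["beta1"] / a1 ** 2 + p["beta2"] / a2 ** 2
    return {
        "beta1": p["beta1"] / (a1 * r),
        "beta2": p["beta2"] / (a2 * r),
        "lam1": -p["lam1"] * shared / r,
        "lam2": -p["lam2"] * shared / r,
        "mu1": -p["mu1"] * p["beta1"] / (a1 ** 2 * r),
        "mu2": -p["mu2"] * p["beta2"] / (a2 ** 2 * r),
    }


# Hypothetical parameter values, chosen only for illustration.
params = {"beta1": 0.3, "beta2": 0.5, "lam1": 0.2, "lam2": 0.3,
          "mu1": 0.1, "mu2": 0.25}
S = sensitivity_indices(params)

# Print the indices from largest to smallest absolute value.
for name, value in sorted(S.items(), key=lambda item: -abs(item[1])):
    print(name, round(value, 4))
```

For any positive parameter values the two β indices sum to 1 and the two λ indices sum to a number between −1 and 0, which is the comparison made above. The ordering of the individual indices, however, depends on the particular values chosen.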

Exercise 3 Continuing from Exercises 1 and 2: Use sensitivity analysis to investigate the effect of the model parameters on the spread of positive attitudes and make a recommendation.

The researchers also used parameter estimates to calculate the sensitivity indices numerically, although the exact values they used are not included in the original paper. Using these parameter estimates they found that in order from largest to smallest absolute value S β 2 = 0.5959,

002159. This means that in a school that would be modeled by these parameter estimates, the effects that students have on the attitudes of teachers and students (β 2 and λ 2 ) have the strongest impact on the spread of discouragement, but the impact of teachers (β 1 and λ 1 ) is not that much less. Turnover (μ 1 and μ 2 ) has little impact in this scenario. Which interventions the researchers recommend, however, would depend on other factors, such as how difficult or costly it could be to alter a particular parameter.

The full process of analysis in this example was: (1) Use the next generation matrix to find the basic reproductive number, (2) interpret the basic reproductive number as a way of verifying the calculation, and (3) use sensitivity analysis to investigate the effect of parameters on the model and make recommendations for intervention. The student-teacher model [32] is one full example of this process with a small model. Because details necessarily had to be omitted, we highly recommend that you read the original 2003 paper at https://mtbi.asu.edu/tech-report.

Other MTBI student papers that give detailed examples of the full process are:

• A 2012 study on HIV and malaria co-infection [4]
• A 2012 study on prison reform [2]
• A 2007 study on HIV and Tuberculosis [27]
• A 2006 study on the economics of sex work [31]

As you are reading some of the cited papers and technical reports, you will notice that they do not always use notation in the same way. While the calculations for F and V are always the same, notation for intermediate steps can vary. Some researchers define X as the infected class, and Y as the uninfected class, so that $\dot{X} = \mathcal{F} - \mathcal{V}$. Some researchers do not define their uninfected X and infected Y classes explicitly, and you have to identify how they separated their classes from the $\mathcal{F}$ and $\mathcal{V}$ expressions. Other researchers will define $\mathcal{F}$ and $\mathcal{V}$ for all the classes in the model (in the student-teacher model $\mathcal{F}$ and $\mathcal{V}$ would have four entries instead of two) but still only calculate the Jacobian for the infected classes, so the resulting F and V are still the same (2 × 2) matrices as above. Regardless of intermediate steps and choice of notation, F and V are used consistently as in Eq. (14) and Definition 2.

No paper ever includes all the details of the mathematics. Choose one of the papers above and reconstruct the missing steps in the NGM method. Define X, Y , F , and V . Then take the Jacobian to find F and V , calculate the next generation matrix F V −1 , and find the spectral radius R 0 . Verify your results with the results from the paper.

No paper ever includes all the details of the mathematics. Choose one of the papers above and find their reproductive number, then calculate the sensitivity indices for each parameter included in the reproductive number. If the paper includes detailed parameter estimates, then use these estimates to also calculate the sensitivity indices numerically. Use the parameter definitions given in the paper to interpret the sensitivity indices and suggest a possible intervention. Verify your results with the results from the paper.

Because we introduced these techniques with a simple model, we have really only spent time with a "best case" scenario. Since your own project will likely involve a more complicated model, this also means that it is likely that a situation could arise where you are not quite sure what the best decision is. In the next few sections, we will discuss some complications that can arise when using these techniques to study a model of your own creation. We will discuss each of these scenarios briefly using examples from actual student projects, with a focus on the decisions that needed to be made. As always, we encourage you to read the original papers [51] as a way of filling in some of the missing details. And you should always consult your research mentor about complications in your own model.

In this first section, we will discuss some complications that can arise when finding and using a next generation matrix. Most of these complications revolve around decisions related to how to define the uninfected (X) and infected (Y ) classes.

This complication arose in a research project from 2018 on the spread of herpes [1]. Herpes (HSV-2) is a disease that "hides" in the nervous system of the host, and only periodically emerges to cause symptoms on the skin. Current treatments can force the disease into the asymptomatic stage, but cannot eliminate the disease from the body entirely, so it always returns. When the disease emerges, initial symptoms resemble a lot of other diseases, while later symptoms are distinctive. Current protocols only administer treatment when the disease has progressed to the later stage, and the patient has already been infectious for some time. The question that Luis Almonte-Vega, Monica Colón-Vargas, Ligia Luna-Jarrín, Joel Martinez, and Jordy Rodriguez-Rincón wanted to study was the cost effectiveness of treating during the early stages, when the disease is less infectious but money would be wasted on treating false positives. Could spending extra on treating people who only might have herpes save money in the long run by reducing the prevalence of the disease?

The model the students constructed had seven population compartments: $S$ for susceptible, $I_1$ for mildly symptomatic individuals, $I_2$ for strongly symptomatic individuals, $L$ for asymptomatic individuals, $T_1$ for individuals with mild symptoms undergoing treatment, $T_2$ for strongly symptomatic individuals undergoing treatment, and $X$ for false positives undergoing treatment.

During the project, the question arose of how to separate the infected and uninfected classes for the next generation matrix. To illustrate, let us consider a simple SIR model with partial immunity. Susceptible people become infected, infected people recover, and recovered people have a chance of being infected again.

So the flows would be $S \to I \leftrightarrow R$. In this scenario, the only population that has the disease is $I$, so the separation of classes for the NGM would be $X = [S, R]^T$ uninfected and $Y = [I]$ infected.

But there could be another interpretation of this diagram where the disease is never fully eradicated. In the case of herpes, the asymptomatic stage resembles recovery, but it is really the same infection. So the structure of the model is $S \to I \leftrightarrow L$. In this scenario, both $I$ and $L$ have the disease, so the separation of classes for the NGM would be $X = [S]$ uninfected and $Y = [I, L]^T$ infected. The key distinction here is that the flow from $L \to I$ in the SIL model is considered a new stage in the same infection, while the flow from $R \to I$ in the SIR model is a completely new infection.

So in the HSV-2 model, the students separated the classes into $[S, X]^T$ uninfected, and $[I_1, I_2, L, T_1, T_2]^T$ infected. $L$, $T_1$, and $T_2$ were included because even though the people in these compartments had no symptoms and were not contagious, they still carried the disease in their bodies, so they were still infected.

We will discuss how to interpret the reproductive number from this model in Sect. 4.4.

This complication arose in a research project from 2001 on collaborative learning [29, 30] , and another research project from 2005 on the success of political third parties [13] . The 2005 paper on third parties is more detailed, so it is the one we will discuss here, and also the one we recommend that you read first.

Karl Calderon, Clara Orbe, Azra Panjwani, and Daniel Romero were inspired by the 2000 election, when election analysts attributed the defeat of Al Gore to splitting the base. Ralph Nader, a Green Party candidate, won 2% of the vote, which many believed could otherwise have gone to Gore and turned the election. Using the growth of the Green Party as a source of data, the researchers wanted to study the growth of grassroots political movements, and explore voter recruitment strategies. The third party model had a relatively simple structure, in which the researchers considered three levels of engagement: a population S of people who were susceptible to the messages of the political party, a population V of people who voted for the third party, and a population M of active members of the party. Both voters and members could convince susceptibles to become voters, although members were more effective than voters. Also, members could recruit voters to become members. Lastly, voters could change their mind. So the structure of the model was S ↔ V → M.

This model was interesting because it had two processes that could be considered an infection: the recruitment of voters from $S \to V$ by voters and members, or the recruitment of members from $V \to M$ by members. In the first case, where voting is the infection, the separation of classes would be $X = [S]$ uninfected, and $Y = [V, M]^T$ infected. In the second case, where membership is the infection, the separation would be $X = [S, V]^T$ uninfected, and $Y = [M]$ infected. Unlike the case of HSV-2 above, where the researchers needed to make a decision to identify the correct separation, here there is no single correct separation. The researchers needed both perspectives to understand the behavior of the model. So the researchers carried out the NGM process twice (once with each separation) and obtained two reproductive numbers.

We will discuss how to interpret these reproductive numbers in Sect. 4.3.

The collaborative learning paper [29, 30] has a very similar model to the third parties paper [13] , but their NGM section is extremely light on details. Use the model from the collaborative learning paper and the NGM method to find both reproductive numbers for this model. Verify your results with the results from the paper.

Another complication that can arise is with the definition of $R_0$ as the spectral radius of the NGM. The spectral radius is the eigenvalue with the largest absolute value, but sometimes it is not clear which eigenvalue of the NGM is largest. This is a common problem when studying the interaction between two diseases that can co-infect. Two projects that encountered this problem were a 2012 paper on HIV and malaria co-infection [4, 5], and a 2007 project on HIV and tuberculosis co-infection [27]. Both papers are very similar. We will use the malaria model for this example. Kamal Barley, Sharquetta Tatum, and David Murillo were inspired by the high prevalence of both HIV and malaria in the Republic of Malawi. Each disease increases the impact of the other. HIV weakens the immune system, which makes it easier for malaria to infect people, and malaria increases the viral load of HIV. In particular, the researchers were interested in studying how the interactions between these two diseases might increase mortality.

In 

In this matrix, subscript $H$ is for HIV, $M$ for malaria, and $HM$ for both. $\Lambda$ is the recruitment rate for humans, $N$ is the total population of humans, the $\beta$s are infection rates, the $\mu$s and $\alpha$ are mortality rates, $\gamma$ is the recovery rate from malaria, and $k$ is the reduction in this recovery rate due to HIV infection. This model has two eigenvalues, either of which could be the largest, depending on parameter values:

So the basic reproductive number is whichever of these two eigenvalues happens to be the largest.

This is a very common structure for R 0 in the case of co-infection. We will discuss how to interpret this result in Sects. 4.2 and 4.5.

The malaria paper [4] also has a reduced model with four compartments. Read the paper to identify the reduced model, and find R 0 for the reduced model by using the NGM method. Verify your results with the results from the paper.

We have already discussed one scenario that complicated the interpretation of $R_0$: the situation where we are adding proportions instead of populations. However, the overall structure of the basic reproductive number in the student-teacher model followed a relatively simple $\frac{\text{infection rate}}{\text{output rates}}$ structure that is common to many basic reproductive numbers. In this section, we will discuss other forms that $R_0$ can take that commonly appear in student projects, and how to interpret those structures. In order to keep these sections short, we will primarily be using flow diagrams to discuss the models. As always, we strongly encourage you to read the original papers for more details on the models and analysis. Furthermore, every paper cited in this section also used a next generation approach to finding the basic reproductive number, so these are all good sources if you want additional examples of that approach.

This example project is from 2004 on the spread of SARS [38, 39]. Julijana Gjorgjieva, Kelly Smith, and Jessica Snyder were inspired by the prominent news coverage of the 2002 SARS epidemic. In particular, they were interested in the control plan for SARS. Because the disease was new and there was no treatment, control focused on isolating people who showed symptoms, and tracing their contacts to identify other people who might have the disease. Since no treatment had been developed, the researchers saw an opportunity to compare the current tracing-and-isolation control plan to a possible future vaccination strategy.

The control plan for SARS involved tracing an infected individual's contacts in order to identify who else was at risk of contracting SARS, but this tracing process was not always successful. Treatment for SARS generally involved isolation to prevent further spread. So the model used classes for $S$ susceptible, $E_i$ traced latent, $E_n$ untraced latent, $I$ infectious, $W$ isolated, $R$ recovered, and $D$ dead. This resulted in a model where an individual could take many different paths. For example, an individual could never be identified by doctors and pass through $S \to E_n \to I \to D$ (Fig. 1), or an individual could be caught in the infectious stage and pass through isolation and treatment and survive, passing through $S \to E_n \to I \to W \to R$ (Fig. 1).

Fig. 1 Flow diagram for the SARS model with contact tracing: $S$ susceptible, $E_n$ untraced latent, $E_i$ traced latent, $I$ untraced infectious, $W$ diagnosed and isolated infectious, $D$ dead, and $R$ recovered [39]. $\beta$ is the infection rate, $k$ is the rate of developing symptoms, $\theta$ is the probability that the patient is diagnosed while latent, $\alpha$ is the rate that infected individuals become isolated, $\delta$ is the death rate due to infection, and the $\gamma$s are recovery rates

Using the next generation approach, the researchers found the following basic reproductive number:

This $R_0$ follows the standard $\frac{\text{infection rate}}{\text{output rates}}$ pattern, but because there are multiple pathways to get to $I$ or $W$, these rates need to be averaged together by the proportion of individuals who follow each path. Examining the flow diagram (Fig. 1), the only path to $I$ is $S \to E_n \to I$. Individuals are infected at a rate $\beta$. Of these individuals, a proportion $(1-\rho)$ enter $E_n$, and of these individuals in $E_n$, a proportion $(1-\theta)$ enters $I$. Looking at the outflows of $I$: average time in $I$ is $\frac{1}{\alpha+\delta+\gamma_1}$, so the $\frac{\text{infection rate}}{\text{output rates}}$ for $I$ would be $\frac{\beta}{\alpha+\delta+\gamma_1}$, weighted by the proportion of infected individuals who enter $I$, which is $(1-\rho)(1-\theta)$. This gives $\frac{\beta(1-\rho)(1-\theta)}{\alpha+\delta+\gamma_1}$, which is what we see in Eq. (26).

$W$ is less infectious than $I$, so the rate of those infected by $W$ is $\beta l < \beta$. All the terms with $l$ in the numerator then involve pathways to $W$. Let us examine the second term, $\frac{\beta l(1-\rho)(1-\theta)\alpha}{(\delta+\gamma_2)(\alpha+\delta+\gamma_1)}$. In this term, $\frac{\beta l}{\delta+\gamma_2}$ is $\frac{\text{infection rate}}{\text{output rates}}$ for $W$. $(1-\rho)$ is the proportion moving from $S \to E_n$. $(1-\theta)$ is the proportion moving from $E_n \to I$. Lastly, $\frac{\alpha}{\alpha+\delta+\gamma_1}$ is the rate from $I \to W$ divided by the total rate out of $I$, so it is the proportion of individuals moving from $I$ to $W$. So this term represents the infection contribution from the $S \to E_n \to I \to W$ path.

Similarly, the first term $\frac{\beta l(1-\rho)\theta}{\delta+\gamma_2}$ represents the infection contribution from the $S \to E_n \to W$ path, and the final term $\frac{\beta l\rho}{\delta+\gamma_2}$ represents the infection contribution from the $S \to E_i \to W$ path.
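The path-by-path reading of this reproductive number can be checked numerically. The sketch below adds up the four contributions described in the text (one for time spent in $I$ and three for the different routes into $W$) and makes no claim about the order in which they appear in Eq. (26). The function name `sars_r0` and every parameter value are hypothetical and chosen only for illustration.

```python
def sars_r0(beta, l, rho, theta, alpha, delta, gamma1, gamma2):
    """Sum the path contributions to R0 described in the text."""
    time_in_I = 1.0 / (alpha + delta + gamma1)   # average time spent in I
    time_in_W = 1.0 / (delta + gamma2)           # average time spent in W
    prob_I_to_W = alpha * time_in_I              # proportion leaving I for W

    contributions = {
        # infections caused while in I (reached only via S -> En -> I)
        "S->En->I": (1 - rho) * (1 - theta) * beta * time_in_I,
        # infections caused while in W, split by the route taken into W
        "S->En->I->W": (1 - rho) * (1 - theta) * prob_I_to_W * beta * l * time_in_W,
        "S->En->W": (1 - rho) * theta * beta * l * time_in_W,
        "S->Ei->W": rho * beta * l * time_in_W,
    }
    return contributions, sum(contributions.values())


# Hypothetical parameter values, not estimates from the SARS paper.
params = dict(beta=0.25, l=0.1, rho=0.3, theta=0.4,
              alpha=0.2, delta=0.05, gamma1=0.1, gamma2=0.15)
contributions, r0 = sars_r0(**params)

for path, value in contributions.items():
    print(f"{path:12s} {value:.4f}")
print(f"R0 = {r0:.4f}")
```

Printing the individual contributions makes it easy to see which pathway dominates for a given parameter set, which is often more informative for intervention design than the total alone.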

Imagine that each of the proportions in this model is instead a probability. Combine each of these four paths together to create a probability tree diagram. Interpret $R_0$ as the expected value of $\frac{\text{infection rate}}{\text{output rates}}$ for a single individual passing through the system at random.

This example project is from 2007 on the spread of HIV and tuberculosis [27]. Like the malaria project above, Diego Chowell-Puente, Brenda Jiménez-González, and Adrian Smith were interested in studying how two deadly diseases interacted in South Africa, where both are common. TB is often carried latently, and HIV [27] increases the chance of TB developing symptoms. So the researchers wanted to construct a model that would suggest strategies to control both epidemics. Tuberculosis has a latent stage, during which a person is infected but not infectious. HIV has an asymptomatic infectious stage, and a symptomatic stage, when the infected person has developed AIDS. In order to track the behavior of a population, the researchers needed to develop a model that captures every possible combination of stages, as seen in Fig. 2. Note that the infection terms need to be summed over a large number of compartments, because a large number of different compartments all carry the same disease. For example, $H$, $H_L$, $H_I$, and $A$ all carry HIV and can infect a susceptible with HIV, so the HIV infection rate must include all of these compartments.

Like most co-infection models, the basic reproductive number for this model included a maximum.

In order to interpret this reproductive number, let us examine each component using the tools from Sect. 4.1. Beginning with $R_0^{TB}$, we see that the first factor $\frac{\beta_1}{\gamma_2+\mu+\tau}$ is the infection rate for TB divided by the sum of the rates out of $I$. So this factor takes the form $\frac{\text{infection rate}}{\text{output rates}}$ for TB by itself. The second factor $\frac{k}{\gamma_1+k+\mu}$ takes the form of rate from $L$ to $I$ divided by total rate out of $L$. So this is the proportion of people in $L$ who enter $I$. Taken together, this is the basic reproductive number for the $S \to L \to I$ pathway by itself. This would be the path that TB takes if there were no HIV.

Similarly, R H I V 0 is the reproductive number for HIV by itself. β 2 ω+μ is infection rate output rates for the H class, β 2 3 μ+σ is the infection rate output rates for the A class, and ω ω+μ is the proportion of people moving from H → A.

The basic reproductive number for the whole system is the maximum of these two reproductive numbers because $R_0$ only tracks infected or uninfected. Once a person is infected with either of the two diseases, they are counted as infected. Being infected with both involves first being infected with one, and then the other. Because $R_0$ is only concerned with the disease-free state, that second co-infection is not relevant. Only the first infection affects the disease-free state, so in examining the disease-free state, we are only concerned with whichever infection is strongest.
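As a rough illustration of this maximum structure, the sketch below builds the TB reproductive number from the two factors described above and takes the larger of the two single-disease numbers. The HIV reproductive number is passed in as a plain number rather than computed from its formula. The function names and all numerical values are hypothetical.

```python
def r0_tb(beta1, k, mu, tau, gamma1, gamma2):
    """TB-only reproductive number: (infection rate / output rates) x (proportion L -> I)."""
    infection_over_output = beta1 / (gamma2 + mu + tau)
    proportion_L_to_I = k / (gamma1 + k + mu)
    return infection_over_output * proportion_L_to_I


def r0_coinfection(r0_first, r0_second):
    """R0 for the whole system: whichever single-disease number is larger."""
    return max(r0_first, r0_second)


# Hypothetical values for illustration only.
tb = r0_tb(beta1=1.2, k=0.05, mu=0.02, tau=0.1, gamma1=0.3, gamma2=0.4)
hiv = 1.4  # assumed value of the HIV-only reproductive number
r0 = r0_coinfection(tb, hiv)

print(f"R0_TB = {tb:.3f}, R0_HIV = {hiv:.3f}, R0 = {r0:.3f}")
```

With these made-up values TB alone would die out (its number is below 1) while HIV would spread, so the disease-free state is unstable and the system $R_0$ equals the HIV number.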

For another example of this type of co-infection R 0 , see the HIV-malaria model [5] discussed in Sects. 3.3 and 4.5.

In this section we will return to the 2005 project on third parties from Sect. 3.2. In this project [13] , there were three classes of population: S susceptible, V voters, and M members. Because either voting or membership could be considered the infection, there were also two reproductive numbers:

In this model (Fig. 3), when a susceptible encounters a voter, there are two possible outcomes. Either the voter can influence the susceptible to become a voter ($S \to V$), or the susceptible can convince the voter not to vote ($S \leftarrow V$). The former happens at rate $\beta$ and the latter occurs at rate $\phi$, so the net rate of people being influenced to vote is $\beta - \phi$, and this becomes the infection rate, while the reciprocal of the total exit rate from $V$ (a sum that includes $\mu$) is the time in $V$. So $R_1$ follows our standard pattern of $\frac{\text{infection rate}}{\text{output rates}}$. For $R_2$, $\gamma$ is the infection rate, and $\mu$ is the exit rate. Following our pattern from Sect. 4.1, the remaining factor $1 - \frac{1}{R_1}$ should be some sort of proportion. Here $\frac{1}{R_1}$ is the proportion of voters who are currently convincing a susceptible to vote. Then $1 - \frac{1}{R_1}$ is the proportion of voters who are not currently occupied convincing someone else to vote, which means they have free time to interact with a member and be susceptible to membership.

An interesting feature of this two-threshold model is that $R_1$ only measures the impact of voters on susceptibles, but members can also convince susceptibles to vote. This means it is possible for members to exist as a population when $R_1 < 1$, as long as enough members exist in the population. This is known as a backward bifurcation. Also note that when $R_1 < 1$, $R_2 < 0$. This negative value for $R_2$ is a little difficult to interpret, but it is occurring because members are recruiting susceptibles to become voters and then almost immediately recruiting those voters to become members, so voters have no time to recruit susceptibles as voters. This maintains the member population without maintaining the voter population.
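The sign change in $R_2$ is easy to see numerically. The sketch below uses the structure described in the text: $R_1$ is the net infection rate $\beta - \phi$ divided by the total exit rate from $V$, and $R_2 = \frac{\gamma}{\mu}\left(1 - \frac{1}{R_1}\right)$. The total exit rate from $V$ is passed in as a single hypothetical parameter, `exit_rate_v`, and all numerical values are made up for illustration.

```python
def third_party_thresholds(beta, phi, exit_rate_v, gamma, mu):
    """Return (R1, R2) for the two-threshold third-party model.

    exit_rate_v is a placeholder for the total rate at which voters leave V.
    """
    r1 = (beta - phi) / exit_rate_v          # voters recruiting susceptibles
    r2 = (gamma / mu) * (1.0 - 1.0 / r1)     # members recruiting "free" voters
    return r1, r2


# Case A: voters cannot sustain themselves (R1 < 1), so R2 is negative.
r1_a, r2_a = third_party_thresholds(beta=0.5, phi=0.1, exit_rate_v=0.5,
                                    gamma=0.3, mu=0.05)

# Case B: voters sustain themselves (R1 > 1), so R2 is positive.
r1_b, r2_b = third_party_thresholds(beta=0.5, phi=0.1, exit_rate_v=0.2,
                                    gamma=0.3, mu=0.05)

print(f"Case A: R1 = {r1_a:.2f}, R2 = {r2_a:.2f}")
print(f"Case B: R1 = {r1_b:.2f}, R2 = {r2_b:.2f}")
```

Note that in Case B the factor $1 - \frac{1}{R_1}$ lies between 0 and 1, which is what allows it to be read as a proportion; in Case A it falls outside that range, which is why the negative $R_2$ resists a simple interpretation.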

In this section, we continue the discussion of the 2018 herpes model [1]. A key feature of herpes discussed in Sect. 3.1 is that a single infection goes through multiple cycles of infectiousness. This leads to a potentially infinite number of paths to $I$: $S \to I$, $S \to I \to L \to I$, $S \to I \to L \to I \to L \to I$, etc. Each of these paths is still a single infection, so each path needs to be considered in the basic reproductive number. In order to study this phenomenon, the researchers started with a simplified model before moving on to their full model. The simplified model only has compartments for $S$ susceptible, $I$ infected, and $L$ latent (Fig. 4). The basic reproductive number for this simplified model was:

$$R_0 = \frac{\beta}{\mu+\gamma}\cdot\frac{1}{1-\frac{\gamma}{\mu+\gamma}\cdot\frac{r}{\mu+r}} \qquad (29)$$

Fig. 4 Flow diagram for the simplified herpes model: $S$ susceptible, $I$ infectious, and $L$ latent [1]

In this basic reproductive number, $\beta$ is the infection rate and $\frac{1}{\mu+\gamma}$ is the time in $I$, so $\frac{\beta}{\mu+\gamma}$ forms our standard basic reproductive number. Our previous examples have suggested that the remaining factor should be a proportion. $\frac{\gamma}{\mu+\gamma}$ is the proportion of people who leave $I$ to enter $L$, and $\frac{r}{\mu+r}$ is the proportion of people who leave $L$ to return to $I$. So the product $p = \frac{\gamma}{\mu+\gamma}\cdot\frac{r}{\mu+r}$ is the proportion of people in $I$ who make a single round trip back to $I$ ($I \to L \to I$).

Because people can make this round trip any number of times, we need to account for all possible pathways. Infection for people who make zero round trips would be $\frac{\beta}{\mu+\gamma}$, from one round trip would be $\frac{\beta}{\mu+\gamma}p$, two round trips: $\frac{\beta}{\mu+\gamma}p^2$, three round trips: $\frac{\beta}{\mu+\gamma}p^3$, etc.

Because the sum of a geometric series with common ratio $p$ (where $0 \le p < 1$) is $\frac{1}{1-p}$, we have the form of $R_0$ given in Eq. (29).
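The round-trip argument can be cross-checked against a next generation matrix. The sketch below assumes a linearized infected subsystem of the form $dI/dt = \beta I - (\mu+\gamma)I + rL$ and $dL/dt = \gamma I - (\mu+r)L$ at the disease-free state, with only $I$ infectious; these equations are an assumption inferred from the flow description, not copied from the paper. The helper names and parameter values are hypothetical.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest absolute eigenvalue of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


def herpes_ngm(beta, mu, gamma, r):
    """Next generation matrix F V^{-1} with infected classes ordered as [I, L]."""
    f = [[beta, 0.0], [0.0, 0.0]]              # new infections enter I only
    a, b = mu + gamma, -r                      # V = [[mu+gamma, -r],
    c, d = -gamma, mu + r                      #      [-gamma,   mu+r]]
    det = a * d - b * c
    v_inv = [[d / det, -b / det], [-c / det, a / det]]
    return [[sum(f[i][k] * v_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical parameter values.
beta, mu, gamma, r = 0.3, 0.02, 0.1, 0.05

r0_ngm = spectral_radius_2x2(herpes_ngm(beta, mu, gamma, r))

p = (gamma / (mu + gamma)) * (r / (mu + r))    # proportion making one round trip
r0_closed = beta / (mu + gamma) / (1 - p)      # geometric-series closed form
r0_partial = sum(beta / (mu + gamma) * p ** n for n in range(200))

print(f"NGM: {r0_ngm:.6f}  closed form: {r0_closed:.6f}  200-term sum: {r0_partial:.6f}")
```

Under these assumptions all three values agree, which is a useful sanity check: the NGM calculation and the path-counting interpretation are two views of the same quantity.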

Other student projects with examples of this kind of looping R 0 include a 2018 study of prison recidivism [49] and a 2010 study of immigration [25] . Choose one of these two projects, describe a model, use unit analysis to identify the units of each parameter, verify that the reproductive number is unitless, and interpret the reproductive number using the techniques you have learned so far.

This example project is from 2005 on the spread of HIV between two sexes: male truck drivers and female sex workers [47]. Titus Kassem wanted to focus on the interaction between two populations that were both at high risk of HIV in Nigeria. In particular, truck drivers travel frequently and for great distances and interact with sex workers in many different places. High numbers of sexual partners and travel lead to a disproportionately large influence on the geographic spread of HIV. So Kassem wanted to study these two at-risk groups, and how their different ideas about condom usage affected the spread of HIV.

The model used six classes: $S_m$ susceptible males, $I_m$ HIV infected males, $A_m$ males with AIDS, and similarly for females. The paper did not provide a flow diagram, so instead, we duplicate the equations here.

Exercise 10 Construct a flow diagram for this HIV model. Label each arrow with the corresponding term from Eq. (31).

The researchers found that the basic reproductive number was:

The paper has omitted the details of these steps. Use an NGM to verify the basic reproductive number found in Eq. (32).

The basic reproductive number has an interesting form: the geometric mean of two values. The geometric mean arises because the disease has to make a round trip. In order for a male to infect another male, the disease must first infect a female, and vice-versa. So the number of new infected males produced by a single infected male is the number of infected females produced by that infected male, times the number of infected males produced by each new infected female. For example, if a male infects 8 females, and each female infects 2 males, then the number of newly infected males produced by that original male would be $8 \cdot 2 = 16$. But this is a two-step process. In order to scale the process back to a single infection step, we need to average the two values. Because the process is multiplicative, we average with a geometric mean instead of an arithmetic mean. Note that $\sqrt{8 \cdot 2} = 4$ and $4 \cdot 4 = 16$, while an arithmetic mean does not quite work: $\frac{8+2}{2} = 5$ and $5 \cdot 5 = 25$.

Fig. 5 Flow diagram for the math anxiety model: $P$ for primary student, $S_n$ for non-anxious secondary student, $S_a$ for anxious secondary student, $T_n$ for non-anxious teacher, and $T_a$ for anxious teacher [42]

Exercise 12 Return to the HIV-malaria model [5]
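The geometric mean in the two-sex example (8 females per male, 2 males per female) also falls out of a next generation matrix directly. The sketch below uses a hypothetical 2 × 2 matrix whose off-diagonal entries are those two counts; it is an illustration of the round-trip idea, not the matrix from the HIV paper.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest absolute eigenvalue of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


females_per_male = 8.0   # new infected females produced by one infected male
males_per_female = 2.0   # new infected males produced by one infected female

# Infected classes ordered as [males, females]. Entry [i][j] is the number of
# new infections in class i caused by one individual in class j. The diagonal
# is zero because, in this illustration, each sex only infects the other.
ngm = [[0.0, males_per_female],
       [females_per_male, 0.0]]

r0 = spectral_radius_2x2(ngm)
male_to_male = females_per_male * males_per_female   # full round trip

print(f"R0 per single step = {r0}")
print(f"R0 squared = {r0 ** 2}, round-trip count = {male_to_male}")
```

Because the matrix has zeros on the diagonal, its eigenvalues are plus and minus the square root of the product of the off-diagonal entries, so the spectral radius is exactly the geometric mean of the two counts.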

This example comes from a 2017 study on the spread of math anxiety in a school [42]. Arie Gurin, Guillaume Jeanneret, Meaghan Pearson, and Melissa Pulley were inspired by their own past struggles with mathematics, and concern for the impact that math anxiety has on the recruitment and retention of women and minorities into STEM. Because many women with math anxiety become elementary school teachers who teach mathematics, the researchers wanted to study how generations of teachers might influence students' attitudes toward mathematics as those students become teachers themselves. The researchers were looking to identify a best point of intervention to break the cycle of teachers with math anxiety creating new generations of teachers with math anxiety. The model had five classes: $P$ for primary student, $S_n$ for non-anxious secondary student, $S_a$ for anxious secondary student, $T_n$ for non-anxious teacher, and $T_a$ for anxious teacher (Fig. 5).

In this model, the basic reproductive number was

where $R_S$ represents the typical number of new anxious students recruited by a single anxious student ($S_n \to S_a$), $R_T$ represents the typical number of new anxious teachers recruited by a single anxious teacher ($T_n \to T_a$), $R_{CTS}$ represents the typical number of new anxious students mentored into becoming anxious teachers by a single anxious teacher ($S_n \to T_a$), and $R_{CST}$ represents the typical number of new anxious students recruited by a single anxious teacher ($P, S_n \to S_a$).

Interpreting the reproductive number requires two useful mathematical facts:

Applying these to the form of R 0 we have

$R_S$ and $R_T$ are within-generation effects, and the form of $R_0$ tells us that either of these is strong enough by itself to support an epidemic of math anxiety. If enough students recruit their peers, or enough teachers recruit their peers, then anxiety is sustained.

The geometric mean term represents a generational effect that takes place in two paths: in the first path, an anxious teacher mentors a non-anxious student into becoming an anxious teacher (S n → T a ). In the second path, an anxious teacher mentors a non-anxious student into becoming an anxious student (P , S n → S a ). These are the two ways that an anxious teacher can pass on their anxiety to the next generation.

However, this is not a true geometric mean. Unlike the HIV model, the two paths do not share endpoints, so this is not an average of two events happening sequentially. Instead, what is happening is that neither of these paths makes a full cycle. One path is student to teacher, and the other path is teacher to student. So each represents only a half-step of infection and not a full teacher to teacher or student to student cycle, so each needs to be square-rooted.

Lastly, note that if there are no peer-pressure effects ($R_S = R_T = 0$), then the generational effects by themselves may not be enough to sustain the epidemic.

However, generational effects can create an epidemic where peer-pressure by itself might not be enough ($R_S = R_T < 1$). This is not a straight sum because the two methods of spread (peer-pressure and generational) are competing with each other. For example, when teachers recruit more primary students to become anxious secondary students, it creates fewer non-anxious secondary students to be recruited by their peers, so the peer effect is weaker. Another example is when anxious students recruit more non-anxious students through peer effects, it leaves fewer non-anxious students to be recruited through mentoring effects.

Other student projects with similar forms for R 0 include a 2001 study on bulimia [40, 41] and 2018 study on hospital screening for MRSA [12] . Choose one of these papers, draw the flow diagram, explain the model, conduct a unit analysis, and interpret the basic reproductive number using the techniques you have learned so far.

Sensitivity analysis is an extremely flexible technique and there is really no end to the number of ways that you can complicate it. For example, in a simple variation, you could study the sensitivity of the equilibria to the parameter values, or the sensitivity of the equilibria to $R_0$. You can also use more complex methods and numerical simulations to look at sensitivity of a compartment to a particular parameter as a function of time. Examples of student projects using this type of forward sensitivity analysis include a 2018 study on the menstrual cycle [37] and a 2018 study of MRSA [12]. Using this approach, George et al. [37] were able to identify specific days in the menstrual cycle on which intervention would be most effective.

All of these approaches go beyond studying the basic reproductive number and are outside of the scope of this paper. For details on these and other ways to use sensitivity analysis, we really recommend that you read an in-depth paper on the subject [3] and explore some of the many student projects that use sensitivity analysis in ways not described here [51] .

In this paper, we have shown you an overview of the most popular techniques used at MTBI, as well as examples of the variety of topics that you can investigate with these techniques. All together, this article forms a sort of "research project in a box." It is up to you to unpack the contents of the box, consult your research mentor, and add your own personal interests to generate a unique research project.

This approach does come with some warnings: while reproductive number analysis is the most commonly used approach at MTBI, it is not suitable for every research project. Although this is an extremely flexible approach that can be used to investigate a lot of projects, not every problem is well suited to an ODE contagion model. Sometimes a different modeling approach, such as difference equations [15], Markov chains [53], cellular automata [43], agent based modeling [28], a probability model [50], or a statistical approach [46] is better suited to a particular research project.

Even within the realm of differential equations, not every question can be answered by reproductive number analysis. For example, sometimes a problem is an equilibrium problem [49], or a cost problem [1], or an optimal control problem [11]. The reproductive number can be useful in investigating these types of problems, but it does not provide the whole story.

What is most important is that you choose a research project that you are passionate about. Our best projects come from students who have chosen a topic personally meaningful to them. Many of our student projects are driven by students who have suffered from the very diseases or social problems they choose to study. This passion gives students the drive to really dig deep into the topics that they choose to study. It improves their understanding of the problem, the accuracy of their model, the effort they put into their mathematics, and the quality of their interpretations and recommendations. So our advice for building a project is as follows: (1) 

In developing a project, we prefer that you choose a topic that has a personal meaning to you, as passion tends to make the best projects. But a research problem should also be of interest to others. Research questions and projects often emerge from reading what was said or left untouched in articles at the interface of the biological, computational, mathematical, and social sciences. Here are some directions and possibilities:

Research Project 1 The official confirmation of an outbreak of Ebola hemorrhagic fever in West Africa took place in 2014. Efforts to assess its impact and what could be learned from it were launched [19]. Models made use of earlier estimates of the basic reproductive number [26], with modeling results providing increased understanding of Ebola dynamics and helping assess the impact of various control efforts [17]. Soon after, an effective vaccine was discovered and tested. Yet, as the 2018 Ebola epidemic in the Democratic Republic of the Congo evolved, we began to experience one of the worst outbreaks. In short, fighting Ebola, vaccine in hand, did not prevent its devastating impact on the affected populations. Why?

Research Project 2 What is the impact of cross-immunity on influenza strain dynamics and how does its role compare to the effect of partially effective vaccines? What if we had a partially effective universal flu vaccine? What percentage of the population should be vaccinated to ameliorate the impact of an outbreak?

One of the most challenging families of viruses is that associated with influenza A. Currently, there are three subtypes of influenza A, with variants of each subtype, called strains, being continuously generated across the world and transported primarily by the mobility patterns of billions of individuals. Influenza involves what are known as seasonal and pandemic variants. The kind of spread and its impact depend on the variation between strains. The 1918 pandemic was devastating and it has been linked to at least 50 million deaths. Vaccines, typically effective against seasonal influenza, are in general ineffective against novel strains, the most likely drivers of pandemic influenza. As an outbreak sweeps a region, it alters the average immunological profile of a population. By how much? It depends on the levels of protection (cross-immunity) that may protect against future strains. Some relevant references include [8, 10, 52].

The HIV epidemic, the re-emerging tuberculosis (TB) epidemic, and their synergistic impact on each other, including the growth of antibiotic resistant TB as well as the need to reduce the evolution of resistance to HIV treatment, are ongoing challenges. Studying the modeling history of these diseases brings tremendous insights to those willing to go through the mathematics, and offers the opportunity to learn new methods as well. We recommend the following references [16, 20, 22].

One of the most important challenges facing the study of epidemics comes from key questions at the intersection of three fields: demography, epidemiology, and genetics. These problems are particularly challenging because they involve multiple temporal scales: the epidemiological, the demographic, and the evolutionary time scales. What models and approaches have we developed to address the joint dynamics of these three processes? A good place to start may be [36].

Moving away from epidemiology, problems that still involve some form of contagion can also be addressed in frameworks built to address particular data patterns. We have work, for example, on the spread of scientific ideas and on whether or not there is a copycat effect in the temporal patterns generated by school mass shootings [6, 59]. In general, social science provides many opportunities to innovate with epidemiological techniques. Many of our students have had success with projects in this area.

Guiding a student-led research project is a very different experience from assigning a research project. Assigning a research project allows the mentor to anticipate and plan for any difficulties that might arise, but it comes at the cost of student passion. A student-led research project is a passionate research project, but it will often be in an area outside your area of expertise. The mentor of a student-led research project is always playing catch-up. She (or he) relies on her experience and expertise to think faster than the students and anticipate problems that might arise while the project is being developed.

The primary role of the mentor of a student-led project is to serve as an academic advisor in almost the same way that you would mentor a graduate thesis. The mentor pushes the students further, provides instruction when students suffer from a skill gap, and generally encourages students to work to find their own answers instead of providing answers.

Part of your role as a mentor will be to push the students to really understand the context. Make sure students do research into the subject and look at previous models. Use your knowledge of the literature and your academic connections to suggest places that students might research.

One of the most difficult things for students to do is to develop a research question. Help the students make sure their research question is clearly defined, academically interesting, and small enough that students can get some results within the time limit of their project. In general, the best research questions tend to be about choosing between mechanisms or strategies. Make sure that students avoid yes/no questions or questions where the answer is obvious.

For example, the Herpes group in 2018 [1] went through a number of research questions including "What is the effect of early treatment on HSV-2?" This is not a good research question because the answer "HSV-2 infections decrease" is obvious. The group further refined their research question to being about the cost of treatment, then about the relative cost of two different treatment strategies, and finally about finding the most cost-effective combination of both strategies.

In developing a research question and a corresponding model, it is important that mentors help students maintain a delicate applied mathematics balance. Students are often driven by a mathematical question or a biological question and tend to forget the other half of the model. When a model or question is too mathematical, students tend to pay only lip service to the biology. The biology becomes inaccurate, or the interpretation of the results becomes uninteresting. When a model or question is too biological, the students typically want to include every possible detail, the problem becomes too complex, and mathematical results become impossible within the time limit of the project. For more information on the delicate balance between biology and mathematics, the different ways in which researchers from different fields value models, and how these differences affect students selecting a research question, we highly recommend reading Smith et al. [58].

Similarly, do not let students get away with stating bare mathematical results. Students have a tendency to assume that bare mathematical results speak for themselves. But students are supposed to provide topic-level expertise. Force discussion, interpretation, and recommendations. There is a reason why we have included such a long section on the interpretation of R0. Students should not anticipate that the readers of their papers will do the work of converting results into academic significance.

A major part of the role of a mentor is to make sure that mathematical models do not get too big. Use your modeling experience to anticipate the complexity of the model students propose. Count compartments and non-linearities to anticipate the complexity of the algebra. In many ways, Sect. 4 on interpreting the basic reproductive number is more useful for mentors than students. Mentors can use this approach backwards to anticipate the form of R0 and how complex it will be. For example, if students want to construct a model that has a generational effect, a loop-back latency period, and co-infection, squash that immediately. The resulting R0 would be a total mess.
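To make the compartment-counting heuristic concrete, the following minimal sketch applies the next generation matrix method to a hypothetical SEIR model with vital dynamics. It is not taken from any MTBI project. The model, the parameter names (beta, sigma, gamma, mu), and their values are assumptions chosen for illustration. With infected compartments ordered as (E, I), F collects the new-infection terms and V the transition terms, both linearized at the disease-free equilibrium, and R0 is the spectral radius of F V^{-1}.

```python
import numpy as np


def next_generation_r0(F, V):
    """Spectral radius of F V^{-1}, where F holds new-infection rates and
    V holds transition rates among the infected compartments."""
    K = np.asarray(F, dtype=float) @ np.linalg.inv(np.asarray(V, dtype=float))
    return float(max(abs(np.linalg.eigvals(K))))


# Hypothetical SEIR parameters (illustrative values only).
beta, sigma, gamma, mu = 0.5, 0.2, 0.1, 0.01

# Infected compartments ordered as (E, I).
F = [[0.0, beta],
     [0.0, 0.0]]
V = [[sigma + mu, 0.0],
     [-sigma, gamma + mu]]

r0_numeric = next_generation_r0(F, V)

# Closed form for this model: transmission rate times infectious duration
# times the probability of surviving the latent period.
r0_closed = beta * sigma / ((sigma + mu) * (gamma + mu))

print(r0_numeric, r0_closed)
```

In this sketch, R0 factors as beta/(gamma + mu) times sigma/(sigma + mu). Each additional infected compartment adds a row and a column to F and V, and usually another factor or term to R0, which is why loops and co-infection make the expression unwieldy so quickly.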

Much of the role of mentor is to provide mathematical expertise. Students will often come up with interesting research questions or models where the necessary analysis does not match the mathematical techniques they have learned in class. Much of mentoring is providing just-in-time tutoring in mathematical techniques that students either do not know about or have not learned very well. Occasionally, you will be missing tools as well, and a problem will require an approach or a technique that you are not familiar with. When you encounter this scenario of missing tools, learn the tools. One of the benefits of expertise and experience is that you learn mathematics much faster and much better than a student. Use this ability to learn to tutor your students in the missing tools that you both need.

Lastly, the most important role of a mentor is to inflame the passions that students have for a topic. Students work much harder when they are passionate about a topic. They innovate in ways we would not think to and study ideas that we would not normally study. They push us to innovate and improve our own practice of research. This is the primary reason why MTBI has used student-led projects since 1997.

Motivating students can be a tricky and often frustrating business. We have found that having a social agenda greatly improves the interest of both students and funding agencies in our program [23] . MTBI is a program driven by social impact. The entire program is intentionally designed to improve the representation of minorities in the sciences by creating a pipeline to graduate school. This intentional design is not hidden from students. We share our social and political passions with students, and this in turn encourages them to explore social and biological problems that they are passionate about. They see that mathematics can have a direct impact on the world and are fueled by their own passions to make the world a better place.

In the past, we have met with some resistance to this idea. We have encountered faculty who have strongly believed that student researchers and funding agencies should be motivated primarily by the intellectual merit of a research project without emphasizing broader impacts. We understand and share the frustrations of many of these researchers. It is a difficult thing to feel as if the research we are passionate about is not valued by others on its own merits. But the reality is that broader impacts will always be an important motivator for students and a determinant in attracting diverse populations to the mathematical sciences. And while we can say that intellectual merit is definitely a necessary motivator of students [44], the appeal of making a change in society does not take away from that motivation; it only adds to it.

A Cost-Effective Analysis of Treatment Strategies for the Control of HSV-2 Infection in the U.S.: A Mathematical Modeling-Based Case Study

Prisoner reform programs, and their impact on recidivism

Sensitivity analysis for uncertainty quantification in mathematical models

A mathematical model of HIV and malaria co-infection in sub-Saharan Africa

A mathematical model of HIV and malaria co-infection in sub-Saharan Africa

The power of a good idea: Quantitative modeling of the spread of ideas from epidemiological models

Mathematics for the life sciences

Mathematical models for communicable diseases

Mathematical Models in Population Biology and Epidemiology, second edn

Mathematical Models in Epidemiology

A mathematical model of the emission and optimal control of photochemical smog

Comparison of screening for methicillin-resistant Staphylococcus aureus (MRSA) at hospital admission and discharge

An epidemiological approach to the spread of political third parties

The mathematical and theoretical biology institute -a model of mentorship through research

Modeling the dynamics and control of Lyme disease in a tick-mouse system subject to vaccination of mice populations

Mathematical and statistical approaches

Modeling Ebola at the mathematical and theoretical biology institute (MTBI)

Promoting research and minority participation via undergraduate research in the mathematical sciences. MTBI/SUMS-Arizona State University

Beyond Ebola: Lessons to mitigate future pandemics

On the computation of R0 and its role on. Mathematical approaches for emerging and reemerging infectious diseases: an introduction

Student-driven research at the mathematical and theoretical biology institute

Dynamical models of tuberculosis and their applications

Why REUs matter

A preliminary theoretical analysis of an REU's community model

Immigration laws and immigrant health: Modeling the spread of tuberculosis in Arizona

The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda

The cursed duet: Dynamics of HIV-TB co-infection in South Africa

The recovery and ecological succession of the tropical Montserrat flora from periodic volcanic eruptions

Who says we r0 ready for change?

Community resilience in collaborative learning

Mathematical modeling of the sex worker industry as a supply and demand system

The effects of student-teacher ratio and interactions on student/teacher performance in high school scenarios

On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations

Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission

Mathematical Models in Biology, first edn

The influence of infectious diseases on population genetics

The effect of gonadotropin-releasing hormone (GnRH) on the regulation of hormones in the menstrual cycle: a mathematical model

The role of vaccination in the control of SARS

The role of vaccination in the control of SARS

Am I too fat? Bulimia as an epidemic

Am I too fat? Bulimia as an epidemic

The dynamics of math anxiety as it is transferred through peer and teacher interactions

Spatial dynamics of myeloid-tumor cell interactions during early non-small adenocarcinoma development

Intellectual need

The Effect of Localized Oil Spills on the Atlantic Loggerhead Turtle Population Dynamics

Mean time to extinction of source-sink metapopulation for different spatial considerations

The role of transactional sex in spreading HIV/AIDS in Nigeria: A modeling perspective

The role of transactional sex in spreading HIV/AIDS in Nigeria: A modeling perspective

Economics of Prison: Modeling the Dynamics of Recidivism

Mathematical model for time to neuronal apoptosis due to accrual of DNA DSBs

On the role of cross-immunity and vaccines on the survival of less fit flu-strains

Critical response models for foot-and-mouth disease epidemics

A Biologist's Guide to Mathematical Modeling in Ecology and Evolution

A mathematical model of coral reef response to destructive fishing practices with predator-prey interactions

A Stage Structured Model of the Impact of Buffelgrass on Saguaro Cacti and their Nurse Trees

An epidemiological approach to the spread of political third parties. arXiv

Seeking diversity in mathematics education: mathematical modeling in the practice of biologists and mathematicians

Contagion in mass killings and school shootings

Acknowledgements Special thanks to MTBI mentors Leon Arriola, Christopher Kribs, Anuj Mubayi, Karen Rios-Soto, and Baojun Song, whose contributed expertise in mentoring students formed the foundation for this paper. Student research that formed the basis for this paper was supported by NSF grants