To the Cloud! A Grassroots Proposal to Accelerate Brain Science Discovery


Neuron

NeuroView
Neuro Cloud Consortium*
*Correspondence: jovo@jhu.edu (Joshua T. Vogelstein)
http://dx.doi.org/10.1016/j.neuron.2016.10.033

The revolution in neuroscientific data acquisition is creating an analysis challenge. We propose leveraging
cloud-computing technologies to enable large-scale neurodata storing, exploring, analyzing, and modeling.
This utility will empower scientists globally to generate and test theories of brain function and dysfunction.
Introduction
Technological advances from all around

the globe (Grillner et al., 2016) are allowing

neuroscientists to collect more precise,

complex, varied, and extensive data than

ever before (Sejnowski et al., 2014). How

can we maximally accelerate our collective

ability to extract meaning from such

data? To answer this question, the United

States Congress commissioned the

National Science Foundation (NSF) to

‘‘convene government representatives,

neuroscience researchers, private entities,

and non-profit institutions’’ (https://www.congress.gov/congressional-report/113th-congress/house-report/448). The NSF

funded two events. The first was a work-

shop of over 75 individuals from 12 coun-

tries and 5 continents that was broadcast

live over the internet. Each person was

invited to bring a single big idea—one

that could have maximal impact, while be-

ing both feasible, given existing resources,

and universally inclusive. Four ideas

emerged as grand challenges for global

brain science (Vogelstein et al., 2016). A

second event was organized to discuss

these ideas with a larger (425 participants)

and more diverse community, which will be

the subject of another article. The goal of

this NeuroView is to describe one of the

four grand challenges and propose a strat-

egy to overcome it, in order to gather feed-

back from the larger community. The

authors are participants in the first confer-

ence who volunteered to hash out these

ideas via emails, online documents, con-

ference calls, and in-person visits.

The kernel of the idea is based on a view

of the scientific process as an ‘‘upward spiral’’: a collective effort where each new

experiment yields data, upon which anal-

ysis is performed, leading to new or refined

models, which suggest novel experiments
622 Neuron 92, November 2, 2016 © 2016 Elsevier Inc.
(see Figure 1). Historically, the process of

data analysis has been kept relatively sim-

ple by the small scale of data acquired. But

recent advances in experimental technol-

ogy, such as serial electron microscopy

(Denk and Horstmann, 2004), light sheet

microscopy (Weber et al., 2014), and

models of the whole human brain at the

microscopic level (Amunts et al., 2013),

have made data analysis significantly

more challenging. While experimental

neuroscience is enabling the collection of

ever larger and more varied datasets,

information technology is undergoing a

revolution of its own. Commercial development of artificial intelligence and cloud

computing innovations are changing the

computational landscape (The Economist,

2016). Computing is moving toward ‘‘clou-

dification,’’ a ‘‘software as a service’’

model, in which locally installed software

programs are replaced by web apps.

These forces create a massive opportunity

to develop new computational technolo-

gies that complement advances in data

collection in order to accelerate and

democratize model building, hypothesis

testing, and model refinement.

What Would Change If We
Capitalize on This Opportunity?
Consider sending a letter, watching a

movie at home, or obtaining reference in-

formation. Ten to twenty years ago, to

send a letter, we purchased paper,

stamps, and envelopes; to watch a movie

at home, we rented or purchased a VHS or

DVD; to obtain reference information, we

bought an encyclopedia and obtained

yearly revisions. Today, each of those op-

tions is still available and indeed preferred

in certain circumstances. However, web

options exist for each activity as well. In

each case, we have privacy, bandwidth,
and financial concerns. Nonetheless, for

many of our daily practices we use these

cyber solutions, sometimes putting our

most private information in the cloud.

The everyday practice of brain science is

just beginning to benefit from similar tech-

nology development.

Other scientific disciplines have already

navigated similar waters with remarkable

success. For example, the Sloan Digital

Sky Survey (SDSS) changed the daily

practice of astronomers and cosmologists

(Kent, 1994). They still have the option to

wait 6 months for telescope time, analyze

their data locally on machines they own

and maintain, and publish a summary of

the results (and many do). Yet there are

more accounts in SDSS than there are professional cosmologists. Astronomers can

now log in to SDSS, find previously pub-

lished data, run database queries (a skill

they typically did not have prior to SDSS),

and publish the queries and results. Simi-

larly, molecular geneticists historically

sequenced their own data (using ma-

chines that they owned and maintained),

analyzed it locally, and published the re-

sults. Now, they can outsource the se-

quencing to avoid owning and maintaining

the machines, upload the sequences to a

national or international database, quanti-

tatively compare their sequences to previ-

ously published sequences, and then pub-

lish their findings. The success of these

efforts is evident from the cultural shift of

daily practices by many, if not most, participants in each field. Both fields resolved

issues of data privacy, data ownership,

governance, and financial concerns,

providing a proof of principle that other scientific disciplines can do the same.

In neuroscience, many of our scien-

tific practices remain based on pre-

internet methods. A scientist designs an



Figure 1. The Upward Spiral of Science

experiment, collects data,

stores it locally, keeps meta-

data in his head or in some

custom spreadsheet, analyzes

it using software that he buys

and installs on local com-

puters that he updates regularly, and publishes a summary

of the results. We predict that

another strategy will be supe-

rior for many situations: as

the scientist collects data, it

gets stored privately or pub-

licly in the cloud, and she

then selects analyses to occur

automatically, having the flexi-

bility to pull from a variety of

previously published analyses,

and finally publishes entire
‘‘digital experiments,’’ containing (some

of) the data and the entire analysis pipeline.

What Are the Primary Goals?
We see two key goals that, if achieved,

would leverage advances in computing

to accelerate brain sciences. The first

goal is to make reproducibility and exten-

sibility of science as easy as possible,

even for small amounts of data or simple

data. The current practices of private

data storage and siloed analyses make

reproducing an analytic result tedious at

best and impossible at worst. The steps

can include requesting the data, identi-

fying the formats and organization, re-

questing the code, deciding which func-

tions to run and how, getting all

necessary dependencies installed, mak-

ing sure to use the same software ver-

sions, and accessing the same computa-

tional hardware. Solutions now exist to

mitigate each of these challenges, though

they are relatively disparate and uncon-

nected. Data can be uploaded to data re-

positories (e.g., https://figshare.com/),

data standards have been proposed for

several domains of brain science (e.g.,

http://bids.neuroimaging.io/ and http://www.nwb.org/), code can be stored in publicly accessible repositories (e.g., https://github.com/), interactive tutorials can be provided (e.g., using http://jupyter.org/), and all necessary software dependencies can be easily packaged together (e.g., using https://www.docker.com/) and run ‘‘in the cloud’’ (e.g., using http://mybinder.org/) on commercial service providers (e.g., on https://aws.amazon.com/ec2/ or https://cloud.google.com/). Nonetheless, given some

new data, it is not obvious where to find

reference algorithms or how to connect

them to the data. Similarly, given a new

model, it is not clear how to find reference

data, figure out which standard it is using

and then fit it, and determine if others

have done the same to allow us to

compare and assess the results. In either

case, once the data are processed, it re-

mains difficult to keep track of the result-

ing data derivatives and which version of

which code resulted in which outputs.

So although many of the pieces are in

place, there is still no unified ‘‘glue’’ that

makes everything work together seam-

lessly. Moreover, each of the above-

mentioned tools can be used by some

brain scientists, but most tools are de-

signed for data scientists, so the learning

curve can be incredibly steep. Ideally,

there would be a place where brain scien-

tists could find all relevant analyses and

data, run each analysis on each dataset,

and see a leaderboard comparing perfor-

mances, without writing any lines of code.

Cloud-based solutions simplify reproduc-

ibility and extensibility by essentially elim-

inating activation energy and extraneous

sources of analytic variability.
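
The bookkeeping problem described above (which version of which code produced which output) can be pictured as a small provenance record keyed by content hashes. The following is a minimal sketch under stated assumptions, not a description of any tool named in this article: the function names `fingerprint` and `provenance_record` and all record fields are hypothetical.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes given."""
    return hashlib.sha256(payload).hexdigest()


def provenance_record(data: bytes, code: bytes, output: bytes, params: dict) -> dict:
    """Link one output to the exact data, code, and parameters that produced it.

    All field names are hypothetical and chosen only for illustration.
    """
    return {
        "data_sha256": fingerprint(data),
        "code_sha256": fingerprint(code),
        "output_sha256": fingerprint(output),
        # Sorting keys makes the serialized parameters deterministic.
        "params": json.dumps(params, sort_keys=True),
    }


record = provenance_record(
    data=b"1,2,3,4",
    code=b"def mean(xs): return sum(xs) / len(xs)",
    output=b"2.5",
    params={"method": "mean"},
)
```

Because the record stores hashes rather than file names, any change to the data or the code yields a different record, which is one way a data derivative could be traced back to its inputs.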

The second goal is to enable such a sys-

tem to work with ‘‘big data’’ (i.e., data too

large to fit on a workstation). Data are

scaling in many domains in brain science,

either because individual experiments are

large (as in calcium imaging and whole-

brain CLARITY imaging), there are thou-

sands of subjects with gigabytes of data
each (as in large-scale human

brain imaging projects), or

there are millions of time

points (as in wearable sensor

data). Regardless of source

and modality, if it is ‘‘medium

data’’ (meaning too large to

fit in memory, but small

enough to fit on your computer), tasks as simple as visualizing, rotating, and opening

the data are challenging using

standard tools such as

MATLAB, Python, or ImageJ.

For big data, the challenges

are even larger because

questions of how to store,

compress, manage, and

archive the data exceed the
computational capabilities and resources

of most experimental labs. Cloud-based

solutions simplify big data analysis due to

their inherently scalable nature.
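
To make the ‘‘medium data’’ case concrete, the sketch below computes a summary statistic over a file without ever loading the whole file into memory. It is illustrative only: the function name `chunked_mean`, the file name, and the raw 64-bit float layout are assumptions made for this example, and the file is assumed to contain a whole number of values.

```python
import array
import os
import tempfile


def chunked_mean(path: str, chunk_values: int = 1024) -> float:
    """Mean of a raw file of 64-bit floats, holding one bounded chunk in memory."""
    total = 0.0
    count = 0
    item_size = array.array("d").itemsize
    with open(path, "rb") as handle:
        while True:
            raw = handle.read(chunk_values * item_size)
            if not raw:
                break  # end of file
            chunk = array.array("d")
            chunk.frombytes(raw)  # assumes the file holds whole 8-byte values
            total += sum(chunk)
            count += len(chunk)
    if count == 0:
        raise ValueError("file contains no values")
    return total / count


# Demonstration on a small temporary file; a real recording would be far larger.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "signal.f64")
    with open(path, "wb") as handle:
        array.array("d", (float(i) for i in range(10_000))).tofile(handle)
    mean_value = chunked_mean(path, chunk_values=256)
```

Memory use is bounded by `chunk_values` rather than by file size, which is the property that interactive tools lose when they try to open a dataset all at once.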

What’s the Big Idea?
We are proposing to design, build, and

deploy an instance of ‘‘cloud neurosci-

ence,’’ meaning that the data, the code,

and the analytic results all live in the cloud

together. Cloud neuroscience can be

thought of as an operating system, a set

of programs that run on it, a file system

that stores the data, and the data itself,

all designed to run in a scalable fashion

and to be accessible from anywhere.

What Are the Design Criteria?
First and foremost, the design and con-

struction should be organic, grassroots,

and open source, to ensure that it remains

intimately connected to the needs of all

scientific citizens. Over 100,000 people

attend annual brain science conferences,

including neuroscience, psychology, psy-

chiatry, and neurology. This is a massive

human capital resource, so the system

should enable contributions from any of

them, regardless of background or re-

sources. Thus, the system needs to sup-

port data and workflows of all kinds,

regardless of modality, complexity, or

scale—including raw data, derived data,

and metadata. Doing so would also further

democratize brain sciences, opening the

door to the additional 3.5 billion people

with mobile broadband access who

could contribute if given the opportu-

nity. Encouraging and supporting such


Figure 2. Schematic of the Five Proposed Components
An individual can adopt any or all of the five roles (color-coded dashed rectangles). For each component, the cloud content is generated by individuals in one of
the five roles.

involvement motivates an emphasis on

ethical standards and cultural sensitivities.

Moreover, millions of hours and billions of

dollars have been spent developing brain

science resources, including vast quanti-

ties of data, algorithms, and models. The

system should build upon such work.

Because different people have different

preferences, access controls should be

flexible enough to satisfy everyone’s

needs. For resources that are open, reproducing and extending prior work should be

‘‘turn-key,’’ allowing researchers to ‘‘swap

in’’ different datasets or algorithms as

desired. Industry is making tremendous

headway in this regard, including digital

notebooks to keep track of all analyses,

software containers to ease the burden

of installing and configuring software,

and web services that dynamically provide

computational resources as needed. To

the extent possible, we should leverage

these resources and engage with non-

profit, institutional, and corporate partners

to express our domain-specific needs.

The design should be highly adaptive, to

capitalize on rapid advances from within

and outside brain sciences, and, of

course, open source with permissive li-

censes. And the entire system should be

able to run not just in a single commercial

cloud, but also on other clouds, national

resources, institutional clusters, local

workstations, and laptops, to enable

maximal portability and utility. Perhaps

most importantly, the system should be

universally useful, helping to answer the

grand challenges of brain science while

facilitating much greater participation in

the scientific process.
The motivation underlying this en-

deavor is to accelerate the scientific pro-

cess by improving the experience of

doing brain science. Thus, the community

can determine the worst pain points in our

process and design solutions around

them. For example, if looking at data

is the largest bottleneck, then one

could use a cloud-based visualization

app (like Google Maps, CATMAID, or

NeuroDataViz). On the other hand, if the

largest bottleneck is getting data into a

common format before running analyses,

then one would benefit from having all the

data stored in a format with a standard-

ized application programming interface

(API) so every dataset can be accessed

in the same way. In other words, it is

time for the scientific community to prior-

itize the user experience to focus the sub-

sequent software development.
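
What ‘‘accessed in the same way’’ could mean in practice is sketched below: two differently stored recordings expose the same two methods, so a single analysis function runs on both. This is a hypothetical interface invented for illustration; `Dataset`, `InMemoryRecording`, `CsvRecording`, and `window_mean` are not names from any existing system.

```python
from typing import Dict, Protocol, Sequence


class Dataset(Protocol):
    """Hypothetical minimal interface that every dataset adapter would implement."""

    def metadata(self) -> Dict[str, str]:
        ...

    def read(self, start: int, stop: int) -> Sequence[float]:
        ...


class InMemoryRecording:
    """Adapter for samples already held in a Python list."""

    def __init__(self, samples: Sequence[float], modality: str) -> None:
        self._samples = list(samples)
        self._modality = modality

    def metadata(self) -> Dict[str, str]:
        return {"modality": self._modality, "n_samples": str(len(self._samples))}

    def read(self, start: int, stop: int) -> Sequence[float]:
        return self._samples[start:stop]


class CsvRecording:
    """Adapter for comma-separated text; same interface, different storage."""

    def __init__(self, text: str, modality: str) -> None:
        self._samples = [float(field) for field in text.split(",")]
        self._modality = modality

    def metadata(self) -> Dict[str, str]:
        return {"modality": self._modality, "n_samples": str(len(self._samples))}

    def read(self, start: int, stop: int) -> Sequence[float]:
        return self._samples[start:stop]


def window_mean(dataset: Dataset, start: int, stop: int) -> float:
    """An analysis written once against the interface, not against a file format."""
    values = dataset.read(start, stop)
    if not values:
        raise ValueError("empty window")
    return sum(values) / len(values)


eeg = InMemoryRecording([1.0, 2.0, 3.0, 4.0], modality="EEG")
fnirs = CsvRecording("1,2,3,4", modality="fNIRS")
```

Here `window_mean(eeg, 0, 2)` and `window_mean(fnirs, 2, 4)` call identical code even though the two recordings are stored differently.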

How Might We Achieve It?
In this section, we propose a potential

design of the constituent components

that could comprise an instance of cloud

neuroscience (see Figure 2). The required

elements can be divided into five cate-

gories: data, infrastructure, apps, algo-

rithms, and education. The goal of

breaking down the problem this way is to

ensure that all brain scientists, professional and citizen alike, can contribute to and benefit from the system. Crucial to success

will be tight integration across compo-

nents, each of which is described in some

detail below. Some brain scientists are

able to span the full range from design to

analysis, including running experiments,

analyzing data, making discoveries, and
even writing articles. Such polymaths can

seamlessly alternate between different

roles. Others might be highly skilled in soft-

ware engineering, but not data collection.

To ensure that all brain scientists can

contribute to this effort, we have organized

types of activities according to the ‘‘role’’ of

the individual performing those activities.

These roles are not meant to be prescrip-

tive; rather, they serve to help guide scien-

tists to the different kinds of contributions

they could make (see Box 1 for detailed

description of the roles).

Data
The data component is intended to miti-

gate difficulties with storing and accessing

data, regardless of the modality, scale, or

complexity of the data. Anybody would

be able to upload raw data, derived data,

and metadata as they flow off the sensors

and dynamically control access. Func-

tionality would build on and incorporate

existing brain science data repositories

(Ascoli et al., 2007; Burns et al., 2013;

Crawford et al., 2016; Poldrack et al.,

2013; Teeters et al., 2008), as well as

more general services (e.g., FigShare).

Therefore, the technical challenges for

small and large data storage and access,

for the most part, already have reasonable

solutions for many data types. The re-

maining challenges are to further lower

the barrier to entry, making data upload

and access easier, especially for multi-

terabyte datasets. Data contributions will

be able to come from anyone and could

be stored in a variety of accessible places

to minimize transfer cost and time. Access

controls would enable scalable sharing


Box 1. Roles

We enumerate six different roles for participants. Note that these are not characterizing individuals but roles that any individual can play. Roles differ in their degree of interest and expertise in various aspects of the scientific process, all of which are important.

• Experimentalist: A person in this role is acquiring data. This includes activities such as recruiting subjects and specifying inclusion guidelines (for human studies), experimental setup, subject care, and data acquisition, as well as some aspects of data management and quality control. In this role, a person has extensive knowledge of the experiment details, though computational acumen can be quite modest.

• Architect: A person in this role is developing the infrastructure component. In this role, professional software engineer skills are required. Architects work collaboratively on open-source repositories, possibly co-localized.

• App Engineer: A person in this role is writing apps. These apps might wrap algorithms written by the engineer or others. In this role, best practices of software development for science, including proper scientific documentation, are crucial.

• Data Scientist: A person in this role is writing and running algorithms. These algorithms might serve any step of the scientific process. Data scientists have a wide variety of computational backgrounds, including engineering, physics, mathematics, statistics, and computer science.

• Scientific User: A person in this role is using tools to analyze and understand the data. This can take many forms, ranging from looking at images and figures generated directly from the data acquisition system to fitting statistical models and combining multiple disparate datasets. In this role, computational acumen is not required. Familiarity with the data, experimental details, etc. can vary widely.

• Educator: A person in this role is either creating or presenting educational content, including documentation, tutorials, and massive online open courses, as well as running workshops, hackathons, and summer courses.

with minimal effort. Storage costs would

be the responsibility of the data provider

if the data are private; if public, others

could financially contribute. In either

case, economies of scale would reduce

storage costs, and we would work with

commercial clouds and national infra-

structures to offset costs to the extent

possible. The data storage formats would

allow visualization and analysis at scale.

Data contribution would be desirable

and possible from any lab, regardless of

its financial resources or location. For

example, some methods are relatively

inexpensive, such as EEG, fNIRS, and

wearable technologies. Moreover, certain

important subpopulations are better rep-

resented in less wealthy countries,

enabling unique contributions from those

places. If the same measures are included

in more expensive projects, analysis

bridges could be established between

the datasets. This would enhance transla-

tional research at a global scale. These

factors would lead to important collabora-

tions in which less wealthy countries

could influence the content and useful-

ness of this effort (Neuroinformatics Col-

laboratory, 2016).

Data types would include raw, derived,

and metadata (see Box 2 for additional de-

tails). Raw data include data from any kind

of experiment, including functional, struc-

tural, omics (e.g., genetic and epigenetic),

behavioral, and medical data. Every experiment will be given a unique data identifier.
Medical data will be given special attention

to ensure compliance with national guide-

lines for patient privacy. Each data type will

yield a wide diversity of derived data,

including summary statistics, matrices,

networks, shapes, and more. Associated

with each entry is a collection of metadata,

including a community-driven controlled

vocabulary, as well as custom ad hoc

fields. Metadata on the derived data will

include detailed provenance history. The

system would be seeded with existing

reference datasets spanning spatial, tem-

poral, and phylogenetic scales, including

data from the Human Brain Project, the

Human Connectome Project, the Allen

Institute for Brain Science’s data portal,

IARPA’s MICrONS program, and more.
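
The combination of a controlled vocabulary with free-form custom fields, plus provenance for derived data, can be sketched as a simple validation step. The vocabulary terms, field names, and the function `validate_metadata` below are placeholders invented for this example; an actual vocabulary would be defined by the community.

```python
# Hypothetical controlled vocabulary; real terms would be community-defined.
CONTROLLED_VOCABULARY = {
    "data_level": {"raw", "derived"},
    "modality": {"EEG", "fNIRS", "fMRI", "calcium imaging"},
}


def validate_metadata(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is acceptable.

    Fields outside the controlled vocabulary are treated as custom ad hoc
    fields and are accepted without checks.
    """
    problems = []
    for field, allowed in sorted(CONTROLLED_VOCABULARY.items()):
        if field not in entry:
            problems.append(f"missing required field: {field}")
        elif entry[field] not in allowed:
            problems.append(f"{field}={entry[field]!r} is not in the vocabulary")
    # Derived data must say where it came from.
    if entry.get("data_level") == "derived" and not entry.get("provenance"):
        problems.append("derived data must carry a provenance history")
    return problems


raw_entry = {"modality": "EEG", "data_level": "raw", "lab_notes": "pilot session"}
derived_entry = {"modality": "fMRI", "data_level": "derived"}
```

In this sketch `raw_entry` passes, including its custom `lab_notes` field, while `derived_entry` is flagged because it lacks a provenance history.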

Infrastructure
The infrastructure component is intended

to mitigate difficulties in finding data or

tools, linking them together, installing soft-

ware, managing computers, and repro-

ducing and extending results. When the

infrastructure is operational, much of the

scientific process can be conducted from

a tablet or smartphone, replacing the

need to buy and maintain high-power computers or keep software up to date. The

infrastructure is essentially the operating

system upon which all the services would

run, akin to NeuroDebian (Halchenko and

Hanke, 2012), but designed specifically
for the cloud. This virtual operating system

will run in the commercial cloud, on institu-

tional resources, national centers, or local

workstations, regardless of hardware

configuration (e.g., Mac, Windows, Linux,

etc.). The software could be designed

and written by a small and distributed

team of architects to facilitate design deci-

sions considering diverse use cases.

The infrastructure could be composed of

two core sub-components. First, a data

management system would store and

organize all the data. This could include

managing access, assigning digital object

identifiers (DOIs), and supporting common

data formats, and would be easily exten-

sible to new or custom formats. Data could

also be compressed with or without loss,

as desired by the contributor. Technically,

data would be stored in a set of databases

optimized for different brain science use

cases. Second, a workflow management

system would store and organize analyses,

leveraging existing web services such

as Github and continuous integration

to the extent possible. This would

enable ‘‘digital experiments,’’ including all

stages of data processing. Crucially, such

experiments could be done on different

hardware platforms, applied to different

data (by merely swapping the DOI), or use

different algorithms (a similarly simple

modification). All infrastructure services

would have easy-to-use APIs to maximize

utility and extensibility.
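
The idea that a digital experiment can be rerun on different data or with a different algorithm by changing a single identifier can be illustrated with a small sketch. Everything here is assumed for the example: the registries, the placeholder identifiers (which are not real DOIs), and the `DigitalExperiment` class are hypothetical.

```python
from dataclasses import dataclass, replace
from statistics import mean, median

# Hypothetical registries; the identifiers are placeholders, not real DOIs.
DATA_REGISTRY = {
    "doi:example/dataset-a": [1.0, 2.0, 3.0, 10.0],
    "doi:example/dataset-b": [4.0, 4.0, 6.0],
}
ALGORITHMS = {"mean": mean, "median": median}


@dataclass(frozen=True)
class DigitalExperiment:
    """An analysis described entirely by identifiers, so it can be re-run anywhere."""

    data_id: str
    algorithm: str

    def run(self) -> float:
        return ALGORITHMS[self.algorithm](DATA_REGISTRY[self.data_id])


baseline = DigitalExperiment(data_id="doi:example/dataset-a", algorithm="mean")
# Swapping in other data or another algorithm changes exactly one field.
other_data = replace(baseline, data_id="doi:example/dataset-b")
other_algorithm = replace(baseline, algorithm="median")
```

Because the experiment holds identifiers rather than the data or code themselves, the description stays small enough to publish alongside the results.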


Box 2. Types of Brain Science Data

• Functional data are fundamentally temporal and dynamic. Whether univariate or multivariate, the standard operations to apply include zooming in time, subsampling, smoothing, and converting to other domains such as Fourier. Functional data also have a spatial domain, which links them to structural data. The subdivision between functional and structural data may be, for some data, ambiguous.

• Structural data are fundamentally spatial in nature, include 2D images, 3D volumes, and 4D and 5D hypervolumes for multispectral and/or time-varying data (spatiotemporal data, such as fMRI and calcium imaging, are both structural and functional). This can include structural images, as well as sparse fluorescent images, gene expression maps, etc. Standard operations for these data include compression, downloads of volumes of arbitrary sizes and shapes, maximum projections, averages, and more.

• Omics data are sequential and categorical, including the genome, epigenome, metabolome, and microbiome. Standard queries for genetic data include sequence compression, alignment, and comparisons. Omics data may also have a spatial domain (e.g., gene expression data).

• Behavioral data can be of several different types. For example, behavior can be captured via video capture (e.g., behavioral observation of children during play), time series of task events during physiological measurements, questionnaires (e.g., symptom checklists), performance testing instruments (e.g., the NIH Toolbox), and other devices (e.g., actigraphy and voice recorders). Each datum has unique qualities and, therefore, functionality.

• Medical data include all electronic health data, including semi-structured text. They are among the most challenging of data types to aggregate, for until recently, the vast majority of the field has relied on paper charts or poorly structured electronic health record (EHR) systems. Fortunately, regulatory and funding agencies are incentivizing the widespread use of EHRs, as well as common data elements that are more amenable to data aggregation for the purposes of discovery science (e.g., the eMerge Network). Additionally, informatics frameworks are being developed to safely link disparate EHR data (e.g., https://www.i2b2.org/), and calls for the creation of open APIs are gaining attention.

Apps
The apps component is intended to miti-

gate difficulties in maintaining software

versions, paying for software, and finding

tools appropriate to run on data. Apps are

the programs that run on the system, akin

to tools like Dropbox (to upload/down-

load), Google Maps (to visualize),

PubMed Central (to search for informa-

tion), BLAST (to compare your data with

other data), and pipelines (to process

your data). Apps can be developed by

anybody with minimal programming skills,

due to the careful design of the APIs in the

infrastructure. A specification would be

formalized and quality standards agreed

upon by the community of users to pub-

lish apps in the open app marketplace.

Different apps would be designed for

users with different backgrounds, roles,

and goals. For example, apps targeted

at people in the experimentalist role could

include features to enable uploading,

downloading, and managing access

without having to learn the APIs. On the

other hand, apps targeted at people in

the data analysis role could include pre-

processing data, fitting models, testing

hypotheses, plotting results, and running

digital experiments. General purpose

apps would include tools to visualize,

manipulate, and manually annotate data.
These general purpose apps enable a

much broader community of users to

participate in the scientific process,

including those without extensive tech-

nical training or financial resources.

Algorithms
The algorithms component is intended

to mitigate difficulties in analyzing data

with increasing scale or complexity.

Recent advances in artificial intelligence,

including distributed machine learning

libraries and deep learning, could be lever-

aged here. Algorithms operate on simu-

lated, measured, or derived data to pro-

duce transformed representations or

summary statistics of the data. Algorithms

can be written by anybody with minimal

data-science skills, including many cur-

rent brain scientists, without knowledge

of this proposed system (unlike apps). Al-

gorithms are essentially ‘‘wrapped’’ in

apps to run and therefore inherit many of

the conveniences of the system. We parti-

tion algorithms into three different types.

Scalable data-processing algorithms can

be applied to a wide variety of data types.

These will be easily daisy-chained

together to obtain pipelines, which can

similarly be adapted to apply different al-

gorithms or data. Because algorithms will

be applied more generally to less familiar
data, or less familiar algorithms will be

applied to familiar data, quality assess-

ment will be particularly important. This

would include both qualitative dashboards

providing figures and quantitative metrics

to evaluate and compare performances

along different metrics. Finally, to optimize

resources and avoid duplicating efforts

across labs, experiments will need to be

useful for a large number of people. Exper-

imental design will therefore be a key algo-

rithmic component as well.
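
Daisy-chaining processing steps into a pipeline, and then adapting that pipeline by swapping steps, can be sketched with plain function composition. The helper `pipeline` and the three toy steps are illustrative assumptions, far simpler than real neurodata processing.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[List[float]], List[float]]


def pipeline(*steps: Step) -> Step:
    """Chain steps so that the output of each becomes the input of the next."""

    def run(values: List[float]) -> List[float]:
        return reduce(lambda current, step: step(current), steps, list(values))

    return run


def demean(values: List[float]) -> List[float]:
    """Subtract the mean from every sample."""
    center = sum(values) / len(values)
    return [value - center for value in values]


def rectify(values: List[float]) -> List[float]:
    """Replace every sample with its absolute value."""
    return [abs(value) for value in values]


def subsample(values: List[float]) -> List[float]:
    """Keep every second sample."""
    return values[::2]


process = pipeline(demean, rectify, subsample)
result = process([1.0, 2.0, 3.0, 4.0, 5.0])

# Adapting the pipeline is a matter of listing different steps.
variant = pipeline(subsample, demean)
variant_result = variant([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each step has the same input and output type, which is what lets steps be reordered or replaced without touching the others.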

Education
Just like there is a learning curve when

switching from Windows to Mac, so too

switching from current practices to this

system will involve a learning curve. There-

fore, the success of this endeavor will

depend on extensive educational material,

including documentation, tutorials, online

courses, hackathons, workshops, and

summer courses. All the content will be

designed to complement existing educa-

tional resources, such as Coursera

courses. The variety of educational

resources would reflect the backgrounds

and skills of the user and contributor

communities, with the goal of universal ac-

cess. Because of this variety, community-

driven cultural sensitivity guidelines would

be posted for all contribution types.

Discussion
Here we describe an immediately action-

able grassroots proposal to marry recent

advances in neurodata acquisition with

scalable cloud computing to accelerate

the process of discovery by scientists

independently of how well resourced

they are (we have developed a proof-of-

concept example using multimodal MRI

data; see http://neurodata.io for details).

There are several mechanisms by which

Cloud Neuroscience may yield benefits.

Global collaborations may become

much simpler and therefore more preva-

lent. Open science may be facilitated,

and the barriers and benefits to con-

ducting open science may become more

transparent by virtue of the design. Many

models can be tested on the same data-

set, and individual models can be sub-

jected to greater diversity of data-based

reality checks. In the near term, any effort

that generates reference data of interest

to a large segment of the community can

benefit from Cloud Neuroscience. One

example is the upcoming ~10 petabytes
from the IARPA MICrONS program.

Several potential criticisms are worth

addressing, and many details need to be

fleshed out. Privacy concerns for human

data will require careful additional thinking

so that best practices of anonymization

and security can be implemented—prece-

dent is provided by ongoing large research

initiatives (e.g., Jack et al., 2008; Murphy

et al., 2010; Sarwate et al., 2014). A viable

financial model will be required. Potential

partners include national laboratories that

could contribute computing and storage

resources, or companies interested in

providing cloud-based web services for

specific scientific subdomains. Return on

investment must be considered. Cosmol-

ogy, molecular genetics, and plant biology

(see http://www.cyverse.org/) are existing

proofs that, when designed well, such resources

can yield a dramatic and positive

impact on the field. Other cloud-

computing neuroscience efforts that focus

on the human brain are already underway,

such as CBRAIN (Das et al., 2016) and the

Human Brain Project. Such efforts are

important; the proposed project has

been designed to leverage the develop-

ments from those projects and extend

them to address a greater diversity of brain

science questions, species, data modal-

ities, and functionalities.
The above plans and challenges sug-

gest immediately actionable next steps.

A field engineer has been appointed to

develop a survey to determine which existing

resources are most useful (pooling information

from places like https://github.com/

and https://www.nitrc.org/) and

what new resources would be most useful.

A software engineer has agreed to

contribute significant effort toward build-

ing a ‘‘Neuroscience as a Service’’ frame-

work (the virtual operating system and

apps described above) based upon exist-

ing related services. They will begin

formalizing minimal specifications for all

resources. We have also obtained private

seed funding to hire an additional senior

software engineer. To gather community

feedback, we will be monitoring

https://neurostars.org/ for any posts that

contain the tag ‘‘neurostorm.’’ Next,

sustainable governance, funding, and

advisory models will be devised.

Pablo Picasso famously quipped,

‘‘Every child is an artist. The problem is

how to remain an artist once we grow

up.’’ As the next generation of brain scien-

tists grows up, we have an opportunity to

provide them with a canvas on which they

can craft ever more creative portraits of

our minds. Cloud neuroscience is one

step we can take in that direction.

SUPPLEMENTAL INFORMATION

Supplemental Information includes a complete
author list with affiliations and can be found with
this article online at
http://dx.doi.org/10.1016/j.neuron.2016.10.033.

ABOUT THE AUTHORS

Joshua T. Vogelstein is a neurostatistician; an
Assistant Professor of Biomedical Engineering at
Johns Hopkins University (JHU); and a member
of the Institute for Computational Medicine, Center
for Imaging Science, and Kavli Neuroscience Dis-
covery Institute (KNDI). Brett Mensh founded Opti-
mize Science, a science consulting agency, and is
Scientific Advisor at Janelia Research Campus.
Drs. Vogelstein and Mensh co-organized the
Global Brain Workshop, held in April 2016,
with Richard Huganir, Professor and Director of
the Department of Neuroscience and Director of
KNDI, JHU, and Michael I. Miller, Herschel and
Ruth Seder Professor and University Gilman
Scholar, Director of the Center for Imaging Sci-
ence, and Co-director of KNDI, JHU. All the co-au-
thors were invited to the Global Brain Workshop on
the basis of their international leadership spanning
different spatial, temporal, and phylogenetic
scales. They each subsequently volunteered to
continue discussing this content for the ensuing
weeks and months.
REFERENCES

Amunts, K., Lepage, C., Borgeat, L., Mohlberg, H.,
Dickscheid, T., Rousseau, M.-E., Bludau, S.,
Bazin, P.-L., Lewis, L.B., Oros-Peusquens, A.-M.,
et al. (2013). Science 340, 1472–1475.

Ascoli, G.A., Donohue, D.E., and Halavi, M. (2007).
J. Neurosci. 27, 9247–9251.

Burns, R., Roncal, W.G., Kleissas, D., Lillaney, K.,
Manavalan, P., Perlman, E., Berger, D.R., Bock,
D.D., Chung, K., Grosenick, L., et al. (2013). Sci
Stat Database Manag. http://dx.doi.org/10.1145/2484838.2484870.

Crawford, K.L., Neu, S.C., and Toga, A.W. (2016).
Neuroimage 124 (Pt B), 1080–1083.

Das, S., Glatard, T., MacIntyre, L.C., Madjar, C.,
Rogers, C., Rousseau, M.-E., Rioux, P., MacFar-
lane, D., Mohades, Z., Gnanasekaran, R., et al.
(2016). Neuroimage 124 (Pt B), 1188–1195.

Denk, W., and Horstmann, H. (2004). PLoS Biol. 2,
e329.

Grillner, S., Ip, N., Koch, C., Koroshetz, W., Okano,
H., Polachek, M., Poo, M.-M., and Sejnowski, T.J.
(2016). Nat. Neurosci. 19, 1118–1122.

Halchenko, Y.O., and Hanke, M. (2012). Front.
Neuroinform. 6, 22.

Jack, C.R., Jr., Bernstein, M.A., Fox, N.C., Thomp-
son, P., Alexander, G., Harvey, D., Borowski, B.,
Britson, P.J., Whitwell, J.L., Ward, C., et al.
(2008). J. Magn. Reson. Imaging 27, 685–691.

Kent, S.M. (1994). Science with Astronomical
Near-Infrared Sky Surveys, N. Epchtein, A. Omont,
B. Burton, and P. Persi, eds. (Springer), pp. 27–30.

Murphy, S.N., Weber, G., Mendis, M., Gainer, V.,
Chueh, H.C., Churchill, S., and Kohane, I. (2010).
J. Am. Med. Inform. Assoc. 17, 124–130.

Neuroinformatics Collaboratory (2016). Neuroinformatics
Collaboratory, http://www.neuroinformatics-collaboratory.org.

Poldrack, R.A., Barch, D.M., Mitchell, J.P., Wager,
T.D., Wagner, A.D., Devlin, J.T., Cumba, C.,
Koyejo, O., and Milham, M.P. (2013). Front. Neuro-
inform. 7, 12.

Sarwate, A.D., Plis, S.M., Turner, J.A., Arbabshir-
ani, M.R., and Calhoun, V.D. (2014). Front. Neuro-
inform. 8, 35.

Sejnowski, T.J., Churchland, P.S., and Movshon,
J.A. (2014). Nat. Neurosci. 17, 1440–1441.

Teeters, J.L., Harris, K.D., Millman, K.J., Olshau-
sen, B.A., and Sommer, F.T. (2008). Neuroinfor-
matics 6, 47–55.

The Economist (2016). The future of computing.
The Economist,
http://www.economist.com/news/leaders/21694528-era-predictable-improvement-computer-hardware-ending-what-comes-next-future.

Vogelstein, J.T., Amunts, K., Andreou, A., Angelaki,
D., Ascoli, G., Bargmann, C., Burns, R., Cali, C.,
Chance, F., Chun, M., et al. (2016). arXiv,
arXiv:1608.06548, https://arxiv.org/abs/1608.06548.

Weber, M., Mickoleit, M., and Huisken, J. (2014).
Methods Cell Biol. 123, 193–215.
Neuron 92, November 2, 2016 627



Neuron, Volume 92
Supplemental Information
To the Cloud! A Grassroots Proposal

to Accelerate Brain Science Discovery

Neuro Cloud Consortium


Joshua T. Vogelstein,1,33,34,35,36,* Brett Mensh,2,3,5 Michael Häusser,4 Nelson Spruston,5 Alan C. Evans,6 
Konrad Kording,7 Katrin Amunts,8,9,10 Christoph Ebell,10 Jeff Muller,10 Martin Telefont,10 Sean Hill,11 
Sandhya P. Koushika,12 Corrado Calì,13 Pedro Antonio Valdés-Sosa,14,15 Peter B. Littlewood,16 Christof 
Koch,17 Stephan Saalfeld,5 Adam Kepecs,18 Hanchuan Peng,17 Yaroslav O. Halchenko,19 Gregory Kiar,1,33 
Mu-Ming Poo,20 Jean-Baptiste Poline,21 Michael P. Milham,22,23 Alyssa Picchini Schaffer,24 Rafi Gidron,25 
Hideyuki Okano,26,27 Vince D. Calhoun,28,29 Miyoung Chun,30 Dean M. Kleissas,31 R. Jacob Vogelstein,32 Eric 
Perlman,33 Randal Burns,34,35 Richard Huganir,36,37 and Michael I. Miller1,33,37 
1Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 
Baltimore, MD 21218, USA 
2Optimize Science, Mill Valley, CA 94941, USA 
3UCSF Kavli Institute for Fundamental Neuroscience, San Francisco, CA 94143, USA 
4Wolfson Institute for Biomedical Research and Department of Neuroscience, Physiology, and Pharmacology, 
University College London, Gower Street, London WC1E 6BT, UK  
5Janelia Research Campus, Howard Hughes Medical Institute, 19700 Helix Drive, Ashburn, VA 20147, USA 
6Montreal Neurological Institute, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada 
7Departments of Physical Medicine and Rehabilitation, Physiology, Applied Mathematics, and Biomedical 
Engineering, Northwestern University, 345 East Superior Street, Chicago, IL 60611, USA 
8Institute for Neuroscience and Medicine, INM-1, Forschungszentrum Jülich, 52428 Jülich, Germany 
9Cécile and Oskar Vogt Institute of Brain Research, University Hospital Duesseldorf, University Duesseldorf, 40225 
Düsseldorf, Germany 
10Human Brain Project, EPFL, 1202 Geneva, Switzerland 
11Blue Brain Project, EPFL, Campus Biotech, 1202 Geneva, Switzerland 
12Department of Biological Sciences, Tata Institute of Fundamental Research, Homi Bhabha Road, Navy Nagar, 
Colaba, Mumbai 400005, India 
13Biological and Environmental Science and Engineering, KAUST, Thuwal 23955-6900, Saudi Arabia 
14University of Electronic Science and Technology of China, Shahe Campus, Chengdu, Sichuan 610054, PRC 
15Cuban Neurosciences Center, Cubanacan, Playa, Havana CP 11600, Cuba 
16Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA 
17Allen Institute for Brain Science, 615 Westlake Avenue North, Seattle, WA 98109, USA 
18Cold Spring Harbor Laboratory, Marks Building, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA 
19Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH 03755, USA 
20Institute of Neuroscience, Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence 
Technology, 320 Yue Yang Road, Shanghai 200031, China 
21Henry H. Wheeler Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, 
Berkeley, Berkeley, CA 94720, USA 
22Center for the Developing Brain, Child Mind Institute, 445 Park Avenue, New York, NY 10022, USA 
23Nathan S. Kline Institute for Psychiatric Research, 140 Old Orangeburg Road, Orangeburg, NY 10962, USA 
24Simons Collaboration on the Global Brain, Simons Foundation, 160 Fifth Avenue, 7th Floor, New York, NY 
10010, USA 
25Israel Brain Technologies, Precede Building, Hakfar Hayarok, Ramat Hasharon 47800, Israel 
26Department of Physiology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, 
Japan 
27Laboratory for Marmoset Neural Architecture, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-
0198, Japan 
28The Mind Research Network, Albuquerque, NM 87106, USA 
29Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA 
30The Kavli Foundation, 1801 Solar Drive, Suite #250, Oxnard, CA 93030, USA 
31Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA 
32Intelligence Advanced Research Projects Activity (IARPA), Maryland Square Research Park, 5850 University 
Research Court, Riverdale Park, MD 20737, USA 
33Center for Imaging Science, 34Department of Computer Science, and 35Institute for Data Intensive Engineering and
Science, Johns Hopkins University, Baltimore, MD 21218, USA
36Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21205, USA
37Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21218, USA

*Correspondence: jovo@jhu.edu 
 
