id: cord-151118-25cbus1m
author: Murray, Benjamin
title: Accessible Data Curation and Analytics for International-Scale Citizen Science Datasets
date: 2020-11-02
pages:
extension: .txt
mime: text/plain
words: 4954
sentences: 256
flesch: 58
summary: To test the performance of the join operator when ExeTera and Pandas are used, we generate a dataset composed of a left primary key (int64), a right foreign key (int64), and 1, 2, 4, 8, 16, and 32 fields, respectively, of random numbers corresponding to entries in the right table (int32). In this work, we present ExeTera, a data curation and analytics tool designed to provide users with a low-complexity solution for working on datasets approaching terabyte scale, such as national- or international-scale citizen science datasets like the Covid Symptom Study. ExeTera provides features for cleaning, journaling, and generating reproducible processing and analytics, enabling large research teams to work with consistent measures and analyses that can be reliably recreated from the base data snapshots. Although ExeTera was originally built to provide data curation for researchers working on the Zoe Symptom Study, it is being developed to be generally applicable to large-scale relational datasets for researchers who work in Python.
cache: ./cache/cord-151118-25cbus1m.txt
txt: ./txt/cord-151118-25cbus1m.txt
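The join benchmark in the summary can be illustrated with a minimal sketch that uses only NumPy and pandas. It is not ExeTera's API or the authors' benchmark code: the helper names `make_join_tables` and `join`, the table sizes, the random seed, and the choice of an inner join on primary key = foreign key are all assumptions made for illustration. The sketch builds a left table with an int64 primary key, a right table with an int64 foreign key plus k int32 random-value fields, and times `pandas.DataFrame.merge` for k = 1, 2, 4, 8, 16, and 32.

```python
import time

import numpy as np
import pandas as pd


def make_join_tables(n_left: int, n_right: int, n_fields: int, seed: int = 0):
    """Hypothetical helper: build a left/right table pair shaped like the
    benchmark described above (sizes and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    # Left table: a dense int64 primary key.
    left = pd.DataFrame({"pk": np.arange(n_left, dtype=np.int64)})
    # Right table: an int64 foreign key, each row pointing at a random left row.
    right = pd.DataFrame(
        {"fk": rng.integers(0, n_left, size=n_right, dtype=np.int64)}
    )
    # Right table: n_fields int32 columns of random values.
    for i in range(n_fields):
        right[f"f{i}"] = rng.integers(0, 2**31 - 1, size=n_right, dtype=np.int32)
    return left, right


def join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: inner join matching left.pk to right.fk (assumed join type)."""
    return left.merge(right, left_on="pk", right_on="fk", how="inner")


if __name__ == "__main__":
    # Time the pandas join as the number of right-table fields grows.
    for k in (1, 2, 4, 8, 16, 32):
        left, right = make_join_tables(10_000, 100_000, k)
        start = time.perf_counter()
        joined = join(left, right)
        elapsed = time.perf_counter() - start
        print(k, joined.shape, f"{elapsed:.4f}s")
```

Because every foreign key is drawn from the range of existing primary keys, each right-table row matches exactly one left-table row, so the joined frame has as many rows as the right table. Only the number of int32 columns varies between runs, which isolates how join cost scales with field count.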