Distributed spectrum sensing uses multiple sensors to measure spectrum power levels at multiple locations simultaneously, allowing the formation of a spectrum occupancy map at a given frequency. An accurate spectrum occupancy map can inform spectrum policy, reveal usage patterns, support situational awareness, and help evaluate the feasibility of opportunistic spectrum access indoors and outdoors, especially in the crowded sub-6 GHz spectrum. In general, sensors operate in a complex, time-varying propagation environment, and the problem of estimating an occupancy map from a small set of sensors is ill-posed, especially when the number of emitters is large or unknown. To simplify the problem, assumptions are usually made about the number of emitters, their powers, and idealized parameterizations of the propagation model. This dissertation considers a neural network-based system that tackles the problem of estimating occupancy maps with minimal assumptions.

We present a general neural network framework for computing a binary decision map over a region from a limited number of sensor measurements, and apply the framework to spectrum occupancy mapping. Given a sub-region and a threshold, we wish to determine whether the power at a given frequency exceeds the threshold, i.e., whether that frequency is ``occupied'' in that sub-region. The sensors, which measure signal power, are random in number and location. The emitting sources at the given frequency are unknown in number, location, and power. Through an aggregation step, the variable number of measurements is transformed into log-likelihood ratios (LLRs) that are fed to a neural network as a fixed-resolution image.
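The aggregation step can be sketched as follows. This is a minimal illustration, not the dissertation's exact formulation: the grid size, the Gaussian-in-dB noise model, and the symmetric placement of the two hypothesis means about the threshold are all assumptions made here for concreteness.

```python
import numpy as np

def llr_image(sensor_xy, powers_dbm, grid=32, thresh_dbm=-90.0, noise_std_db=2.0):
    """Aggregate a variable number of sensor power readings into a
    fixed-resolution log-likelihood-ratio (LLR) image.

    Per-pixel hypotheses: H1 (power above thresh_dbm) vs H0 (below).
    Sensor coordinates are assumed normalized to [0, 1)^2; measurement
    noise is modeled as Gaussian in dB; pixels with no sensor keep an
    uninformative LLR of 0.
    """
    img = np.zeros((grid, grid))
    # Map sensor coordinates to pixel bins.
    ix = np.clip((sensor_xy[:, 0] * grid).astype(int), 0, grid - 1)
    iy = np.clip((sensor_xy[:, 1] * grid).astype(int), 0, grid - 1)
    # Hypothesis means placed symmetrically about the threshold (illustrative).
    mu1, mu0 = thresh_dbm + noise_std_db, thresh_dbm - noise_std_db
    for x, y, p in zip(ix, iy, powers_dbm):
        # LLR of one Gaussian measurement for H1 vs H0; LLRs of
        # independent measurements landing in the same pixel add.
        llr = ((p - mu0) ** 2 - (p - mu1) ** 2) / (2 * noise_std_db ** 2)
        img[y, x] += llr
    return img  # fixed-size network input regardless of sensor count

rng = np.random.default_rng(0)
n = int(rng.integers(5, 50))          # random number of sensors
xy = rng.random((n, 2))
pw = rng.normal(-90.0, 5.0, size=n)   # noisy power readings in dBm
assert llr_image(xy, pw).shape == (32, 32)
```

Whatever the exact per-pixel statistic, the key property is the one the text relies on: any number of sensors maps to the same fixed image size, so a single trained network handles a variable, random sensor deployment.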
The system is trained to produce a map of occupancy decisions over a wide area, even where there are no sensors, and achieves excellent accuracy at determining occupancy in a variety of complex environments with an arbitrary number of emitters.

In this dissertation, we examine the robustness of the developed neural network-based system to mismatch between training and testing in the number of sensors, the occupancy threshold, thermal noise, and multipath fading. We show that our system is robust, performing similarly with or without knowledge of these parameters during training. We introduce the concept of threshold-to-noise ratio (TNR) to characterize accuracy in the presence of noise, and show that the accuracy of the system can be maintained at low TNR. This robustness is attributed to our input modeling and aggregation process, and allows the system to be used in many settings without retraining.

Finally, we consider a case study of simplifying the sensors to reduce the cost and power consumption of distributed spectrum mapping while maintaining accuracy. The baseline sensor is our RadioHound sensor, developed here at the University of Notre Dame, which is well understood and modifiable. We show that there is an advantageous trade-off of quantity over quality when the sensors are used as part of our neural network-based system: reducing sensor quality cuts cost and power consumption significantly (by a factor of 10), while only a modest increase in the number of sensors (a factor of 2) is needed to recover accuracy. Perhaps surprisingly, the effective number of bits in the sensor measurement is relatively unimportant to the accuracy of the decision maps.
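The threshold-to-noise ratio mentioned above compares the occupancy threshold to the sensor's noise floor. One plausible formalization (the precise definition is the dissertation's; the symbols here are illustrative) is
\[
\mathrm{TNR} \;=\; 10\log_{10}\!\left(\frac{P_{\mathrm{thr}}}{\sigma_n^2}\right)\ \text{dB},
\]
where $P_{\mathrm{thr}}$ is the occupancy threshold expressed as a power and $\sigma_n^2$ is the thermal noise power at the sensor. Under this reading, low TNR means the threshold sits close to the noise floor, which is where occupancy decisions are hardest.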