Software documentation largely consists of short, natural language summaries of the subroutines in the software. These summaries help programmers quickly understand what a subroutine does without having to read the source code themselves. The task of writing these descriptions is called ``source code summarization.'' The state of the art in source code summarization is an encoder-decoder neural network with attention. Current research in this area focuses on improving these models on the encoder side by providing better representations of source code. In this dissertation, we present a collection of methods that continue this trend by incorporating context around the source code and by improving the generalization of these models. First, we incorporate neighboring functions in a file to better predict words that do not appear in the function itself but do appear elsewhere in the file. Next, we establish action word prediction as a novel sub-problem of source code summarization. Then, we demonstrate that semantic similarity based evaluation metrics correlate better with human judgement than n-gram matching metrics. Finally, we study the effect of label smoothing as a regularization technique that allows these models to generalize better.
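As a concrete illustration of the last point, label smoothing replaces the decoder's one-hot target with a softened distribution, which discourages overconfident predictions. The sketch below is a minimal, generic version of the technique; the vocabulary size, smoothing factor, and function name are illustrative, not taken from the dissertation's implementation.

```python
def smooth_labels(target_index, vocab_size, epsilon=0.1):
    """Return a label-smoothed target distribution.

    The true token keeps 1 - epsilon of the probability mass;
    the remaining epsilon is spread uniformly over the vocabulary.
    """
    uniform = epsilon / vocab_size
    dist = [uniform] * vocab_size
    dist[target_index] += 1.0 - epsilon
    return dist

# Example: smoothing a target over a 5-word vocabulary.
dist = smooth_labels(target_index=2, vocab_size=5, epsilon=0.1)
# The distribution still sums to 1, and the true token dominates
# without receiving all of the probability mass.
```

Training against such softened targets acts as a regularizer: the model is penalized for assigning probability 1 to any single token, which tends to improve generalization on held-out functions.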