Content consumption and generation are major parts of the Internet experience. Product and service providers strive to improve user experience by personalizing services, making recommendations, and understanding user interests. To this end, inferring user characteristics, such as demographic information, from user behavior helps providers understand user preferences. In this dissertation, we show that content and behavior data can be used to characterize users in order to improve their experience through personalization in the domains of education and online content consumption. We discuss two challenges: (1) representing users given heterogeneous, industry-scale volumes of data, and (2) improving the representation of underrepresented groups of users, which is the imbalanced classification problem.