id author title date pages extension mime words sentences flesch summary cache txt sbdevel-wordpress-com-3089 Software Development at Royal Danish Library | A peekhole into the life of the software development department at the Royal Danish Library .html text/html 11404 1017 73 Whenever the content of a field is to be used for grouping, faceting, sorting, stats or streaming in Solr (or Elasticsearch or Lucene, where applicable), it is advisable to store it using DocValues. This meant a smaller storage footprint and cleaner code, but also caused the worst-case single-value lookup to scale linearly with document count: getting the value for a DocValued field from the last document in a segment required a visit to all other value blocks. The linear access time was not a problem for small indexes or for requests for values from a lot of documents, where most blocks need to be visited anyway. Simple term-based, relevance-ranked searches for the top-20 matching documents with 9 core fields: id, source_file_s, url_norm, host, domain, content_type_served, content_length, crawl_date and content_language. At netarchive search, a typical request uses grouping, faceting, cardinality counting and top-20 document retrieval. ./cache/sbdevel-wordpress-com-3089.html ./txt/sbdevel-wordpress-com-3089.txt