Managed Solr SaaS Options

jrochkind, January 12, 2021 (updated January 27, 2021)

I was recently looking for managed Solr “software-as-a-service” (SaaS) options, and had trouble figuring out what was out there. So I figured I’d share what I learned, even though my knowledge here is far from exhaustive, and I have only looked seriously at one of the ones I found.

The only managed Solr options I found were: WebSolr; SearchStax; and OpenSolr.

Of these, I think WebSolr and SearchStax are more well-known. I couldn’t find anyone with experience with OpenSolr, which perhaps is newer.

Of them all, SearchStax is the only one I actually took for a test drive, so it is the one I will have the most to say about.

Why we were looking

We run a fairly small-scale app, whose infrastructure is currently 4 self-managed AWS EC2 instances, running respectively: 1) A rails web app 2) Bg workers for the rails web app 3) Postgres, and 4) Solr.

Oh yeah, there’s also a redis running on one of those servers, on #3 with pg or #4 with solr, I forget.

Currently we manage this all ourselves, right on the EC2. But we’re looking to move as much as we can into “managed” servers. Perhaps we’ll move to Heroku. Perhaps we’ll use hatchbox. Or if we do stay on AWS resources we manage directly, we’d look at things like using an AWS RDS Postgres instead of installing it on an EC2 ourselves, an AWS ElastiCache for Redis, maybe look into Elastic Beanstalk, etc.

But no matter what we do, we need a Solr, and we’d like to get it managed. Hatchbox has no special Solr support, AWS doesn’t have a Solr service, Heroku does have a solr add-on but you can also use any Solr with it and we’ll get to that later.

Our current Solr use is pretty small scale. We don’t run “SolrCloud mode”, just legacy ordinary Solr. We only have around 10,000 documents in there (tiny for Solr), and our index size is only 70MB.
Our traffic is pretty low. When I tried to figure out how low, it turned out we don’t have sufficient logging turned on to answer that specifically, but using proxy metrics to guess I’d say 20K-40K requests a day, queries as well as adds.

This is a pretty small Solr installation, although it is used centrally for the primary functions of the (fairly low-traffic) app. It currently runs on an EC2 t3a.small, which is a “burstable” EC2 type with only 2G of RAM. It does have two vCPUs (that is one core with ‘hyperthreading’). The t3a.small EC2 instance only costs $14/month on-demand price! We know we’ll be paying more for managed Solr, but we want to get out of the business of managing servers, since we no longer really have the staff for it.

WebSolr (didn’t actually try out)

WebSolr is the only managed Solr currently listed as a Heroku add-on. It is also available as a managed Solr independent of heroku.

The pricing in the heroku plans vs the independent plans seems about the same. As a heroku add-on there is a $20 “staging” plan that doesn’t exist in the independent plans. (Unlike some other heroku add-ons, no time-limited free plan is available for WebSolr). But once we go up from there, the plans seem to line up.

Starting at: $59/month for:
- 1 million document limit
- 40K requests/day
- 1 index
- 954MB storage
- 5 concurrent requests limit (this limit is not mentioned on the independent pricing page?)

Next level up is $189/month for:
- 5 million document limit
- 150K requests/day
- 4.6GB storage
- 10 concurrent request limit (again concurrent request limits aren’t mentioned on independent pricing page)

As you can see, WebSolr has their plans metered by usage.

$59/month is around the price range we were hoping for (we’ll need two, one for staging one for production). Our small solr is well under 1 million documents and ~1GB storage, and we do only use one index at present. However, the 40K requests/day limit I’m not sure about: even if we fit under it, we might be pushing up against it.
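As a rough check on how our numbers compare to those plan limits, here is a small Python sketch. The plan figures and the usage estimates are the ones quoted in this post; the class and function names are made up for illustration, and the GB-to-MB conversion for the larger plan assumes 1 GB = 1024 MB, which may not be how WebSolr counts it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    price_per_month: int        # USD
    max_documents: int
    max_requests_per_day: int
    max_storage_mb: float


@dataclass(frozen=True)
class Usage:
    documents: int
    requests_per_day: int
    storage_mb: float


def utilization(usage: Usage, plan: Plan) -> dict:
    """Fraction of each plan limit consumed; 1.0 means exactly at the limit."""
    return {
        "documents": usage.documents / plan.max_documents,
        "requests_per_day": usage.requests_per_day / plan.max_requests_per_day,
        "storage": usage.storage_mb / plan.max_storage_mb,
    }


def fits(usage: Usage, plan: Plan) -> bool:
    """True when no limit is exceeded (sitting exactly at a limit still counts)."""
    return all(ratio <= 1.0 for ratio in utilization(usage, plan).values())


# Plan limits as listed above; 4.6GB converted with an assumed 1024 MB per GB.
PLAN_59 = Plan("$59/month", 59, 1_000_000, 40_000, 954)
PLAN_189 = Plan("$189/month", 189, 5_000_000, 150_000, 4.6 * 1024)

# Our guessed traffic range is 20K-40K requests/day, so check both ends.
LOW_ESTIMATE = Usage(documents=10_000, requests_per_day=20_000, storage_mb=70)
HIGH_ESTIMATE = Usage(documents=10_000, requests_per_day=40_000, storage_mb=70)

if __name__ == "__main__":
    for plan in (PLAN_59, PLAN_189):
        for label, usage in (("low", LOW_ESTIMATE), ("high", HIGH_ESTIMATE)):
            ratios = utilization(usage, plan)
            print(plan.name, label, "fits" if fits(usage, plan) else "over",
                  {key: f"{value:.0%}" for key, value in ratios.items()})
```

At the top of our guessed range the $59 plan sits at exactly 100% of its requests/day limit, with documents and storage nowhere near theirs, which is why the request limit is the one that worries me.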
And the “concurrent request” limit simply isn’t one I’m even used to thinking about. On a self-managed Solr it hasn’t really come up. What does “concurrent” mean exactly in this case, how is it measured? With 10 puma web workers and sometimes a possibly multi-threaded batch index going on, could we exceed the $59 plan’s limit of 5? Seems plausible. What happens when the limits are exceeded? Your Solr request results in an HTTP 429 error!

Do I need to now write the app to rescue those gracefully, or use connection pooling to try to avoid them, or something? Having to rewrite the way our app functions for a particular managed solr is the last thing we want to do. (Although it’s not entirely clear if those connection limits exist on the non-heroku-plugin plans, I suspect they do?).

And in general, I’m not thrilled with the way the pricing works here, and the price points. I am positive for a lot of (eg) heroku customers an additional $189*2=$378/month is peanuts not even worth accounting for, but for us, a small non-profit whose app’s traffic does not scale with revenue, that starts to be real money.

It is not clear to me if WebSolr installations (at “standard” plans) are set up in “SolrCloud mode” or not; I’m not sure what APIs exist for uploading your custom schema.xml (which we’d need to do), or if they expect you to do this only manually through a web UI (that would not be good); I’m not sure if you can upload custom solrconfig.xml settings (this may be running on a shared solr instance with standard solrconfig.xml?).

Basically, all of this made WebSolr not the first one we looked at.

Does it matter if we’re on heroku using a managed Solr that’s not a Heroku plugin?

I don’t think so.

In some cases, you can get a better price from a Heroku plug-in than you could get from that same vendor not on heroku or other competitors. But that doesn’t seem to be the case here, and other than that does it matter?
Well, all heroku plug-ins are required to bill you by-the-minute, which is nice but not really crucial; other forms of billing could also be okay at the right price.

With a heroku add-on, your billing is combined into one heroku invoice, no need to give a credit card to anyone else, and it can be tracked using heroku tools. Which is certainly convenient and a plus, but not essential if the best tool for the job is not a heroku add-on.

And as a heroku add-on, WebSolr provides a WEBSOLR_URL heroku config/env variable automatically to code running on heroku. OK, that’s kind of nice, but it’s not a big deal to set a SOLR_URL heroku config manually referencing the appropriate address. I suppose as a heroku add-on, WebSolr also takes care of securing and authenticating connections between the heroku dynos and the solr, so we need to make sure we have a reasonable way to do this from any alternative.

SearchStax (did take it for a spin)

SearchStax’s pricing tiers are not based on metering usage. There are no limits based on requests/day or concurrent connections. SearchStax runs on dedicated-to-you individual Solr instances (I would guess running on dedicated-to-you individual (eg) EC2, but I’m not sure). Instead the pricing is based on size of host running Solr.

You can choose to run on instances deployed to AWS, Google Cloud, or Azure. We’ll be sticking to AWS (the others, I think, have a slight price premium).

While SearchStax gives you a pricing page that looks like the “new-way-of-doing-things” transparent pricing, in fact there isn’t really enough info on public pages to see all the price points and understand what you’re getting; there is still a kind of “talk to a salesperson who has a price sheet” thing going on.
What I think I have figured out from talking to a salesperson and support, is that the “Silver” plans (“Starting at $19 a month”, although we’ll say more about that in a bit) are basically: We give you a Solr, we don’t provide any technical support for Solr.

While the “Gold” plans “from $549/month” are actually about paying for Solr consultants to set up and tune your schema/index etc. That is not something we need, and $549+/month is way more than the price range we are looking for.

While the SearchStax pricing/plan pages kind of imply the “Silver” plan is not suitable for production, in fact there is no real reason not to use it for production I think, and the salesperson I talked to confirmed that, just reaffirming that you were on your own managing the Solr configuration/setup. That’s fine, that’s what we want, we just don’t want to manage the OS or set up the Solr or upgrade it etc. The Silver plans have no SLA, but as far as I can tell their uptime is just fine. The Silver plans only guarantee 72-hour support response time, but for the couple support tickets I filed asking questions while under a free 14-day trial (oh yeah that’s available), I got prompt same-day responses, and knowledgeable responses that answered my questions.

So a “silver” plan is what we are interested in, but the pricing is not actually transparent.

$19/month is for the smallest instance available, and IF you prepay/contract for a year. They call that small instance an NDN1 and it has 1GB of RAM and 8GB of storage. If you pay-as-you-go instead of contracting for a year, that already jumps to $40/month. (That price is available on the trial page).

When you are paying-as-you-go, you are actually billed per-day, which might not be as nice as heroku’s per-minute, but it’s pretty okay, and useful if you need to bring up a temporary solr instance as part of a migration/upgrade or something like that.
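To put the two NDN1 prices side by side, here is a back-of-the-envelope sketch in Python. The $19 and $40 figures are the ones quoted above, and the two-instance count is the staging-plus-production setup we have in mind. The per-day rate is only an assumption (monthly price divided by 30), since I don’t know how SearchStax actually pro-rates, and the function names are made up for illustration.

```python
NDN1_ANNUAL_CONTRACT_PER_MONTH = 19   # USD/month, only with a one-year prepay/contract
NDN1_PAY_AS_YOU_GO_PER_MONTH = 40     # USD/month, pay-as-you-go


def yearly_cost(price_per_month: float, instances: int = 2) -> float:
    """Cost of running `instances` deployments for twelve months."""
    return price_per_month * 12 * instances


def temporary_instance_cost(price_per_month: float, days: int,
                            days_in_month: int = 30) -> float:
    """Estimated cost of a short-lived pay-as-you-go instance.

    Assumption: the daily rate is simply the monthly price divided by 30.
    """
    return round(price_per_month / days_in_month * days, 2)


if __name__ == "__main__":
    contract = yearly_cost(NDN1_ANNUAL_CONTRACT_PER_MONTH)
    pay_as_you_go = yearly_cost(NDN1_PAY_AS_YOU_GO_PER_MONTH)
    print("two NDN1s for a year, annual contract:", contract)
    print("two NDN1s for a year, pay-as-you-go:", pay_as_you_go)
    print("one extra NDN1 for a week-long migration:",
          temporary_instance_cost(NDN1_PAY_AS_YOU_GO_PER_MONTH, days=7))
```

The point of the second function is just that a week of an extra instance during a migration should cost a small fraction of a month, if the daily rate works roughly the way I am assuming.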
The next step up is an “NDN2” which has 2G of RAM and 16GB of storage, and costs ~$80/month pay-as-you-go (you can find that price if you sign up for a free trial). The price for an annual contract carries a discount similar to the NDN1’s, roughly 50%, so $40/month. That price I got only from a salesperson, I don’t know if it’s always stable.

It only occurs to me now that they don’t tell you how many CPUs are available.

I’m not sure if I can fit our Solr in the 1G NDN1, but I am sure I can fit it in the 2G NDN2 with some headroom, so I didn’t look at plans above that. They are available, still under “silver”, with prices going up accordingly.

All SearchStax solr instances run in “SolrCloud” mode. These NDN1 and NDN2 ones we’re looking at just run one node with one zookeeper, but still in cloud mode. There are also “silver” plans available with more than one node in a “high availability” configuration, but the prices start going up steeply, and we weren’t really interested in that.

Because it’s SolrCloud mode though, you can use the standard Solr API for uploading your configuration. It’s just Solr! So no arbitrary usage limits, no features disabled.

The SearchStax web console seems competently implemented; it lets you create and delete individual Solr “deployments”, manage accounts to login to console (on “silver” plan you only get two, or can pay $10/month/account for more, nah), and set up auth for a solr deployment. They support IP-based authentication or HTTP Basic Auth to the Solr (no limit to how many Solr Basic Auth accounts you can create). HTTP Basic Auth is great for us, because trying to do IP-based from somewhere like heroku isn’t going to work. All Solrs are available over HTTPS/SSL, which is great!

SearchStax also has their own proprietary HTTP API that lets you do most anything, including creating/destroying deployments, managing Solr basic auth users, basically everything.
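To make the “it’s just Solr” point a bit more concrete, here is a minimal Python sketch of what talking to such a deployment with the standard Solr APIs might look like: zip a config directory, upload it with the Solr Configsets API (action=UPLOAD), then create a collection from it with the Collections API (action=CREATE), all authenticated with HTTP Basic Auth. This is not taken from SearchStax’s docs; the base URL, credentials, and configset and collection names are placeholders, the helper names are my own, and the functions only build the requests rather than send them.

```python
import base64
import io
import pathlib
import urllib.parse
import urllib.request
import zipfile


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Auth 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def zip_config_dir(conf_dir: str) -> bytes:
    """Zip a config directory in memory, keeping solrconfig.xml etc. at the zip root."""
    root = pathlib.Path(conf_dir)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()


def upload_configset_request(solr_url, name, zip_bytes, user, password):
    """Configsets API: POST the zipped config to /admin/configs?action=UPLOAD."""
    query = urllib.parse.urlencode({"action": "UPLOAD", "name": name})
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/admin/configs?{query}",
        data=zip_bytes,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "Authorization": basic_auth_header(user, password),
        },
    )


def create_collection_request(solr_url, collection, configset, user, password):
    """Collections API: create a one-shard collection that uses the configset."""
    query = urllib.parse.urlencode({
        "action": "CREATE",
        "name": collection,
        "numShards": 1,
        "collection.configName": configset,
    })
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/admin/collections?{query}",
        headers={"Authorization": basic_auth_header(user, password)},
    )
```

Each function returns a `urllib.request.Request`; passing it to `urllib.request.urlopen(...)` would actually send it. With a placeholder base URL such as `https://solr.example.org/solr`, the upload request targets `/solr/admin/configs?action=UPLOAD&name=...`, so the same code should work against any SolrCloud, managed or not, as long as Basic Auth is how you get in.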
There is some API that duplicates the Solr Cloud API for adding configsets. I don’t think there’s a good reason to use it instead of the standard SolrCloud API, although their docs try to point you to it. There’s even some kind of webhooks for alerts! (which I haven’t really explored).

Basically, SearchStax just seems to be a sane and rational managed Solr option; it has all the features you’d expect/need/want for dealing with such. The prices seem reasonable-ish, generally more affordable than WebSolr, especially if you stay in “silver” and “one node”.

At present, we plan to move forward with it.

OpenSolr (didn’t look at it much)

I have the least to say about this one, having spent the least time with it, after spending time with SearchStax and seeing it met our needs. But I wanted to make sure to mention it, because it’s the only other managed Solr I am even aware of. Definitely curious to hear from any users.

Here is the pricing page.

The prices seem pretty decent, perhaps even cheaper than SearchStax, although it’s unclear to me what you get. Does “0 Solr Clusters” mean that it’s not SolrCloud mode? After seeing how useful SolrCloud APIs are for management (and having this confirmed by many of my peers in other libraries/museums/archives who choose to run SolrCloud), I wouldn’t want to do without it. So I guess that pushes us to “executive” tier? Which at $50/month (billed yearly!) is still just fine, around the same as SearchStax.

But they do limit you to one solr index; I prefer SearchStax’s model of just giving you certain host resources and letting you do what you want with them. It does say “shared infrastructure”.

Might be worth investigating, curious to hear more from anyone who did.

Now, what about ElasticSearch?

We’re using Solr mostly because that’s what various collaborative and open source projects in the library/museum/archive world have been doing for years, since before ElasticSearch even existed.
So there are various open source libraries and toolsets available that we’re using.

But for whatever reason, there seem to be SO MANY MORE managed ElasticSearch SaaS available. At possibly much cheaper pricepoints. Is this because the ElasticSearch market is just bigger? Or is ElasticSearch easier/cheaper to run in a SaaS environment? Or what? I don’t know.

But there’s the controversial AWS ElasticSearch Service; there’s the Elastic Cloud “from the creators of ElasticSearch”. On Heroku, which lists one Solr add-on, there are THREE ElasticSearch add-ons listed: ElasticCloud, Bonsai ElasticSearch, and SearchBox ElasticSearch.

If you just google “managed ElasticSearch” you immediately see 3 or 4 other names.

I don’t know enough about ElasticSearch to evaluate them. They seem at first glance at pricing pages to be more affordable, but I may not know what I’m comparing and be looking at tiers that aren’t actually usable for anything or will have hidden fees.

But I know there are definitely many more managed ElasticSearch SaaS than Solr.

I think ElasticSearch probably does everything our app needs. If I were to start from scratch, I would definitely consider ElasticSearch over Solr just based on how many more SaaS options there are. While it would require some knowledge-building (I have developed a lot of knowledge of Solr and zero of ElasticSearch) and rewriting some parts of our stack, I might still consider switching to ES in the future. We don’t do anything too too complicated with Solr that would be too too hard to switch to ES, probably.
Bibliographic Wilderness is a blog by Jonathan Rochkind about digital library services, ruby, and web development. All original content licensed CC-BY.