Planet Code4Lib - http://planet.code4lib.org

CrossRef: Phil Agre and the gendered Internet

There is an article today in the Washington Post about the odd disappearance of a computer science professor named Phil Agre. The article, entitled "He predicted the dark side of the Internet 30 years ago. Why did no one listen?" reminded me of a post by Agre in 1994 after a meeting of Computer Professionals for Social Responsibility. Although it annoyed me at the time, a talk that I gave there triggered in him thoughts of gender issues; as a woman I was very much in the minority at the meeting, but that was not the topic of my talk. But my talk also gave Agre thoughts about the missing humanity on the Web.

I had a couple of primary concerns, perhaps not perfectly laid out, in my talk, "Access, not Just Wires." I was concerned about what was driving the development of the Internet and the lack of a service ethos regarding society. Access at the time was talked about in terms of routers, modems, T-1 lines. There was no thought to organizing or preserving online information. There was no concept of "equal access". There was no thought to how we would democratize the Web such that you didn't need a degree in computer science to find what you needed. I was also very concerned about the commercialization of information. I was frustrated watching the hype as information was touted as the product of the information age. (This was before we learned that "you are the product, not the user" in this environment.) Seen from the tattered clothes and barefoot world of libraries, the money thrown at the jumble of un-curated and unorganized "information" on the web was heartbreaking. I said:

"It's clear to me that the information highway isn't much about information. It's about trying to find a new basis for our economy. I'm pretty sure I'm not going to like the way information is treated in that economy. We know what kind of information sells, and what doesn't. So I see our future as being a mix of highly expensive economic reports and cheap online versions of the National Inquirer. Not a pretty picture." - kcoyle in Access, not Just Wires

Little did I know how bad it would get.

Like many or most people, Agre heard "libraries" and thought "female." But at least this caused him to think, earlier than many, about how our metaphors for the Internet were inherently gendered.

"Discussing her speech with another CPSR activist ... later that evening, I suddenly connected several things that had been bothering me about the language and practice of the Internet. The result was a partial answer to the difficult question, in what sense is the net "gendered"?" - Agre, TNO, October 1994

This led Agre to think about how we spoke then about the Internet, which was mainly as an activity of "exploring." That metaphor is still alive with Microsoft's Internet Explorer, but was also the message behind the main Web browser software of the time, Netscape Navigator. He suddenly saw how "explore" was a highly gendered activity:

"Yet for many people, "exploring" is close to defining the experience of the net. It is clearly a gendered metaphor: it has historically been a male activity, and it comes down to us saturated with a long list of meanings related to things like colonial expansion, experiences of otherness, and scientific discovery. Explorers often die, and often fail, and the ones that do neither are heroes and role models.
This whole complex of meanings and feelings and strivings is going to appeal to those who have been acculturated into a particular male-marked system of meanings, and it is not going to offer a great deal of meaning to anyone who has not. The use of prestigious artifacts like computers is inevitably tied up with the construction of personal identity, and "exploration" tools offer a great deal more traction in this process to historically male cultural norms than to female ones." - Agre, TNO, October 1994

He decried the lack of social relationships on the Internet, saying that although you know that other people are there, you cannot see them.

"Why does the space you "explore" in Gopher or Mosaic look empty even when it's full of other people?" - Agre, TNO, October 1994

None of us knew at the time that in the future some people would experience the Internet entirely and exclusively as full of other people in the forms of Facebook, Twitter and all of the other sites that grew out of the embryos of bulletin board systems, the Well, and AOL. We feared that the future Internet would not have the even-handedness of libraries, but never anticipated that Russian bots and QAnon promoters would reign over what had once been a network for the exchange of scientific information. It hurts now to read through Agre's post arguing for a more library-like online information system because it is pretty clear that we blew through that possibility even before the 1994 meeting and were already taking the first steps toward where we are today.

Agre walked away from his position at UCLA in 2009 and has not resurfaced, although there have been reports at times (albeit not recently) that he is okay. Looking back, it should not surprise us that someone with so much hope for an online civil society should have become discouraged enough to leave it behind. Agre was hoping for reference services and an Internet populated with users with:

"...the skills of composing clear texts, reading with an awareness of different possible interpretations, recognizing and resolving conflicts, asking for help without feeling powerless, organizing people to get things done, and embracing the diversity of the backgrounds and experiences of others." - Agre, TNO, October 1994

Oh, what a world that would be!

Lucidworks: How to Keep Shoppers Loyal

Online-only shoppers have more options available to them than ever before. Here's how to make your brand stand out in the crowd and keep customers coming back. The post How to Keep Shoppers Loyal appeared first on Lucidworks.

David Rosenthal: The Economist On Cryptocurrencies

The Economist edition dated August 7th has a leader (Unstablecoins) and two articles in the Finance section (The disaster scenario and Here comes the sheriff). The leader argues that:

Regulators must act quickly to subject stablecoins to bank-like rules for transparency, liquidity and capital. Those failing to comply should be cut off from the financial system, to stop people drifting into an unregulated crypto-ecosystem. Policymakers are right to sound the alarm, but if stablecoins continue to grow, governments will need to move faster to contain the risks.

But even The Economist gets taken in by the typical cryptocurrency hype, balancing current actual risks against future possible benefits:

Yet it is possible that regulated private-sector stablecoins will eventually bring benefits, such as making cross-border payments easier, or allowing self-executing "smart contracts".
Regulators should allow experiments whose goal is not merely to evade financial rules.

They don't seem to understand that, just as the whole point of Uber is to evade the rules for taxis, the whole point of cryptocurrency is to "evade financial rules". Below the fold I comment on the two articles.

Here comes the sheriff

This article is fairly short and mostly describes the details of Gary Gensler's statement in three "buckets":

The first is about investor protection:

The regulator claims jurisdiction over the crypto assets that it defines as securities; issuers of these must provide disclosures and abide by other rules. The SEC's definition uses a number of criteria, including the "Howey Test", which asks whether investors have a stake in a common enterprise and are led to expect profits from the efforts of a third party. Bitcoin and ether, the two biggest cryptocurrencies, do not meet this criterion (they are commodities, under American law). But Mr Gensler thinks that ... a fair few probably count as securities—and do not follow the rules. These, he said, may include stablecoins ... some of which may represent a stake in a crypto platform. Mr Gensler asked Congress for more staff to police them.

The second is about new products:

For months the SEC has sat on applications for bitcoin ETFs and related products, filed by big Wall Street names like Goldman Sachs and Fidelity. Mr Gensler hinted that, in order to be approved, these may have to comply with the stricter laws governing mutual funds.

The third is a request for new legal powers needed to pave over cracks in regulation that cryptocurrencies, whose whole point is to "evade financial rules", are exploiting:

Mr Gensler is chiefly concerned with platforms engaged in crypto trading or lending as well as in decentralised finance (DeFi), where smart contracts replicate financial transactions without a trusted intermediary. Some of these, he said, may host tokens that should be regulated as securities; others could be riddled with scams.

The SEC is likely to encounter massive opposition to these ideas. How cryptocurrency became a powerful force in Washington by Todd C. Frankel et al reports on the flow of lobbying dollars from cryptocurrency insiders to Capitol Hill and how it is blocking progress on the current infrastructure bill:

And after years of debate over how to improve America's infrastructure, and months of sensitive negotiations between the White House and lawmakers, the $1 trillion bipartisan infrastructure proposal suddenly stalled in part because of concerns about how government would regulate an industry best known for wild financial speculation, memes — and its role in ransomware attacks....Regardless of the measure's ultimate fate, the fact that crypto regulation has become one of the biggest stumbling blocks to passage of the bill underscored how the industry has become a political force in Washington — and previewed a series of looming battles over a financial technology attracting billions of dollars of interest from Wall Street, Silicon Valley and financial players around the world, but that few still understand.

It gets worse. Kate Riga reports that In Fit Of Pique, Shelby Kills Crypto Compromise:

Sen. Richard Shelby (R-AL) killed a hard-earned cryptocurrency compromise amendment to the bipartisan infrastructure bill because his own amendment, to beef up the defense budget another $50 billion, was rejected by Sen. Bernie Sanders (I-VT).
Shelby had tried to tack it on to the cryptocurrency amendment....So that's basically it for the crypto amendment, which took the better part of the weekend for senators and the White House to hammer into a compromise.

The issue here was that the un-amended bill would require:

some cryptocurrency companies that provide a service "effectuating" the transfer of digital assets to report information on their users, as some other financial firms are required to do, in an effort to enforce tax compliance.

Crypto supporters said the provision's wording would seemingly apply to companies that have no ability to collect data on users, such as cryptocurrency miners, and could push a swath of the industry overseas.

So maybe by accident Mining Is Money Transmission.

The disaster scenario

This article is far longer and far more interesting. It takes the form of a "stress test", discussing a scenario in which Bitcoin's "price" goes to zero and asking what the consequences for the broader financial markets and investors would be. It is hard to argue with the conclusion:

Still, our extreme scenario suggests that leverage, stablecoins, and sentiment are the main channels through which any crypto-downturn, big or small, will spread more widely. And crypto is only becoming more entwined with conventional finance. Goldman Sachs plans to launch a crypto exchange-traded fund; Visa now offers a debit card that pays customer rewards in bitcoin. As the crypto-sphere expands, so too will its potential to cause wider market disruption.

The article identifies a number of channels by which a Bitcoin collapse could "cause market disruption":

Via the direct destruction of paper wealth for HODL-ers and actual losses for more recent purchasers.
Via the stock price of companies, including cryptocurrency exchanges, payments companies, and chip companies such as Nvidia.
Via margin calls on leveraged investments, either direct purchases of Bitcoin or derivatives.
Via redemptions of stablecoins causing reserves to be liquidated.
Via investor sentiment contagion from cryptocurrencies to other high-risk assets such as meme stocks, junk bonds, and SPACs.

I agree that these are all plausible channels, but I have two main issues with the article.

Issue #1: Tether

First, it fails to acknowledge that the spot market in Bitcoin is extremely thin (a sell order for 150BTC crashed the "price" by 10%), especially compared to the 10x larger market in Bitcoin derivatives, and that the "price" of Bitcoin and other cryptocurrencies is massively manipulated, probably via the "wildcat bank" of Tether. The article contains, but doesn't seem to connect, these facts:

Fully 90% of the money invested in bitcoin is spent on derivatives like "perpetual" swaps—bets on future price fluctuations that never expire. Most of these are traded on unregulated exchanges, such as FTX and Binance, from which customers borrow to make bets even bigger....The extent of leverage in the system is hard to gauge; the dozen exchanges that list perpetual swaps are all unregulated. But "open interest" ... has grown from $1.6bn in March 2020 to $24bn today. This is not a perfect proxy for total leverage, as it is not clear how much collateral stands behind the various contracts. But forced liquidations of leveraged positions in past downturns give a sense of how much is at risk.
On May 18th alone, as bitcoin lost nearly a third of its value, they came to $9bn....Because changing dollars for bitcoin is slow and costly, traders wanting to realise gains and reinvest proceeds often transact in stablecoins, which are pegged to the dollar or the euro. Such coins, the largest of which are Tether and USD coin, are now worth more than $100bn. On some crypto platforms they are the main means of exchange.

That last paragraph is misleading. Fais Kahn writes:

Binance also hosts a massive perpetual futures market, which are "cash-settled" using USDT. This allows traders to make leveraged bets of 100x margin or more...which, in laymen's terms, is basically a speculative casino. That market alone provides around ~$27B of daily volume, where users deposit USDT to trade on margin. As a result, Binance is by far the biggest holder of USDT, with $17B sitting in its wallet.

Bernhard Meuller writes:

A more realistic estimate is that ~70% of the Tether supply (43.7B USDT) is located on centralized exchanges. Interestingly, only a small fraction of those USDT shows up in spot order books. One likely reason is that a large share is sitting on wallets to collateralize derivative positions, in particular perpetual futures. ... It's important to understand that USDT perpetual futures implementations are 100% USDT-based, including collateralization, funding and settlement.

So on the exchange that dominates bitcoin derivative trading, where the majority of "Fully 90% of the money invested in bitcoin" lives, USDT is the exclusive means of exchange. The entire market's connection to the underlying spot market is that:

Prices are tied to crypto asset prices via clever incentives, but in reality, USDT is the only asset that ever changes hands between traders.

Other than forced liquidations, the article does not analyze how the derivative market would respond to a massive drop in the Bitcoin "price", and whether Tether could continue to pump the "price". As money market funds did in the Global Financial Crisis, the article suggests that stablecoins would have problems:

Issuers back their stablecoins with piles of assets, rather like money-market funds. But these are not solely, or even mainly, held in cash. Tether, for instance, says 50% of its assets were held in commercial paper, 12% in secured loans and 10% in corporate bonds, funds and precious metals at the end of March. A cryptocrash could lead to a run on stablecoins, forcing issuers to dump their assets to make redemptions. In July Fitch, a rating agency, warned that a sudden mass redemption of tethers could "affect the stability of short-term credit markets".

It is certainly true that the off-ramps from cryptocurrencies to fiat are constricted; that is a major reason for the existence of stablecoins. But Fais Kahn makes two points:

If there were a sudden drop in the market, and investors wanted to exchange their USDT for real dollars in Tether's reserve, that could trigger a "bank run" where the value dropped significantly below one dollar, and suddenly everyone would want their money. That could trigger a full on collapse. But when might that actually happen? When Bitcoin falls in the frequent crypto bloodbaths, users actually buy Tether - fleeing to the safety of the dollar. This actually drives Tether's price up!

And:

Tether's own Terms of Service say that users may not be able to redeem immediately. Forced to wait, many users would flee to Bitcoin for lack of options, driving the price up again.

It isn't just Tether that doesn't allow winnings out.
Carol Alexander's Binance's Insurance Fund is a fascinating, detailed examination of Binance's extremely convenient "outage" as BTC crashed on May 19. Her subhead reads:

How insufficient insurance funds might explain the outage of Binance's futures platform on May 19 and the potentially toxic relationship between Binance and Tether.

I certainly don't understand all the ramifications of the "toxic relationship between Binance and Tether", but the article's implicit assumption that they, and similar market participants, behave like properly regulated financial institutions is implausible. Alexander's take on the relationship, on the other hand, is alarmingly plausible:

In May 2021 ... Tether reported that only 2.9% of all tokens are actually backed by cash reserves and about 50% is in commercial paper, a form of unsecured debt that is normally only issued by firms with high-quality debt ratings. The simultaneous growth of Binance and tether begs the question whether Binance itself is the issuer of a large fraction of tether's $30 billion commercial paper. Binance's B2B platform is the main online broker for tether. Suppose Binance is in financial difficulties (possibly precipitated by using its own money rather than insurance funds to cover payment to counterparties of liquidated positions). Then the tether it orders and gives to customers might not be paid for with dollars, or bitcoin or any other form of cash, but rather with an IOU. That is, commercial paper on which it pays tether interest, until the term of the loan expires. No new tether has been issued since Binance's order of $3 bn [Correction 6 Aug: net $1bn transfer] was made highly visible to the public on 31 May. [Correction: 6 Aug: Another $1 bn tether was issued on 4 Aug]. Maybe this is because Tether's next audit is imminent, and the auditors may one day investigate the identity of the issuers of the 50% (or more, now) of commercial paper it has for reserves. If it were found that the main issuer was Binance (maybe followed by FTX) then the entire crypto asset market place would have been holding itself up with its own bootstraps!

This would certainly explain why Matt Levine wrote:

There is a fun game among financial journalists and other interested observers who try to find anyone who has actually traded commercial paper with Tether, or any of its actual holdings. The game is hard! As far as I know, no one has ever won it, or even scored a point; I have never seen anyone publicly identify a security that Tether holds or a counterparty that has traded commercial paper with it.

If Tether's reserves were 50% composed of unsecured debt from unregulated exchanges like Binance ...

Issue #2: Dynamic Effects

My second problem with the article is that this paragraph shows The Economist sharing two common misconceptions about blockchain technology:

A crash would puncture the crypto economy. Bitcoin miners—who compete to validate transactions and are rewarded with new coins—would have less incentive to carry on, bringing the verification process, and the supply of bitcoin, to a halt.

First, it is true that, were the "price" of Bitcoin zero, mining would stop. But if mining stops, it is transactions that stop. Bitcoin HODL-ings would be frozen in place, not just worth zero on paper but actually useless because nothing could be done with them.

Second, the idea that the goal of mining is to create new Bitcoin is simply wrong. The goal of mining is to secure the blockchain by making Sybil attacks implausibly expensive.
The creation of new Bitcoin is a side-effect, intended to motivate miners to provide that security. The fact that Nakamoto intended mining to continue after the final Bitcoin had been created clearly demonstrates this.

The article is based on this scenario:

in order to grasp the growing links between the crypto-sphere and mainstream markets, imagine that the price of bitcoin crashes all the way to zero. A rout could be triggered either by shocks originating within the system, say through a technical failure, or a serious hack of a big cryptocurrency exchange. Or they could come from outside: a clampdown by regulators, for instance, or an abrupt end to the "everything rally" in markets, say in response to central banks raising interest rates.

But, as the article admits, a discontinuous change from $44K or so to $0 is implausible. A rapid but continuous drop over, say, a month is more plausible, and it could bring issues that the article understandably fails to address.

As the "price" drops, two effects take place. First, the value of the mining reward in fiat currency decreases. The least efficient and least profitable miners become uneconomic and drop out, decreasing the hash rate and thus increasing the block time and reducing the rate at which transactions can be processed:

Typically, it takes about 10 minutes to complete a block, but Feinstein told CNBC the bitcoin network has slowed down to 14- to 19-minute block times.

This effect occurred during the Chinese government's crackdown, as shown in the graph of hash rate.

Second, every 2016 blocks (about two weeks) the algorithm adjusts, in this case decreases, the difficulty and thus the cost of mining the next 2016 blocks. The idea is to restore the block time to about 10 minutes despite the reduction in the hash rate. When the Chinese crackdown took 52.2% of Bitcoin's hash power off-line, the algorithm made the biggest reduction in difficulty in Bitcoin's history.

In our scenario, Bitcoin plunges over a month. Let's assume it starts just after a difficulty adjustment. The month is divided into two parts, with the initial difficulty for the first part, and a much reduced difficulty for the second part.

In the first part the rapid "price" decrease makes all but the most efficient miners uneconomic, so the hash rate decreases rapidly and block production slows rapidly. Producing the 2016th block takes a lot more than two weeks. This is a time when the demand for transactions will be extreme, but during this part the capacity to process them is increasingly restricted. This, as has happened in other periods of high transaction demand, causes transaction fees to spike to extraordinary levels. In normal times fees are less than 10% of miner income, but it is plausible that they would spike an order of magnitude or more, counteracting the drop in the economics of mining. But median fees of, say, $200 would increase the sense of panic in the spot market.

Let's assume that, by the 2016th block, more than half the mining power has been rendered uneconomic, so that the block time is around 20 minutes. Thus the adjustment comes after three weeks. When it happens, the adjustment, being based on the total time taken in the first part, will be large but inadequate to correct for the reduced hash rate at the end of the first part. With our assumptions the adjustment will be for a 25% drop in hash power, but the actual drop will have been 50%.
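To make the retargeting arithmetic concrete, here is a minimal sketch in Ruby (the only language that appears elsewhere on this page) of the scenario's numbers. The linear decline from 100% to 50% of hash power is my assumption, chosen to match the 25%/50% figures above; the real protocol works from block timestamps rather than an explicit hash-rate estimate, and clamps each adjustment to a factor of four, which doesn't matter here:

    # Sketch of the difficulty adjustment under this scenario's assumptions:
    # hash power falls linearly from 100% to 50% over one retarget period.
    target_block_time = 10.0        # minutes
    blocks_per_period = 2016

    average_hash = (1.00 + 0.50) / 2                   # 0.75 of the starting hash power
    actual_timespan = blocks_per_period * target_block_time / average_hash
    # => 26880.0 minutes, close to three weeks instead of two

    # The retarget scales difficulty by target timespan / actual timespan,
    # which corrects only for the *average* hash power over the period:
    adjustment = (blocks_per_period * target_block_time) / actual_timespan
    # => 0.75, i.e. a 25% reduction in difficulty

    # But only 50% of the hash power is left by the end of the period,
    # so blocks are still slower than the 10-minute target:
    new_block_time = target_block_time * adjustment / 0.50
    # => 15.0 minutes per block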
Block production will speed up, but only to about 15 minutes/block. Given the panic, fees will drop somewhat but remain high.

As the adjustment approaches there are a lot of disgruntled miners, whose investment in ASIC mining rigs has been rendered uneconomic. The rigs can't be repurposed for anything but other Proof-of-Work cryptocurrencies, which have all crashed because, as the article notes:

Investors would probably also dump other cryptocurrencies. Recent tantrums have shown that where bitcoin goes, other digital monies follow, says Philip Gradwell of Chainalysis, a data firm.

Recall that what the mining power is doing is securing the blockchain against attack. Once it became possible to rent large amounts of mining power, 51% attacks on minor alt-coins became endemic. For example, there were three successful attacks on Ethereum Classic in a single month. Before the adjustment, some fraction of the now-uneconomic Bitcoin mining power has migrated to the rental market. Even a small fraction can overwhelm other cryptocurrencies. As I write, the Bitcoin hash rate is around 110M TH/s. Dogecoin is the next largest "market cap" coin using Bitcoin-like Proof-of-Work. Its hash rate is around 230 TH/s, or 500,000 times smaller. Thus during the first part there would be a tidal wave of attacks against every other Proof-of-Work cryptocurrency.

It has never been possible to rent enough mining power to attack a major cryptocurrency. But now we have more than 50% of the Bitcoin mining power sitting idle on the sidelines, desperate for income. These miners have choices:

They can resume mining Bitcoin. The more efficient of them can do so and still make a profit, but if they all do most will find it uneconomic.
They can mine other Proof-of-Work cryptocurrencies. But even if only a tiny fraction of them do so, it will be uneconomic. And trust in the alt-coins has been destroyed by the wave of attacks.
They can collaborate to mount double-spend attacks against Bitcoin, since they have more than half the mining power.
They can collaborate to mount the kind of sabotage attack described by Eric Budish in The Economic Limits Of Bitcoin And The Blockchain, aiming to profit by shorting Bitcoin in the derivative market and destroying confidence in the asset's security.

The security of Proof-of-Work blockchains depends upon the unavailability of enough mining power to mount an attack. A massive, sustained drop in the value of Bitcoin would free up enormous amounts of mining power, far more than enough to destroy any smaller cryptocurrency, and probably enough to destroy Bitcoin.

Open Knowledge Foundation: The registration for the first EU Open Data Days is open!

Here at Open Knowledge Foundation, we are really pleased to see that registration is now open for participants for the first EU Open Data Days. The programme lasts three days, from 23rd to 25th November 2021, and is split into two main parts.

– EU DataViz, a conference on open data and data visualisation for public administrations, from 23rd to 24th November 2021; and
– EU Datathon, the annual EU open data competition, on 25th November 2021.

It's free and open for everyone to attend, and is designed for a broad audience – including experts, open data enthusiasts and the public.

= = = = = = = Registration can be done here. = = = = = = =

Open Knowledge Foundation has been proud to be an official partner of EU Open Data Days since the initiative launched in March 2021. We hope to see you there!
John Mark Ockerbloom: Counting down to 1925 in the public domain We’re rapidly approaching another Public Domain Day, the day at the start of the year when a year’s worth of creative work joins the public domain. This will be the third year in a row that the US will have a full crop of new public domain works (after a prior 20-year drought), and once again, I’m noting and celebrating works that will be entering the public domain shortly. Approaching 2019, I wrote a one-post-a-day Advent Calendar for 1923 works throughout the month of December, and approaching 2020, I highlighted a few 1924 works, and related copyright issues, in a series of December posts called 2020 Vision. This year I took to Twitter, making one tweet per day featuring a different 1925 work and creator using the #PublicDomainDayCountdown hashtag. Tweets are shorter than blog posts, but I started 99 days out, so by the time I finish the series at the end of December, I’ll have written short notices on more works than ever. Since not everyone reads Twitter, and there’s no guarantee that my tweets will always be accessible on that site, I’ll reproduce them here. (This post has been updated to include all the tweets up to 2021, and in 2021 has been further updated to link to copies of some of the featured works.) The tweet links have been reformatted for the blog, and a few tweets have been recombined or otherwise edited. If you’d like to comment yourself on any of the works mentioned here, or suggest others I can feature, feel free to reply here or on Twitter. (My account there is @JMarkOckerbloom. You’ll also find some other people tweeting on the #PublicDomainDayCountdown hashtag, and you’re welcome to join in as well.) September 24: It’s F. Scott Fitzgerald’s birthday. His best-known book, The Great Gatsby, joins the US public domain 99 days from now, along with other works with active 1925 copyrights. #PublicDomainDayCountdown (Links to free online books by Fitzgerald here.) September 25: C. K. Scott-Moncrieff’s birthday’s today. He translated Proust’s Remembrance of Things Past (a controversial title, as the Public Domain Review notes). The Guermantes Way, his translation of Proust’s 3rd volume, joins the US public domain in 98 days. #PublicDomainDayCountdown September 26: Today is T.S. Eliot’s birthday. His poem “The Hollow Men” (which ends “…not with a bang but a whimper”) was first published in full in 1925, & joins the US public domain in 97 days. #PublicDomainDayCountdown More by & about him here. September 27: Lady Cynthia Asquith, born today in 1887, edited a number of anthologies that have long been read by children and fans of fantasy and supernatural fiction. Her first major collection, The Flying Carpet, joins the US public domain in 96 days. #PublicDomainDayCountdown September 28: As @Marketplace reported tonight, Agatha Christie’s mysteries remain popular after 100 years. In 95 days, her novel The Secret of Chimneys will join the US public domain, as will the expanded US Poirot Investigates collection. #PublicDomainDayCountdown September 29: Homer Hockett’s and Arthur Schlesinger, Sr.’s Political and Social History of the United States first came out in 1925, and was an influential college textbook for years thereafter. The first edition joins the public domain in 94 days. #PublicDomainDayCountdown September 30: Inez Haynes Gillmore Irwin died 50 years ago this month, after a varied, prolific writing career. 
This 2012 blog post looks at 4 of her books, including Gertrude Haviland’s Divorce, which joins the public domain in 93 days. #PublicDomainDayCountdown October 1: For some, spooky stories and themes aren’t just for October, but for the whole year. We’ll be welcoming a new year’s worth of Weird Tales to the public domain in 3 months. See what’s coming, and what’s already free online, here. #PublicDomainDayCountdown October 2: Misinformation and quackery has been a threat to public health for a long time. In 13 weeks, the 1925 book The Patent Medicine and the Public Health, by American quack-fighter Arthur J. Cramp joins the public domain. #PublicDomainDayCountdown October 3: Sophie Treadwell, born this day in 1885, was a feminist, modernist playwright with several plays produced on Broadway, but many of her works are now hard to find. Her 1925 play “Many Mansions” joins the public domain in 90 days. #PublicDomainDayCountdown October 4: It’s Edward Stratemeyer’s birthday. Books of his syndicate joining the public domain in 89 days include the debuts of Don Sturdy & the Blythe Girls, & further adventures of Tom Swift, Ruth Fielding, Baseball Joe, Betty Gordon, the Bobbsey Twins, & more. #PublicDomainDayCountdown October 5: Russell Wilder was a pioneering diabetes doctor, testing newly invented insulin treatments that saved many patients’ lives. His 1925 book Diabetes: Its Cause and its Treatment with Insulin joins the public domain in 88 days. #PublicDomainDayCountdown October 6: Queer British Catholic author Radclyffe Hall is best known for The Well of Loneliness. Hall’s earlier novel A Saturday Life is lighter, though it has some similar themes in subtext. It joins the US public domain in 87 days. #PublicDomainDayCountdown October 7: Edgar Allan Poe’s stories have long been public domain, but some work unpublished when he died (on this day in 1849) stayed in © much longer. In 86 days, the Valentine Museum’s 1925 book of his previously unpublished letters finally goes public domain. #PublicDomainDayCountdown October 8: In 1925, the Nobel Prize in Literature went to George Bernard Shaw. In 85 days, his Table-Talk, published that year, will join the public domain in the US, and all his solo works published in his lifetime will be public domain nearly everywhere else. #PublicDomainDayCountdown October 9: Author and editor Edward Bok was born this day in 1863. In Twice Thirty (1925), he follows up his Pulitzer-winning memoir The Americanization of Edward Bok with a set of essays from the perspective of his 60s. It joins the public domain in 84 days. #PublicDomainDayCountdown October 10: In the 1925 silent comedy “The Freshman”, Harold Lloyd goes to Tate University, “a large football stadium with a college attached”, and goes from tackling dummy to unlikely football hero. It joins the public domain in 83 days. #PublicDomainDayCountdown October 11: It’s François Mauriac’s birthday. His Le Desert de l’Amour, a novel that won the 1926 Grand Prix of the Académie Française, joins the US public domain in 82 days. Published translations may stay copyrighted, but Americans will be free to make new ones. #PublicDomainDayCountdown October 12: Pulitzer-winning legal scholar Charles Warren’s Congress, the Constitution, and the Supreme Court (1925) analyzes controversies, some still argued, over relations between the US legislature and the US judiciary. It joins the public domain in 81 days. 
#PublicDomainDayCountdown October 13: Science publishing in 1925 was largely a boys’ club, but some areas were more open to women authors, such as nursing & science education. I look forward to Maude Muse’s Textbook of Psychology for Nurses going public domain in 80 days. #PublicDomainDayCountdown #AdaLovelaceDay October 14: Happy birthday to poet E. E. Cummings, born this day in 1894. (while some of his poetry is lowercase he usually still capitalized his name when writing it out) His collection XLI Poems joins the public domain in 79 days. #PublicDomainDayCountdown October 15: It’s PG Wodehouse’s birthday. In 78 days more of his humorous stories join the US public domain, including Sam in the Suburbs. It originally ran as a serial in the Saturday Evening Post in 1925. All that year’s issues also join the public domain then. #PublicDomainDayCountdown October 16: Playwright and Nobel laureate Eugene O’Neill was born today in 1888. His “Desire Under the Elms” entered the US public domain this year; in 77 days, his plays “Marco’s Millions” and “The Great God Brown” will join it. #PublicDomainDayCountdown October 17: Not everything makes it to the end of the long road to the US public domain. In 76 days, the copyright for the film Man and Maid (based on a book by Elinor Glyn) expires, but no known copies survive. Maybe someone will find one? #PublicDomainDayCountdown October 18: Corra Harris became famous for her novel A Circuit Rider’s Wife and her World War I reporting. The work she considered her best, though, was As a Woman Thinks. It joins the public domain in 75 days. #PublicDomainDayCountdown October 19: Edna St. Vincent Millay died 70 years ago today. All her published work joins the public domain in 74 days in many places outside the US. Here, magazine work like “Sonnet to Gath” (in Sep 1925 Vanity Fair) will join, but renewed post-’25 work stays in ©. #PublicDomainDayCountdown October 20: All songs eventually reach the public domain. Authors can put them there themselves, like Tom Lehrer just did for his lyrics. But other humorous songs arrive by the slow route, like Tilzer, Terker, & Heagney’s “Pardon Me (While I Laugh)” will in 73 days. #PublicDomainDayCountdown October 21: Sherwood Anderson’s Winesburg, Ohio wasn’t a best-seller when it came out, but his Dark Laughter was. Since Joycean works fell out of fashion, that book’s been largely forgotten, but may get new attention when it joins the public domain in 72 days. #PublicDomainDayCountdown October 22: Artist NC Wyeth was born this day in 1882. The Brandywine Museum near Philadelphia shows many of his works. His illustrated edition of Francis Parkman’s book The Oregon Trail joins the public domain in 71 days. #PublicDomainDayCountdown October 23: Today (especially at 6:02, on 10/23) many chemists celebrate #MoleDay. In 70 days, they’ll also get to celebrate historically important chemistry publications joining the US public domain, including all 1925 issues of Justus Liebigs Annalen der Chemie. #PublicDomainDayCountdown October 24: While some early Alfred Hitchcock films were in the US public domain for a while due to formality issues, the GATT accords restored their copyrights. His directorial debut, The Pleasure Garden, rejoins the public domain (this time for good) in 69 days. #PublicDomainDayCountdown (Addendum: There may still be one more year of copyright to this film as of 2021; see the comments to this post for details.) October 25: Albert Barnes took a different approach to art than most of his contemporaries. 
The first edition of The Art in Painting, where he explains his theories and shows examples from his collection, joins the public domain in 68 days. #PublicDomainDayCountdown October 26: Prolific writer Carolyn Wells had a long-running series of mystery novels featuring Fleming Stone. Here’s a blog post by The Passing Tramp on one of them, The Daughter of the House, which will join the public domain in 67 days. #PublicDomainDayCountdown October 27: Theodore Roosevelt was born today in 1858, and died over 100 years ago, but some of his works are still copyrighted. In 66 days, 2 volumes of his correspondence with Henry Cabot Lodge, written from 1884-1918 and published in 1925, join the public domain. #PublicDomainDayCountdown October 28: American composer and conductor Howard Hanson was born on this day in 1896. His choral piece “Lament for Beowulf” joins the public domain in 65 days. #PublicDomainDayCountdown October 29: “Skitter Cat” was a white Persian cat who had adventures in several children’s books by Eleanor Youmans, illustrated by Ruth Bennett. The first of the books joins the public domain in 64 days. #PublicDomainDayCountdown #NationalCatDay October 30: “Secret Service Smith” was a detective created by Canadian author R. T. M. Maitland. His first magazine appearance was in 1920; his first original full-length novel, The Black Magician, joins the public domain in 9 weeks. #PublicDomainDayCountdown October 31: Poet John Keats was born this day in 1795. Amy Lowell’s 2-volume biography links his Romantic poetry with her Imagist poetry. (1 review.) She finished and published it just before she died. It joins the public domain in 62 days. #PublicDomainDayCountdown November 1: “Not just for an hour, not for just a day, not for just a year, but always.” Irving Berlin gave the rights to this song to his bride in 1926. Both are gone now, and in 2 months it will join the public domain for all of us, always. #PublicDomainDayCountdown November 2: Mikhail Fokine’s The Dying Swan dance, set to music by Camille Saint-Saëns, premiered in 1905, but its choreography wasn’t published until 1925, the same year a film of it was released. It joins the public domain in 60 days. #PublicDomainDayCountdown (Choreography copyright is weird. Not only does the term not start until publication, which can be long after 1st performance, but what’s copyrightable has also changed. Before 1978 it had to qualify as dramatic; now it doesn’t, but it has to be more than a short step sequence.) November 3: Herbert Hoover was the only sitting president to be voted out of office between 1912 & 1976. Before taking office, he wrote the foreword to Carolyn Crane’s Everyman’s House, part of a homeowners’ campaign he co-led. It goes out of copyright in 59 days. #PublicDomainDayCountdown November 4: “The Golden Cocoon” is a 1925 silent melodrama featuring an election, jilted lovers, and extortion. The Ruth Cross novel it’s based on went public domain this year. The film will join it there in 58 days. #PublicDomainDayCountdown November 5: Investigative journalist Ida Tarbell was born today in 1857. Her History of Standard Oil helped break up that trust in 1911, but her Life of Elbert H. Gary wrote more admiringly of his chairmanship of US Steel. It joins the public domain in 57 days. #PublicDomainDayCountdown November 6: Harold Ross was born on this day in 1892. He was the first editor of The New Yorker, which he established in coöperation with his wife, Jane Grant. 
After ninety-five years, the magazine’s first issues are set to join the public domain in fifty-six days. #PublicDomainDayCountdown November 7: “Sweet Georgia Brown” by Ben Bernie & Maceo Pinkard (lyrics by Kenneth Casey) is a jazz standard, the theme tune of the Harlem Globetrotters, and a song often played in celebration. One thing we can celebrate in 55 days is it joining the public domain. #PublicDomainDayCountdown November 8: Today I hiked on the Appalachian Trail. It was completed in 1937, but parts are much older. Walter Collins O’Kane’s Trails and Summits of the White Mountains, published in 1925 when the AT was more idea than reality, goes public domain in 54 days. #PublicDomainDayCountdown November 9: In Sinclair Lewis’ Arrowsmith, a brilliant medical researcher deals with personal and ethical issues as he tries to find a cure for a deadly epidemic. The novel has stayed relevant well past its 1925 publication, and joins the public domain in 53 days. #PublicDomainDayCountdown November 10: John Marquand was born today in 1893. He’s known for his spy stories and satires, but an early novel, The Black Cargo, features a sailor curious about a mysterious payload on a ship he’s been hired onto. It joins the US public domain in 52 days. #PublicDomainDayCountdown November 11: The first world war, whose armistice was 102 years ago today, cast a long shadow. Among the many literary works looking back to it is Ford Madox Ford’s novel No More Parades, part of his “Parade’s End” tetralogy. It joins the public domain in 51 days. #PublicDomainDayCountdown November 12: Anne Parrish was born on this day in 1888. In 1925, The Dream Coach, co-written with her brother, got a Newbery honor , and her novel The Perennial Bachelor was a best-seller. The latter book joins the public domain in 50 days. #PublicDomainDayCountdown November 13: In “The Curse of the Golden Cross”, G. K. Chesterton’s Father Brown once again finds a natural explanation to what seem to be preternatural symbols & events. As of today, Friday the 13th, the 1925 story is exactly 7 weeks away from the US public domain. #PublicDomainDayCountdown November 14: The pop standard “Yes Sir, That’s My Baby” was the baby of Walter Donaldson (music) and Gus Kahn (lyrics). It’s been performed by many artists since its composition, and in 48 days, this baby steps out into the public domain. #PublicDomainDayCountdown November 15: Marianne Moore, born on this day in 1887, had a long literary career, including editing the influential modernist magazine The Dial from 1925 on. In 47 days, all 1925 issues of that magazine will be fully in the public domain. #PublicDomainDayCountdown November 16: George S. Kaufman, born today in 1889, wrote or directed a play in every Broadway season from 1921 till 1958. In 46 days, several of his plays join the public domain, including his still-performed comedy “The Butter and Egg Man”. #PublicDomainDayCountdown November 17: Shen of the Sea was a Newbery-winning collection of stories presented as “Chinese” folktales, but written by American author Arthur Bowie Chrisman. Praised when first published, seen more as appropriation later, it’ll be appropriable itself in 45 days. #PublicDomainDayCountdown November 18: Jacques Maritain was a French Catholic philosopher who influenced the Universal Declaration of Human Rights. His book on 3 reformers (Luther, Descartes, and Rousseau) joins the public domain in 44 days. #PublicDomainDayCountdown November 19: Prevailing views of history change a lot over 95 years. 
The 1926 Pulitzer history prize went to a book titled “The War for Southern Independence”. The last volume of Edward Channing’s History of the United States, it joins the public domain in 43 days. #PublicDomainDayCountdown November 20: Alfred North Whitehead’s Science and the Modern World includes a nuanced discussion of science and religion differing notably from many of his contemporaries’. (A recent review of it.) It joins the US public domain in 6 weeks. November 21: Algonquin Round Table member Robert Benchley tried reporting, practical writing, & reviews, but soon found that humorous essays & stories were his forte. One early collection, Pluck and Luck, joins the public domain in 41 days. #PublicDomainDayCountdown November 22: I’ve often heard people coming across a piano sit down & pick out Hoagy Carmichael’s “Heart and Soul”. He also had other hits, one being “Washboard Blues“. His original piano instrumental version becomes public domain in 40 days. #PublicDomainDayCountdown November 23: Harpo Marx, the Marx Brothers mime, was born today in 1888. In his oldest surviving film, “Too Many Kisses” he does “speak”, but silently (like everyone else in it), without his brothers. It joins the public domain in 39 days. #PublicDomainDayCountdown November 24: In The Man Nobody Knows, Bruce Barton likened the world of Jesus to the world of business. Did he bring scriptural insight to management, or subordinate Christianity to capitalism? It’ll be easier to say, & show, after it goes public domain in 38 days. #PublicDomainDayCountdown November 25: Before Virgil Thomson (born today in 1896) was well-known as a composer, he wrote a music column for Vanity Fair. His first columns, and the rest of Vanity Fair for 1925, join the public domain in 37 days. #PublicDomainDayCountdown November 26: “Each moment that we’re apart / You’re never out of my heart / I’d rather be lonely and wait for you only / Oh how I miss you tonight” Those staying safe by staying apart this holiday might appreciate this song, which joins the public domain in 36 days. #PublicDomainDayCountdown (The song, “Oh, How I Miss You Tonight” is by Benny Davis, Joe Burke, and Mark Fisher, was published in 1925, and performed and recorded by many musicians since then, some of whom are mentioned in this Wikipedia article.) November 27: Feminist author Katharine Anthony, born today in 1877, was best known for her biographies. Her 1925 biography of Catherine the Great, which drew extensively on the empress’s private memoirs, joins the public domain in 35 days. #PublicDomainDayCountdown November 28: Tonight in 1925 “Barn Dance” (soon renamed “Grand Ole Opry”) debuted in Nashville. Most country music on it & similar shows then were old favorites, but there were new hits too, like “The Death of Floyd Collins”, which joins the public domain in 34 days. #PublicDomainDayCountdown (The song, with words by Andrew Jenkins and music by John Carson, was in the line of other disaster ballads that were popular in the 1920s. This particular disaster had occurred earlier in the year, and became the subject of song, story, drama, and film.) November 29: As many folks get ready for Christmas, many Christmas-themed works are also almost ready to join the public domain in 33 days. One is The Holly Hedge, and Other Christmas Stories by Temple Bailey. More on the book & author. 
#PublicDomainDayCountdown November 30: In 1925 John Maynard Keynes published The Economic Consequences of Sterling Parity objecting to Winston Churchill returning the UK to the gold standard. That policy ended in 1931; the book’s US copyright lasted longer, but will finally end in 32 days. #PublicDomainDayCountdown December 1: Du Bose Heyward’s novel Porgy has a distinguished legacy of adaptations, including a 1927 Broadway play, and Gershwin’s opera “Porgy and Bess”. When the book joins the public domain a month from now, further adaptation possibilities are limitless. #PublicDomainDayCountdown December 2: In Dorothy Black’s Romance — The Loveliest Thing a young Englishwoman “inherits a small sum of money, buys a motor car and goes off in search of adventure and romance”. First serialized in Ladies’ Home Journal, it joins the public domain in 30 days. #PublicDomainDayCountdown December 3: Joseph Conrad was born on this day in 1857, and died in 1924, leaving unfinished his Napoleonic novel Suspense. But it was still far enough along to get serialized in magazines and published as a book in 1925, and it joins the public domain in 29 days. #PublicDomainDayCountdown December 4: Ernest Hemingway’s first US-published story collection In Our Time introduced his distinctive style to an American audience that came to view his books as classics of 20th century fiction: It joins the public domain in 28 days. #PublicDomainDayCountdown December 5: Libertarian author Rose Wilder Lane helped bring her mother’s “Little House” fictionalized memoirs into print. Before that, she published biographical fiction based on the life of Jack London, called He Was a Man. It joins the public domain in 27 days. #PublicDomainDayCountdown December 6: Indiana naturalist and author Gene Stratton-Porter died on this day in 1924. Her final novel, The Keeper of the Bees, was published the following year, and joins the public domain in 26 days. One review. #PublicDomainDayCountdown December 7: Willa Cather was born today in 1873. Her novel The Professor’s House depicts 1920s cultural dislocation from a different angle than F. Scott Fitzgerald’s better-known Great Gatsby. It too joins the public domain in 25 days. #PublicDomainDayCountdown December 8: The last symphony published by Finnish composer Jean Sibelius (born on this day in 1865) is described in the Grove Dictionary as his “most remarkable compositional achievement”. It joins the public domain in the US in 24 days. #PublicDomainDayCountdown December 9: When the Habsburg Empire falls, what comes next for the people & powers of Vienna? The novel Old Wine, by Phyllis Bottome (wife of the local British intelligence head) depicts a society undergoing rapid change. It joins the US public domain in 23 days. #PublicDomainDayCountdown December 10: Lewis Browne was “a world traveler, author, rabbi, former rabbi, lecturer, socialist and friend of the literary elite”. His first book, Stranger than Fiction: A Short History of the Jews, joins the public domain in 22 days. #PublicDomainDayCountdown December 11: In 1925, John Scopes was convicted for teaching evolution in Tennessee. Books explaining the science to lay audiences were popular that year, including Henshaw Ward’s Evolution for John Doe. It becomes public domain in 3 weeks. #PublicDomainDayCountdown December 12: Philadelphia artist Jean Leon Gerome Ferris was best known for his “Pageant of a Nation” paintings. 
Three of them, “The Birth of Pennsylvania”, “Gettysburg, 1863”, and “The Mayflower Compact”, join the public domain in 20 days. #PublicDomainDayCountdown December 13: The Queen of Cooks, and Some Kings was a memoir of London hotelier Rosa Lewis, as told to Mary Lawton. Her life story was the basis for the BBC and PBS series “The Duchess of Duke Street”. It joins the public domain in 19 days. #PublicDomainDayCountdown December 14: Today we’re celebrating new films being added to the National Film Registry. In 18 days, we can also celebrate more Registry films joining the public domain. One is The Clash of the Wolves, starring Rin Tin Tin. #PublicDomainDayCountdown December 15: Etsu Inagaki Sugimoto, daughter of a high-ranking Japanese official, moved to the US in an arranged marriage after her family fell on hard times. Her 1925 memoir, A Daughter of the Samurai, joins the public domain in 17 days. #PublicDomainDayCountdown December 16: On the Trail of Negro Folk-Songs compiled by Dorothy Scarborough assisted by Ola Lee Gulledge, has over 100 songs. Scarborough’s next of kin (not Gulledge, or any of their sources) renewed its copyright in 1953. But in 16 days, it’ll be free for all. #PublicDomainDayCountdown December 17: Virginia Woolf’s writings have been slowly entering the public domain in the US. We’ve had the first part of her Mrs. Dalloway for a while. The complete novel, and her first Common Reader essay collection, join it in 15 days. #PublicDomainDayCountdown December 18: Lovers in Quarantine with Harrison Ford sounds like a movie made for 2020, but it’s actually a 1925 silent comedy (with a different Harrison Ford). It’ll be ready to go out into the public domain after a 14-day quarantine. #PublicDomainDayCountdown December 19: Ma Rainey wrote, sang, and recorded many blues songs in a multi-decade career. Two of her songs becoming public domain in 13 days are “Shave ’em Dry” (written with William Jackson) & “Army Camp Harmony Blues” (with Hooks Tilford). #PublicDomainDayCountdown December 20: For years we’ve celebrated the works of prize-winning novelist Edith Wharton as her stories join the public domain. In 12 days, The Writing of Fiction, her book on how she writes her memorable tales, will join that company. #PublicDomainDayCountdown December 21: Albert Payson Terhune, born today in 1872, raised and wrote about dogs he kept at what’s now a public park in New Jersey. His book about Wolf, who died heroically and is buried there, will also be in the public domain in 11 days. #PublicDomainDayCountdown December 22: In the 1920s it seemed Buster Keaton could do anything involving movies. Go West, a 1925 feature film that he co-wrote, directed, co-produced, and starred in, is still enjoyed today, and it joins the public domain in 10 days. #PublicDomainDayCountdown December 23: In 9 days, not only will Theodore Dreiser’s massive novel An American Tragedy be in the public domain, but so will a lot of the raw material that went into it. Much of it is in @upennlib‘s special collections. #PublicDomainDayCountdown December 24: Johnny Gruelle, born today in 1880, created the Raggedy Ann doll, and a series of books sold with it that went under many Christmas trees. Two of them, Raggedy Ann’s Alphabet Book and Raggedy Ann’s Wishing Pebble, join the public domain in 8 days. #PublicDomainDayCountdown December 25: Written in Hebrew by Joseph Klausner, translated into English by Anglican priest Herbert Danby, Jesus of Nazareth reviewed Jesus’s life and teachings from a Jewish perspective. 
It made a stir when published in 1925, & joins the public domain in 7 days. #PublicDomainDayCountdown December 26: "It's a travesty that this wonderful, hilarious, insightful book lives under the inconceivably large shadow cast by The Great Gatsby." A review of Anita Loos's Gentlemen Prefer Blondes, also joining the public domain in 6 days. #PublicDomainDayCountdown December 27: "On revisiting Manhattan Transfer, I came away with an appreciation not just for the breadth of its ambition, but also for the genius of its representation." A review of the John Dos Passos novel becoming public domain in 5 days. #PublicDomainDayCountdown December 28: All too often legal systems and bureaucracies can be described as "Kafkaesque". The Kafka work most known for that sense of arbitrariness and doom is Der Prozess (The Trial), reviewed here. It joins the public domain in 4 days. #PublicDomainDayCountdown December 29: Chocolate Kiddies, an African American music and dance revue that toured Europe in 1925, featured songs by Duke Ellington and Jo Trent including "Jig Walk", "Jim Dandy", and "With You". They join the public domain in 3 days. #PublicDomainDayCountdown December 30: Lon Chaney starred in 2 of the top-grossing movies of 1925. The Phantom of the Opera has long been in the public domain due to copyright nonrenewal. The Unholy Three, which was renewed, joins it in the public domain in 2 days. #PublicDomainDayCountdown (If you're wondering why some of the other big film hits of 1925 haven't been in this countdown, in many cases it's also because their copyrights weren't renewed. Or they weren't actually copyrighted in 1925.) December 31: "…You might as well live." Dorothy Parker published "Resumé" in 1925, and ultimately outlived most of her Algonquin Round Table-mates. This poem, and her other 1925 writing for periodicals, will be in the public domain tomorrow. #PublicDomainDayCountdown

Ed Summers: Opinionated

I've never really been a huge fan of the Basecamp Philosophy of software development–especially since the no-politics fiasco. Calling it a philosophy is probably giving it a lot more credit than it deserves, since it largely seems to be thinly veiled marketing. But I'll admit to having liked one idea that they promulgated since the early days of Ruby on Rails: that software should be opinionated. The hero narrative of the individual software developer or software user having an opinion and voicing it through the design or use of some software seems wrongheaded. Software is made and used by people in groups, whether that group is realized and cultivated or not. However, the basic idea here, that software expresses opinions and that designers and users should consciously express those opinions, is a useful way to think about design. But the thing Your App Should Take Sides really gets wrong is the suggestion that it's possible for an app not to express opinions–as if one could do otherwise, and that some kind of neutrality is possible. Software always takes sides, and expresses opinions–and often embodies multiple opinions in several arguments or controversies, rather than just one. The question is, do you understand the opinions it is expressing, and the decisions that are being made to express them? How can these decisions be negotiated as a group that includes the designers and users of the software? Hint: it works best when there is significant overlap between the two.
Jonathan Rochkind: logging URI query params with lograge The lograge gem for taming Rails logs by default will log the path component of the URI, but leave out the query string/query params. For instance, perhaps you have a URL to your app /search?q=libraries. lograge will log something like: method=GET path=/search format=html… The q=libraries part is completely left out of the log. I kinda want that part, it’s important. The lograge README provides instructions for “logging request parameters”, by way of the params hash. I’m going to modify them slightly to: use the more recent custom_payload config instead of custom_options. (I’m not certain why there are both, but I think custom_options remains mostly for legacy reasons, and the newer custom_payload is the one you should reach for.) If we just put params in there, then a bunch of ugly <ActionController::Parameters ...> dumps show up in the log if you have nested hash params. We could fix that with params.to_unsafe_h, but… We should really use request.filtered_parameters instead to make sure we’re not logging anything that’s been filtered out with Rails 6 config.filter_parameters. (Thanks /u/ezekg on reddit). This also converts to an ordinary hash that isn’t ActionController::Parameters, taking care of the previous bullet point. (It kind of seems like the lograge README could use a PR updating it?) config.lograge.custom_payload do |controller| exceptions = %w(controller action format id); { params: controller.request.filtered_parameters.except(*exceptions) } end That gets us a log line that might look something like this: method=GET path=/search format=html controller=SearchController action=index status=200 duration=107.66 view=87.32 db=29.00 params={"q"=>"foo"} OK. The params hash isn’t exactly the same as the query string: it can include things not in the URL query string (like controller and action, which we have to strip above, among others), and it can in some cases omit things that are in the query string. It just depends on your routing and other configuration and logic. The params hash itself is what default Rails logging gives you… but what if we just log the actual URL query string instead? Benefits: it’s easier to search the logs for an exact, specific, known URL (which can get more complicated, like /search?q=foo&range%5Byear_facet_isim%5D%5Bbegin%5D=4&source=foo or something). That’s something I sometimes want to do: say I got a URL reported from an error tracking service and now I want to find that exact line in the log. I actually like having the exact actual URL (well, starting from path) in the logs. It’s a lot simpler, we don’t need to filter out controller/action/format/id etc. It’s actually a bit more concise? And part of what I’m dealing with in general using lograge is trying to reduce my bytes of logfile for papertrail! Drawbacks? If you had some kind of structured log search (I don’t at present, but I guess I could with papertrail features by switching to JSON format?), it might be easier to do something like “find a /search with q=foo and source=ef” without worrying about other params. And to the extent that the params hash can include things not in the actual URL, is that important to log? Curious what other people think… am I crazy for wanting the actual URL in there, not the params hash? At any rate, it’s pretty easy to do.
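To make that concrete, here is roughly what a complete config/initializers/lograge.rb could look like for the log-the-actual-URL approach. This is a sketch rather than the post's verbatim configuration: the enabled and formatter lines are standard lograge setup assumed here for completeness, and the custom_payload block uses the filtered_path call discussed next.

# config/initializers/lograge.rb (sketch; assumes the lograge gem is in your Gemfile)
Rails.application.configure do
  config.lograge.enabled = true
  # Key-value output is the lograge default; swap in
  # Lograge::Formatters::Json.new if you want structured search later.
  config.lograge.formatter = Lograge::Formatters::KeyValue.new

  config.lograge.custom_payload do |controller|
    # filtered_path is the request path plus query string, with
    # Rails 6 config.filter_parameters applied, so filtered values
    # stay out of the logs.
    { path: controller.request.filtered_path }
  end
end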
Note we use filtered_path rather than fullpath to again take account of Rails 6 parameter filtering, and thanks again /u/ezekg: config.lograge.custom_payload do |controller| { path: controller.request.filtered_path } end This is actually overwriting the default path to be one that has the query string too: method=GET path=/search?q=libraries format=html ... You could of course add a different key fullpath instead, if you wanted to keep path as it is, perhaps for easier collation in some kind of log analyzing system that wants to group things by the same path regardless of query string. I’m gonna try this out! Meanwhile, on lograge… As long as we’re talking about lograge…. based on commit history, history of Issues and Pull Requests… the fact that CI isn’t currently running (travis.org grr) and doesn’t even try to test on Rails 6.0+ (although lograge seems to work fine)… one might worry that lograge is currently un/under-maintained…. No comment on a GH issue filed in May asking about project status. It still seems to be one of the more popular solutions for trying to tame Rails’ kind-of-out-of-control logs. It’s mentioned for instance in docs from papertrail and honeybadger, and many many other blog posts. What will its future be? Looking around for other possibilities, I found semantic_logger (rails_semantic_logger). It’s got similar features. It seems to be much more actively maintained. It’s got a respectable number of github stars, although not nearly as many as lograge, and it’s not featured in blogs and third-party platform docs nearly as much. It’s also a bit more sophisticated and featureful. For better or worse. Mainly I’m thinking, for instance, of how it tries to improve app performance by moving logging to a background thread. This is neat… and also can lead to a whole new class of bug, mysterious warning, or configuration burden. For now I’m sticking to the more popular lograge, but I wish it had CI up that was testing with Rails 6.1, at least! Incidentally, trying to get Rails to log more compactly like both lograge and rails_semantic_logger do… is somewhat more complicated than you might expect, as demonstrated by the code in both projects that does it! semantic_logger especially is hundreds of lines of somewhat baroque code split across several files. A refactor of logging around Rails 5 (I think?) to use ActiveSupport::LogSubscriber made it possible to customize Rails logging like this (although I think both lograge and rails_semantic_logger still do some monkey-patching too!), but in the end didn’t make it all that easy or obvious or future-proof. This may be discouraging other alternatives for the initial primary use case of both lograge and rails_semantic_logger — turn a rails action into one log line, with a structured format. HangingTogether: A research roadmap for Building a National Finding Aid Network (NAFAN) I had the opportunity to develop and lead the OCLC project team in the user research design phase for the IMLS-funded project (grant number LG-246349-OLS-20), Building a National Finding Aid Network (NAFAN). We are collaborating on this project with our colleagues at the California Digital Library and the University of Virginia. For additional details about the project, please refer to the project web page and the November 10, 2020 Hanging Together blog post written by my colleague Merrilee Proffitt. Other research includes conducting an evaluation of finding aid data quality, described in a July 28, 2021 Hanging Together blog post by Bruce Washburn.
The OCLC project team developed a mixed methods approach to learn about individuals who use aggregated archival descriptions. The intent is to identify who uses these descriptions, why they use them, and what preferences they have for discovery and access. We also are interested in learning from archive, museum, and library staff about how they create and publish archival descriptions. We want to identify and discuss the opportunities and challenges these staff experience when describing archival materials, sharing archival description on the web, and participating in or not participating in finding aid aggregations. As you will see below, this methodology is designed to capture information about the use, design and management of the current archival aggregation ecosystem, which comprises 13 separate websites. Findings from this research will inform the design of the future NAFAN platform, which aims to serve the needs of both users and contributors and to create a meaningful, inclusive and low-barrier pathway for access to archival collections. Why archival user research is important Prior to developing the research questions and design, we reviewed the literature identifying studies related to aggregations of archival description which represent the holdings of multiple institutions. We recognized a gap in the literature. While there was research about user interaction with individual finding aids and the discovery systems of individual institutions, there was no research specifically identifying the users of aggregations and their research needs. Additionally, although some work has addressed the information-seeking behaviors and research practices of scholars using primary sources, it has primarily focused on historians and other academic researchers in the humanities. However, recent work on archival user personas at multiple institutions indicates that users of archives are not only scholars, but also genealogists, local historians, and enthusiast researchers; K-12 educators; and a range of researchers using archives for their professional and creative work in fields such as journalism, documentary filmmaking, and fiction writing; as well as public services librarians and archivists. What we want to learn from the NAFAN research Based on the gaps in the identification of users of archive aggregators and the documented archival user personas, we decided to survey and subsequently interview users of aggregated archival description as well as the archive, museum, and library staff who provide description for archive collections. Building on questions outlined in the final report from the first phase of the NAFAN project, we prioritized a set of research questions to address the key issues that we believe are foundational to informing the other phases of the research for the formulation of technical and system requirements for modeling a national archival finding aid network. Research questions: Users of aggregated archival description Who are the current users of aggregated archival description? Do current user types align with the persona types and needs identified in recent archival persona work (i.e., what characteristics are present vs. not)? Why are the current users trying to discover and access archival collections via aggregation of archival description? How are current users discovering and accessing aggregations of archival description? What are the benefits and challenges users face when searching archival description in aggregation?
Research questions: Staff who provide description for archival collections What are the enabling and constraining factors that influence whether organizations describe the archival collections in their care? What are the enabling or constraining factors that influence whether organizations contribute to an aggregation of archival description? What value does participation in an archival aggregation service bring to organizations? What is the structure and extent of consistency across the body of metadata records in current aggregations of archival description? Can that body of metadata records support user needs identified in findings from the user research phase of the study? If so, how? If not, what are the gaps? Why the data collection tools were selected to answer our research questions To develop a more complete picture of the users and creators of archival description, we used a mixed methods approach for data collection and analysis. Our research design includes a pop-up survey and individual semi-structured interviews with users of aggregated archival description and focus group interviews with archive, museum, and library staff who provide description for archive collections. The pop-up survey provides a broad overview of the demographics, rationale, and content and format preferences for using aggregated archival description, while the semi-structured individual interviews provide more in-depth information on the hows and whys of using them. The focus group interviews with archivists enabled us to identify the challenges and opportunities experienced when contributing to archival aggregation services as well as their perceptions of their users’ needs and expectations. Focus Group Interviews: Ten virtual focus group interviews, with approximately five participants each (52 participants in total), were conducted with archivists. The participants worked either in institutions where the archives staff currently contribute to an aggregation or in institutions where they currently do not. Although focus group interview data cannot be generalized to an entire population, we used this methodology to identify archivists’ needs, expectations, and perceptions of contributing to and using aggregated archival description. The focus group interview data were digitally recorded and transcribed for content analysis of themes. The responses to the questions will be coded based on emerging themes, and a comparative analysis of the responses will be conducted. The findings will be used for identifying contributor needs in relation to finding aid aggregations and evaluating the quality of existing finding aid data; technical assessments of potential systems to support network functions and formulating system requirements for a minimum viable national finding aid network; and community building, sustainability planning, and governance modeling to support subsequent phases of the project. User Pop-up Survey: A pop-up survey was implemented on 12 NAFAN partner aggregation websites from March 18 to May 31, 2021, to gather demographic data and information about why and how individuals use finding aid aggregations, as well as their preferred format types. It was designed and implemented using SurveyMonkey software and included both open-ended and fixed response questions. There were 3,300 completed and usable survey responses, providing a convenience sampling frame for this phase of the project.
The pop-up survey data are being analyzed using descriptive statistics and a comparative analysis of the responses from the different types of users identified in the demographic data that we collected. User Semi-Structured Interviews: All respondents to the pop-up survey were invited to participate in a virtual, 45-60 minute, one-on-one semi-structured interview that includes guided open-ended questions asked in the same order during each interview. Respondents who volunteered, were invited, and completed the interview were offered a $50 gift card. We are in the process of identifying 25 users who indicated they are interested in participating in the interviews and who represent the different user types and demographics of the pop-up survey respondents. These interviews will provide more detailed information about how and why individuals use finding aid aggregations. The data collected from the individual semi-structured interviews will be digitally recorded and transcribed for coding and analysis. Descriptive statistics and a comparative analysis of the responses in the demographic data collected during the individual semi-structured interview sessions will be calculated. A codebook will be developed from the themes emerging from the interviews, and the transcribed responses to the open-ended questions will be coded in the NVivo software using the codebook. Following coding, we will gather samples of the answers and calculate inter-coder reliability to ensure consistency across the corpus of data. What’s next in NAFAN research OCLC Research will begin conducting the individual semi-structured interviews with users of aggregated archival description. These data will be analyzed, as will the data from the user pop-up survey and the archive, museum, and library staff focus group interviews. We will continue to share our findings and we welcome your feedback. Acknowledgements: I want to thank my project team colleagues, Chela Scott Weber, Lesley Langa, Brooke Doyle, Brittany Brannon, Merrilee Proffitt, and Janet Mason for their assistance with the focus group interview and survey data collection and analysis, and Chela, Lesley, and Merrilee for their review of this blog post. The post A research roadmap for Building a National Finding Aid Network (NAFAN) appeared first on Hanging Together. Jez Cope: mxadm: a small CLI Matrix room admin tool I’ve enjoyed learning Rust (the programming language) recently, but having only really used it for solving programming puzzles I’ve been looking for an excuse to use it for something more practical. At the same time, I’ve been using and learning about Matrix (the chat/messaging platform), and running some small rooms there, I’ve been a bit frustrated that some pretty common admin things don’t have a good user interface in any of the available clients. So… I decided to write a little command-line tool to do a few simple tasks, and it’s now released as mxadm! It’s on crates.io, so if you have Rust and cargo available, installing it is as simple as running cargo install mxadm. I’ve only taught it to do a few things so far: List your joined rooms Add/delete a room alias Tombstone a room (i.e. redirect it to a new room) I’ll add more as I need them, and I’m open to suggestions too. It uses matrix-rust-sdk, the Matrix Client-Server SDK for Rust, which is built on the lower-level Ruma library, along with anyhow for error handling. The kind folks in the #matrix-rust-sdk:matrix.org room have been particularly helpful in getting me started using it.
More details from: Source code on tildegit mxadm on crates.io mxadm on lib.rs Suggestions, code reviews, pull requests all welcome, though it will probably take me a while to act on them. Enjoy! Eric Lease Morgan: Searching Project Gutenberg at the Distant Reader The venerable Project Gutenberg is a collection of about 60,000 transcribed editions of classic literature in the public domain, mostly from the Western canon. A subset of about 30,000 Project Gutenberg items has been cached locally, indexed, and made available through a website called the Distant Reader. The index is freely available for anybody, anywhere, to use. This blog posting describes how to query the index. The index is rooted in a technology called Solr, a very popular indexing tool. The index supports simple searching, phrase searching, wildcard searches, fielded searching, Boolean logic, and nested queries. Each of these techniques is described below: simple searches – Enter any words you desire, and you will most likely get results. In this regard, it is difficult to break the search engine. phrase searches – Enclose query terms in double-quote marks to search the query as a phrase. Examples include: "tom sawyer", "little country schoolhouse", and "medieval europe". wildcard searches – Append an asterisk (*) to any non-phrase query to perform a stemming operation on the given query. For example, the query potato* will return results including the words potato and potatoes. fielded searches – The index has many different fields. The most important include: author, title, subject, and classification. To limit a query to a specific field, prefix the query with the name of the field and a colon (:). Examples include: title:mississippi, author:plato, or subject:knowledge. Boolean logic – Queries can be combined with three Boolean operators: 1) AND, 2) OR, or 3) NOT. The use of AND creates the intersection of two queries. The use of OR creates the union of two queries. The use of NOT creates the negation of the second query. The Boolean operators are case-sensitive. Examples include: love AND author:plato, love OR affection, and love NOT war. nested queries – Boolean logic queries can be nested to return more sophisticated sets of items; nesting allows you to override the way rudimentary Boolean operations get combined. Use matching parentheses (()) to create nested queries. An example includes (love NOT war) AND (justice AND honor) AND (classification:BX OR subject:"spiritual life"). Of all the different types of queries, nested queries will probably give you the most grief. Because this index is a full-text index on a wide variety of topics, you will probably need to exploit the query language to create truly meaningful results. David Rosenthal: Stablecoins Part 2 I wrote Stablecoins about Tether and its "magic money pump" seven months ago. A lot has happened and a lot has been written about it since, and some of it explores aspects I didn't understand at the time, so below the fold at some length I try to catch up.SourceIn the postscript to Stablecoins I quoted David Gerard's account of the December 16th pump that pushed BTC over $20K:We saw about 300 million Tethers being lined up on Binance and Huobi in the week previously. These were then deployed en masse.You can see the pump starting at 13:38 UTC on 16 December. BTC was $20,420.00 on Coinbase at 13:45 UTC. Notice the very long candles, as bots set to sell at $20,000 sell directly into the pump.
In 2020 BTC had dropped from around $7.9K on March 10th to under $5K on March 11th. It spiked back up on March 18th, then gradually rose to just under $11K by October 10th.SourceDuring that time Tether issuance went from $4.7B to $15.7B, an increase of over 230%, with large jumps on four occasions: March 28-29th: $1.6B = $4.6B to $6.2B (a weekend) May 12-13th $2.4B = $6.4B to $8.8B July 20th-21st $0.8B = $9.2B to $10B (a weekend) August 19-20th $3.4B = $10B to $13.4B (a weekend)SourceThen both BTC and USDT really took off, with BTC peaking April 13th at $64.9K, and USDT issuing more than $30B. BTC then started falling. Tether continued to issue USDT, peaking 55 days later on May 30th after nearly another $16B at $61.8B. Issuance slowed dramatically, peaking 19 days later on June 18th at $62.7B when BTC had dropped to $35.8K, 55% of the peak. Since then USDT has faced gradual redemptions; it is now down to $61.8B.What on earth is going on? How could USDT go from around $6B to around $60B in just over a year?TetherSourceIn Crypto and the infinite ladder: what if Tether is fake?, the first of a two-part series, Fais Kahn asks the same question:Tether (USDT) is the most used cryptocurrency in the world, reaching volumes significantly higher than Bitcoin. Each coin is supposed to be backed by $1, making it “stable.” And yet no one knows if this is true.Even more odd: in the last year, USDT has exploded in size even faster than Bitcoin - going from $6B in market cap to over $60B in less than a year. This includes $40B of new supply - a straight line up - after the New York Attorney General accused Tether of fraud. I and many others have considered a scenario in which the admitted fact that USDT is not backed 1-for-1 by USD causes a "run on the bank". Among the latest is Taming Wildcat Stablecoins by Gary Gorton and Jeffery Zhang. Zhang is one of the Federal Reserve's attorneys, but who is Gary Gorton? Izabella Kaminska explains:Over the course of his career, Gary Gorton has gained a reputation for being something of an experts’ expert on financial systems. Despite being an academic, this is in large part due to what might be described as his practitioner’s take on many key issues.The Yale School of Management professor is, for example, best known for a highly respected (albeit still relatively obscure) theory about the role played in bank runs by information-sensitive assets....the two authors make the implicit about stablecoins explicit: however you slice them, dice them or frame them in new technology, in the grand scheme of financial innovation stablecoins are actually nothing new. What they really amount to, they say, is another form of information sensitive private money, with stablecoin issuers operating more like unregulated banks.Gorton and Zhang write:The goal of private money is to be accepted at par with no questions asked. This did not occur during the Free Banking Era in the United States—a period that most resembles the current world of stablecoins. State-chartered banks in the Free Banking Era experienced panics, and their private monies made it very hard to transact because of fluctuating prices. That system was curtailed by the National Bank Act of 1863, which created a uniform national currency backed by U.S. Treasury bonds. Subsequent legislation taxed the state-chartered banks’ paper currencies out of existence in favor of a single sovereign currency. Unlike me, Kahn is a "brown guy in fintech", so he is better placed to come up with answers than I am.
For a start, he is skeptical of the USDT "bank run" scenario:The unbacked scenario is what concerns investors. If there were a sudden drop in the market, and investors wanted to exchange their USDT for real dollars in Tether’s reserve, that could trigger a “bank run” where the value dropped significantly below one dollar, and suddenly everyone would want their money. That could trigger a full on collapse.But when that might actually happen? When Bitcoin falls in the frequent crypto bloodbaths, users actually buy Tether - fleeing to the safety of the dollar. This actually drives Tether’s price up! The only scenario that could hurt is when Bitcoin goes up, and Tether demand drops.But hold on. It’s extremely unlikely Tether is simply creating tokens out of thin air - at worst, there may be some fractional reserve (they themselves admitted at one point it was only 74% backed) that is split between USD and Bitcoin.The NY AG’s statement that Tether had “no bank anywhere in the world” strongly suggests some money being held in crypto (Tether has stated this is true, but less than 2%), and Tether’s own bank says they use Bitcoin to hold customer funds! That means in the event of a Tether drop/Bitcoin rise, they are hedged.Tether’s own Terms of Service say users may not be redeemed immediately. Forced to wait, many users would flee to Bitcoin for lack of options, driving the price up again. Kahn agrees with me that Tether may have a magic "money" pump:It’s possible Tether didn’t have the money at some point in the past. And it’s just as possible that, with the massive run in Bitcoin the last year Tether now has more than the $62B they claim!In that case Tether would seem to have constructed a perfect machine for printing money. (And America has a second central bank.) Of course, the recent massive run down in Bitcoin will have caused the "machine for printing money" to start running in reverse.Matt Levine listened to an interview with Tether's CTO Paolo Ardoino and General Counsel Stuart Hoegner, and is skeptical about Tether's backing:Tether is a stablecoin that we have talked about around here because it was sued by the New York attorney general for lying about its reserves, and because it subsequently disclosed its reserves in a format that satisfied basically no one. Tether now says that its reserves consist mostly of commercial paper, which apparently makes it one of the largest commercial paper holders in the world. There is a fun game among financial journalists and other interested observers who try to find anyone who has actually traded commercial paper with Tether, or any of its actual holdings. The game is hard! As far as I know, no one has ever won it, or even scored a point; I have never seen anyone publicly identify a security that Tether holds or a counterparty that has traded commercial paper with it. USDT reserve disclosureLevine contrasts Tether's reserve disclosure with that of another instrument that is supposed to maintain a stable value, a money market fund:Here is the website for the JPMorgan Prime Money Market Fund. If you click on the tab labeled “portfolio,” you can see what the fund owns. The first item alphabetically is $50 million face amount of asset-backed commercial paper issued by Alpine Securitization Corp. and maturing on Oct. 12. Its CUSIP — its official security identifier — is 02089XMG9. There are certificates of deposit at big banks, repurchase agreements, even a little bit of non-financial commercial paper. ... 
You can see exactly how much (both face amount and market value), and when it matures, and the CUSIP for each holding.JPMorgan is not on the bleeding edge of transparency here or anything; this is just how money market funds work. You disclose your holdings. BinanceBut the big picture is that USDT pumped $60B into cryptocurrencies. Where did the demand for the $60B come from? In my view, some of it comes from whales accumulating dry powder to use in pump-and-dump schemes like the one illustrated above. But Kahn has two different suggestions. First:One of the well-known uses for USDT is “shadow banking” - since real US dollars are highly regulated, opening an account with Binance and buying USDT is a straightforward way to get a dollar account.The CEO of USDC himself admits in this Coindesk article: “In particular in Asia where, you know, these are dollar-denominated markets, they have to use a shadow banking system to do it...You can’t connect a bank account in China to Binance or Huobi. So you have to do it through shadow banking and they do it through tether. And so it just represents the aggregate demand. Investors and users in Asia – it’s a huge, huge piece of it.” SourceSecond:Binance also hosts a massive perpetual futures market, which are “cash-settled” using USDT. This allows traders to make leveraged bets of 100x margin or more...which, in laymen’s terms, is basically a speculative casino. That market alone provides around ~$27B of daily volume, where users deposit USDT to trade on margin. As a result, Binance is by far the biggest holder of USDT, with $17B sitting in its wallet. Wikipedia describes "perpetual futures" thus:In finance, a perpetual futures contract, also known as a perpetual swap, is an agreement to non-optionally buy or sell an asset at an unspecified point in the future. Perpetual futures are cash-settled, and differ from regular futures in that they lack a pre-specified delivery date, and can thus be held indefinitely without the need to roll over contracts as they approach expiration. Payments are periodically exchanged between holders of the two sides of the contracts, long and short, with the direction and magnitude of the settlement based on the difference between the contract price and that of the underlying asset, as well as, if applicable, the difference in leverage between the two sides In Is Tether a Black Swan? Bernhard Mueller goes into more detail about Binance's market:According to Tether’s rich list, 17 billion Tron USDT are held by Binance alone. The list also shows 2.68B USDT in Huobi’s exchange wallets. That’s almost 20B USDT held by two exchanges. Considering those numbers, the value given by CryptoQuant appears understated. A more realistic estimate is that ~70% of the Tether supply (43.7B USDT) is located on centralized exchanges.Interestingly, only a small fraction of those USDT shows up in spot order books. One likely reason is that a large share is sitting on wallets to collateralize derivative positions, in particular perpetual futures. The CEX futures market is essentially a casino where traders bet on crypto prices with insane amounts of leverage. And it’s a massive market: Futures trading on Binance alone generated $60 billion in volume over the last 24 hours. It’s important to understand that USDT perpetual futures implementations are 100% USDT-based, including collateralization, funding and settlement. 
Prices are tied to crypto asset prices via clever incentives, but in reality, USDT is the only asset that ever changes hands between traders. This use-case generates significant demand for USDT. Why is this "massive perpetual futures market" so popular? Kahn provides answers:That crazed demand for margin trading is how we can explain one of the enduring mysteries of crypto - how users can get 12.5% interest on their holdings when banks offer less than 1%. SourceThe high interest is possible because:The massive supply of USDT, and the host of other dollar stablecoins like USDC, PAX, and DAI, creates an arbitrage opportunity. This brings in capital from outside the ecosystem seeking the “free money”, making trades like this using a combination of 10x leverage and an 8.5% variance between stablecoins to generate an 89% profit in just a few seconds. If you’re only holding the bag for a minute, who cares if USDT is imaginary dollars? Rollicking good times like these attract the attention of regulators, as Amy Castor reported on July 2nd in Binance: A crypto exchange running out of places to hide:Binance, the world’s largest dark crypto slush fund, is struggling to find corners of the world that will tolerate its lax anti-money laundering policies and flagrant disregard for securities laws. As a result, Laurence Fletcher, Eva Szalay and Adam Samson report that Hedge funds back away from Binance after regulatory assault:The global regulatory pushback “should raise red flags for anyone keeping serious capital at the exchange”, said Ulrik Lykke, executive director at ARK36, adding that the fund has “scaled down” exposure....Lykke described it as “especially concerning” that the recent moves against Binance “involve multiple entities from across the financial sphere”, such as banks and payments groups. This leaves some serious money looking for an off-ramp from USDT to fiat. These are somewhat scarce:if USDT holders on centralized exchanges chose to run for the exits, USD/USDC/BUSD liquidity immediately available to them would be relatively small. ~44 billion USDT held on exchanges would be matched with perhaps ~10 billion in fiat currency and USDC/BUSD This, and the addictive nature of "a casino ... with insane amounts of leverage", probably account for the relatively small drop in USDT market cap since June 18th. Amy Castor reported July 13th on another reason in Binance: Fiat off-ramps keep closing, reports of frozen funds, what happened to Catherine Coley?:Binance customers are becoming trapped inside of Binance — or at least their funds are — as the fiat exits to the world’s largest crypto exchange close around them. You can almost hear the echoes of doors slamming, one by one, down a long empty corridor leading to nowhere.In the latest bit of unfolding drama, Binance told its customers today that it had disabled withdrawals in British Pounds after its key payment partner, Clear Junction, ended its business relationship with the exchange....There’s a lot of unhappy people on r/BinanceUS right now complaining their withdrawals are frozen or suspended — and they can’t seem to get a response from customer support either....Binance is known for having “maintenance issues” during periods of heavy market volatility. As a result, margin traders, unable to exit their positions, are left to watch in horror while the exchange seizes their margin collateral and liquidates their holdings.
And it isn't just getting money out of Binance that is getting hard, as David Gerard reports:Binance is totally not insolvent! They just won’t give anyone their cryptos back because they’re being super-compliant. KYC/AML laws are very important to Binance, especially if you want to get your money back after suspicious activity on your account — such as pressing the “withdraw” button. Please send more KYC. [Binance] Issues like these tend to attract the attention of the mainstream press. On July 23rd the New York Times' Eric Lipton and Ephrat Livni profiled Sam Bankman-Fried of the FTX exchange in Crypto Nomads: Surfing the World for Risk and Profit:The highly leveraged form of trading these platforms offer has become so popular that the overall value of daily purchases and sales of these derivatives far surpasses the daily volume of actual cryptocurrency transactions, industry data analyzed by researchers at Carnegie Mellon University shows....FTX alone has one million users across the world and handles as much as $20 billion a day in transactions, most of them derivatives trades.Like their customers, the platforms compete. Mr. Bankman-Fried from FTX, looking to out promote BitMEX, moved to offer up to 101 times leverage on derivatives trades. Mr. Zhao from Binance then bested them both by taking it to 125. Then on the 25th, as the regulators' seriousness sank in, the same authors reported Leaders in Cryptocurrency Industry Move to Curb the Highest-Risk Trades:Two of the world’s most popular cryptocurrency exchanges announced on Sunday that they would curb a type of high-risk trading that has been blamed in part for sharp fluctuations in the value of Bitcoin and the casino-like atmosphere on such platforms globally.The first move came from the exchange, FTX, which said it would reduce the size of the bets investors can make by lowering the amount of leverage it offers to 20 times from 101 times. Leverage multiplies the traders’ chance for not only profit, but also loss....About 14 hours later, Changpeng Zhao [CZ], the founder of Binance, the world’s largest cryptocurrency exchange, echoed the move by FTX, announcing that his company had already started to limit leverage to 20 times for new users and it would soon expand this limit to other existing clients. Early the next day, Tom Schoenberg, Matt Robinson, and Zeke Faux reported for Bloomberg that Tether Executives Said to Face Criminal Probe Into Bank Fraud: U.S. probe into Tether is homing in on whether executives behind the digital token committed bank fraud, a potential criminal case that would have broad implications for the cryptocurrency market.Tether’s pivotal role in the crypto ecosystem is now well known because the token is widely used to trade Bitcoin. But the Justice Department investigation is focused on conduct that occurred years ago, when Tether was in its more nascent stages. Specifically, federal prosecutors are scrutinizing whether Tether concealed from banks that transactions were linked to crypto, said three people with direct knowledge of the matter who asked not to be named because the probe is confidential.Federal prosecutors have been circling Tether since at least 2018. In recent months, they sent letters to individuals alerting them that they’re targets of the investigation, one of the people said. SourceOnce again, David Gerard pointed out the obvious market manipulation:This week’s “number go up” happened several hours before the report broke — likely when the Bloomberg reporter contacted Tether for comment. 
BTC/USD futures on Binance spiked to $48,000, and the BTC/USD price on Coinbase spiked at $40,000 shortly after.Here’s the one-minute candles on Coinbase BTC/USD around 01:00 UTC (2am BST on this chart) on 26 July — the price went up $4,000 in three minutes. You’ve never seen something this majestically organic And so did Amy Castor in The DOJ’s criminal probe into Tether — What we know:Last night, before the news broke, bitcoin was pumping like crazy. The price climbed nearly 17%, topping $40,000. On Coinbase, the price of BTC/USD went up $4,000 in three minutes, a bit after 01:00 UTC. After a user placed a large number of buy orders for bitcoin perpetual futures denominated in tethers (USDT) on Binance — an unregulated exchange struggling with its own banking issues — The BTC/USDT perpetual contract hit a high of $48,168 at around 01:00 UTC on the exchange.Bitcoin pumps are a good way to get everyone to ignore the impact of bad news and focus on number go up. “Hey, this isn’t so bad. Bitcoin is going up in price. I’m rich!” SourceAs shown in the graph, the perpetual futures market is at least an order of magnitude larger than the spot market upon which it is based. and as we saw for example on December 16th and July 26th, the spot market is heavily manipulated. Pump-and-dump schemes in the physical market are very profitable, and connecting them to the casino in the futures market with its insane leverage can juice profitability enormously.Tether and BinanceFais Kahn's second part, Bitcoin's end: Tether, Binance and the white swans that could bring it all down, explores the mutual dependency between Tether and Binance:There are $62B tokens for USDT in circulation, much of which exists to fuel the massive casino that is the perpetual futures market on Binance. These complex derivatives markets, which are illegal to trade in the US, run in the tens of billions and help drive up the price of Bitcoin by generating the basis trade.The "basis trade":involves buying a commodity at spot (taking a long position) and simultaneously establishing a short position through derivatives like options or futures contracts Kahn continues:For Binance to allow traders to make such crazy bets, it needs collateral to make sure if traders get wiped out, Binance doesn’t go bankrupt. That collateral is now an eye-popping $17B, having grown from $3B in February and $10B in May:But for that market to work, Binance needs USDT. And getting fresh USDT is a problem now that the exchange, which has always been known for its relaxed approach to following the laws, is under heavy scrutiny from the US Department of Justice and IRS: so much so that their only US dollar provider, Silvergate Bank, recently terminated their relationship, suggesting major concerns about the legality of some of Binance’s activities. This means users can no longer transfer US dollars from their bank to Binance, which were likely often used to fund purchases of USDT.Since that shutdown, the linkages between Binance, USDT, and the basis trade are now clearer than ever. In the last month, the issuance of USDT has completely stopped:Likewise, futures trading has fallen significantly. This confirms that most of the USDT demand likely came from leveraged traders who needed more and more chips for the casino. Meanwhile, the basis trade has completely disappeared at the same time.Which is the chicken and which is the egg? 
Did the massive losses in Bitcoin kill all the craziest players and end the free money bonanza, or did Binance’s banking troubles choke off the supply of dollars, ending the game for everyone? Either way, the link between futures, USDT, and the funds flooding the crypto world chasing free money appears to be broken for now. This is a problem for Binance:Right now Tether is Binance’s $17B problem. At this point, Binance is holding so much Tether the exchange is far more dependent on USDT’s peg staying stable than it is on any of its banking relationships. If that peg were to break, Binance would likely see capital flight on a level that would wreak untold havoc in the crypto markets...Regulators have been increasing the pace of their enforcements. In other words, they are getting pissed, and the BitMex founders going to jail is a good example of what might await.Binance has been doing all it can to avoid scrutiny, and you have to award points for creativity. The exchange was based in Malta, until Malta decided Binance had “no license” to operate there, and that Malta did not have jurisdiction to regulate them. As a result, CZ began to claim that Binance “doesn’t have” a headquarters. Wonder why? Perhaps to avoid falling under anyone’s direct jurisdiction, or to avoid a paper trail?CZ went on to only reply that he is based in “Asia.” Given what China did to Jack Ma recently, we can empathize with a desire to stay hidden, particularly when unregulated exchanges are a key rail for evading China’s strict capital controls. Any surprise that the CFO quit last month?But it is also a problem for Tether:Here’s what could trigger a cascade that could bring the exchange down and much of crypto with it: the DOJ and IRS crack down on Binance, either by filing charges against CZ or pushing Biden and Congress to give them the death penalty: full on sanctions. This would lock them out of the global financial system, cause withdrawals to skyrocket, and eventually drive them to redeem that $17B of USDT they are sitting on.And what will happen to Tether if they need to suddenly sell or redeem those billions?We have no way of knowing. Even if fully collateralized, Tether would need to sell billions in commercial paper on short notice. And in the worst case, the peg would break, wreaking absolute havoc and crushing crypto prices. Alternatively, it’s possible that regulators will move as slow as they have been all along - with one country at a time unplugging Binance from its banking system until the exchange eventually shrinks down to be less of a systemic risk than it is. That's my guess — it will become increasingly difficult either to get USD or cryptocurrency out of Binance's clutches, or to send them fiat, as banks around the world realize that doing business with Binance is going to get them in trouble with their regulators. Once customers realize that Binance has become a "roach motel" for funds, and that about 25% of USDT is locked up there, things could get quite dynamic.Kahn concludes:Everything around Binance and Tether is murky, even as these two entities dominate the crypto world. Tether redemptions are accelerating, and Binance is in trouble, but why some of these things are happening is guesswork. And what happens if something happens to one of those two? We’re entering some uncharted territory. But if things get weird, don’t say no one saw it coming. Policy ResponsesGorton and Zhang argue that the modern equivalent of the "free banking" era is fraught with too many risks to tolerate.
David Gerard provides an overview of the era in Stablecoins through history — Michigan Bank Commissioners report, 1839:The wildcat banking era, more politely called the “free banking era,” ran from 1837 to 1863. Banks at this time were free of federal regulation — they could launch just under state regulation.Under the gold standard in operation at the time, these state banks could issue notes, backed by specie — gold or silver — held in reserve. The quality of these reserves could be a matter of some dispute.The wildcat banks didn’t work out so well. The National Bank Act was passed in 1863, establishing the United States National Banking System and the Office of the Comptroller of the Currency — and taking away the power of state banks to issue paper notes. Gerard's account draws from a report of Michigan's state banking commissioners, Documents Accompanying the Journal of the House of Representatives of the State of Michigan, pp. 226–258, which makes clear that Tether's lack of transparency as to its reserves isn't original. Banks were supposed to hold "specie" (money in the form of coin) as backing but:The banking system at the time featured barrels of gold that were carried to other banks, just ahead of the inspectors. For example, the commissioners reported that:The Farmers’ and Mechanics’ bank of Pontiac, presented a more favorable exhibit in point of solvency, but the undersigned having satisfactorily informed himself that a large proportion of the specie exhibited to the commissioners, at a previous examination, as the bona fide property of the bank, under the oath of the cashier, had been borrowed for the purpose of exhibition and deception; that the sum of ten thousand dollars which had been issued for “exchange purposes,” had not been entered on the books of the bank, reckoned among its circulation, or explained to the commissioners. Gorton and Zhang summarize the policy choices thus:Based on historical lessons, the government has a couple of options: (1) transform stablecoins into the equivalent of public money by (a) requiring stablecoins to be issued through FDIC-insured banks or (b) requiring stablecoins to be backed one-for-one with Treasuries or reserves at the central bank; or (2) introduce a central bank digital currency and tax private stablecoins out of existence. Their suggestions for how to implement the first option include: first, the interpretation of Section 21 of the Glass-Steagall Act, under which "it is unlawful for a non-bank entity to engage in deposit-taking"; second, the interpretation of Title VIII of the Dodd-Frank Act, under which the Financial Stability Oversight Council could "designate stablecoin issuance as a systemic payment activity" (this "would give the Federal Reserve the authority to regulate the activity of stablecoin issuance by any financial institution."); and third, Congress could pass legislation that requires stablecoin issuers to become FDIC-insured banks or to run their business out of FDIC-insured banks, as a result of which stablecoin issuers would be subject to regulations and supervisory activities that come along with being an FDIC-insured bank. Alternatively, the second option would involve: Congress could require the Federal Reserve to issue a central bank digital currency as a substitute to privately produced digital money like stablecoins...The question then becomes whether policymakers would want to have central bank digital currencies coexist with stablecoins or to have central bank digital currencies be the only form of money in circulation.
As discussed previously, Congress has the legal authority to create a fiat currency and to tax competitors of that uniform national currency out of existence. They regard the key attribute of an instrument that acts as money to be that it is accepted at face value "No Questions Asked" (NQA). Thus, based on history they ask:In other words, should the sovereign have a monopoly on money issuance? As shown by revealed preference in the table below, the answer is yes. The provision of NQA money is a public good, which only the government can supply. Digital Library Federation: Building a Transatlantic Digital Scholarship Skills Exchange for Research Libraries: Moving Forward There may be an ocean between the US and the UK, but in the age of Zoom, collaboration can transcend geographical boundaries. Research Libraries UK’s Digital Scholarship Network (DSN) and CLIR’s Digital Library Federation’s Data and Digital Scholarship (DDS) working group are continuing to foster a partnership aimed at building transatlantic collaborations and connections. The two groups have had a series of conversations, and we have now hosted two joint meetings. At an April 2021 event, we divided into small groups to share ideas about the potential collaborative future of our two groups. Then we conducted a follow-up Expressions of Interest survey in April-May 2021. In response to the April event and the Expressions of Interest survey, we hosted a second event on 14 July 2021, at which we launched a beta Skills Exchange Directory, and we offered a series of tailored “skills conversations.” This post provides more detail on the July event and plans for next steps in our transatlantic collaboration. July Event One of the main takeaways from the first event in April was that participants appreciated being able to meet colleagues from both the UK and US, so we were keen to facilitate this again and started the July event with one-to-one (1-2-1) or small group “meet and greets”. Participants enjoyed the serendipity of these meetings, and individual connections have already been made. The second part of the event was more structured and aimed to build on the information we had gathered from the Expressions of Interest survey. From this survey we were able to identify key areas of digital scholarship skills that colleagues were most keen to develop and could match these with colleagues who were willing to share their expertise in these areas by facilitating curated conversations. The areas identified were: Artificial Intelligence and machine learning  Tools for digital scholarship  Planning and managing a digital scholarship centre  Assessment and Metrics The success of these curated conversations was dependent on the experts being willing to share their knowledge, and we are grateful to Carol Chiodo (Harvard Library), Alexandra Sarkozy (Wayne State University), Sarah Melton (Boston College), Kirsty Lingstadt (University of Edinburgh), Eleonora Gandolfi (University of Southampton), Gavin Boyce (University of Sheffield) and Matt Philips (University of Southampton) for being so willing to participate and share their experiences so openly. In these breakout sessions the experts talked for 10 minutes each about their services and how they developed skills, and then participants who had signed up for the session were able to ask follow-up questions. DSN and DDS partners acted as moderators in each breakout session and we were impressed with how informative and interactive the sessions were.
If you weren’t able to make it to the July event, each session documented the conversation in a shared notes document. These interactive sessions are core for the success of a skills exchange but the finale of this event was the launch of the Skills Directory – a dynamic resource through which colleagues from both groups can share their skills and expertise with one another.  Transatlantic Skills Directory Participants were introduced to the directory (designed by Stephanie Jesper and Susan Halfpenny at University of York) and shown how to search (via the Google sheets filter function) to identify colleagues with varying levels of expertise across 18 skills areas relating to digital scholarship activities within research libraries. The directory includes the names and contact details of colleagues who are willing to share their skills and expertise around a skills area, and the means through which they are willing to do so (e.g. one-on-one conversation, contributing to a training session etc). Due to the potential demand on colleagues the directory is only available to DSN and DDS members but a recording of the introduction to the directory from the session is available here.   The success of the Directory depends on colleagues signing up to share their skills and after the demonstration participants were given time to register their own skills – the group were impressed by the response. As we write this, 30 colleagues have registered in the directory, offering more than 175 skills between them. However, we still need more and encourage members of RLUK DSN and DLF DDS to register their skills. We need your expertise to make this a success! We also encourage members to make good use of the Directory to learn new skills or move forward with digital scholarship services and would be keen to hear of any contacts made via the directory. Next Steps These events and tools have helped us learn more about our colleagues on both sides of the Atlantic, and RLUK DSN and DLF DDS look forward to continuing this partnership. Please check out the shared meeting notes, resources, and other materials available on our Open Science Framework site. We plan to host more joint events to support networking and idea-generation, and we will continue to expand the directory with more colleagues who are interested in exchanging skills. Thanks to everyone involved in arranging these events and the directory and of course to everyone who participates in the skills exchange! Colleagues leading this work Beth Clark, Associate Director, Digital Scholarship & Innovation, London School of Economics, and RLUK DSN member. Sara Mannheimer, Associate Professor, Data Librarian, Montana State University, and DLF DDS co-convener. Jason Clark, Professor, Lead for Research Informatics, Montana State University, and DLF DDS co-convener. Susan Halfpenny, Head of Digital Scholarship & Innovation, University of York, and RLUK DSN member. Matt Greenhall, Deputy Executive Director, RLUK Thanks go to Gayle Schechter (Program Associate, CLIR/DLF), Louisa M. Kwasigroch (Director, Outreach and Engagement at CLIR and Interim DLF Senior Program Officer), Kirsty Lingstadt (Deputy Director, University of Edinburgh and RLUK DSN co-convener), Eleonora Gandolfi (Head of Digital Scholarship and Innovation, University of Southampton and RLUK DSN co-convener), Stephanie Jesper (Teaching & Learning Advisor, University of York), and Melanie Cheung (RLUK Executive Assistant). 
The post Building a Transatlantic Digital Scholarship Skills Exchange for Research Libraries: Moving Forward appeared first on DLF. Open Knowledge Foundation: Register your Interest: Open Knowledge Justice Programme Community Meetups What’s this about? The Open Knowledge Justice Programme is kicking off a series of free, monthly community meetups to talk about Public Impact Algorithms. Register here. Who is this for? Do you want to learn more about Public Impact Algorithms? Would you like to know how to spot one, and how they might affect the clients you represent? Do you work in government, academia, policy-making or civil society – and are interested in learning how to deploy a Public Impact Algorithm fairly? Tell me more Whether you’re a new to tech or a seasoned pro, join us once a month to share your experiences, listen to our guest speakers and ask our data expert questions on this fast changing issue. = = = = = When? Lunch time every second Thursday of the month – starting September 9th 2021. How? Register your interest here = = = = = More info: www.thejusticeprogramme.org/community Journal of Web Librarianship: #PowerInNumbers: How Digital Libraries Use Collaborative Social Media Campaigns to Promote Collections . OCLC Dev Network: Planned maintenance for the Classify Application, 11 August OCLC will be performing maintenance on the experimental Classify API on 11 August. Digital Library Federation: DLF Digest: August 2021 A monthly round-up of news, upcoming working group meetings and events, and CLIR program updates from the Digital Library Federation.  This month’s news: The DLF Digital Library Pedagogy group invites all interested digital pedagogy practitioners to contribute to a literacy- and competency-centered #DLFteach Toolkit, an online, open resource focused on lesson plans and concrete instructional strategies. For additional information, see the CFP. If you are interested in contributing a lesson to the toolkit and/or being a peer reviewer, please complete the intent to contribute form. Proposals are due by September 1, 2021. Questions can be sent to dlfteach.toolkit3@gmail.com. On August 11, all Twitter updates about CLIR’s Recordings at Risk and Digitizing Hidden Collections grants and grantees will move to one account, @CLIRgrants. Make sure to follow the @CLIRHC Twitter account so you’ll be ready once it transitions to @CLIRgrants! Registration for the 2021 DLF Forum and affiliated events opens soon. Subscribe to the DLF Forum newsletter for the latest updates on this fall’s virtual events.  This month’s DLF group events: DLF Digital Library Pedagogy group – #DLFteach Twitter Chat Tuesday, August 17, 8pm ET/5pm PT; participate on Twitter using the hashtag #DLFteach Join the DLF Digital Library Pedagogy group for this month’s Twitter chat on building stronger community engagement for open source. Twitter chat details, instructions, and archives of past #DLFteach chats are available on the DLF wiki. This month’s open DLF group meetings: For the most up-to-date schedule of DLF group meetings and events (plus NDSA meetings, conferences, and more), make sure to bookmark the DLF Community Calendar. Can’t find meeting call-in information? Email us at info@diglib.org.  
DLF-AIG Metadata Assessment working group: Thursdays, August 5 & 19, 1:15pm ET/10:15am PT DLF-AIG Cultural Assessment working group: Monday, August 9, 1pm ET/10am PT DLF-AIG Cost Assessment working group: Monday, August 9, 3pm ET/12pm PT DLF Digital Accessibility working group: Wednesday, August 11, 2pm ET/11am PT DLF Digital Accessibility working group-Education & Advocacy subgroup: Monday, August 16, 1pm ET/10am PT DLF Committee for Equity & Inclusion: Thursday, August 19, 3pm ET/12pm PT DLF Museums Cohort: Wednesday, August 25, 2pm ET/11am PT DLF-AIG User Experience working group: Friday, August 27, 2pm ET/11am PT DLF groups are open to ALL, regardless of whether or not you’re affiliated with a DLF member institution. Learn more about our working groups and how to get involved on the DLF website. Interested in starting a new working group or reviving an older one? Need to schedule an upcoming working group call? Check out the DLF Organizer’s Toolkit to learn more about how Team DLF supports our working groups, and send us a message at info@diglib.org to let us know how we can help.  The post DLF Digest: August 2021 appeared first on DLF. Digital Library Federation: Arabic Translations Available of the 2019 Levels of Digital Preservation The NDSA is pleased to announce that the 2019 Levels of Preservation documents have been translated into Arabic by our colleagues from Thesaurus Islamicus Foundation’s Qirab project (http://thesaurus-islamicus.org/index.htm). Translations for the Levels of Digital Preservation Matrix and Implementation Guide were completed. Links to these documents are found on the 2019 Levels of Digital Preservation OSF site (https://osf.io/qgz98/). If you would be interested in translating the Levels of Digital Preservation V2.0 into another language please contact us at ndsa.digipres@gmail.com. الترجمات العربيَّة لمستويات الحفظ لعام 2019 والوثائق المرتبطة بها يسرُّ الاتحاد الوطني للإشراف الرقمي أن يُعلنَ عن صدور ترجمة باللغة العربيَّة لوثائق مستويات الحفظ الرقمي لعام 2019، نفَّذها زملاؤنا من مشروع «قِرَاب» التابع لجمعية المكنز الإسلامي. (http://thesaurus-islamicus.org)   وقد اكتملت التَّرجمات الخاصة بمصفوفة ودليل تنفيذ مستويات الحفظ الرقمي. علمًا أنه يمكن الوصولُ إلى تلك الوثائق من خلال مستويات الحفظ الرقمي لعام 2019 في الموقع الإلكتروني الخاص بمؤسسة أوبن ساينس فريم ورك (OSF). (https://osf.io/qgz98) وفي حالة اهتمامكم بترجمة الإصدار الثاني من مستويات الحفظ الرقمي إلى لغةٍ أخرى، يُرجى التواصلُ معنا عبر عنوان البريد الإلكتروني التالي: ndsa.digipres@gmail.com The post Arabic Translations Available of the 2019 Levels of Digital Preservation appeared first on DLF. Lorcan Dempsey: Presentation: Two Metadata Directions ConferenceI was pleased to deliver a presentation at the Eurasian Academic Libraries Conference - 2021, organized by The Nazarbayev University Library and the Association of University Libraries in the Republic of Kazakhstan. Thanks to April Manabat of Nazarbayev University for the invitation and support. I was asked to talk about metadata and to mention OCLC developments. The conference topic was: Contemporary Trends in Information Organization in the Academic Library Environment.Further information here:Nazarbayev University LibGuides: Eurasian Academic Libraries Conference - 2021: HomeNazarbayev University LibGuides: Eurasian Academic Libraries Conference - 2021: HomeNazarbayev University LibGuides at Nazarbayev UniversityTopicI spoke about two trends in metadata developments: entification and pluralization. 
Each of these is important and I provided an example under each head of a related initiative at OCLC. I discuss these trends in more detail in a recent blog entry, which recapitulates and extends the material presented at the conference: “Two metadata directions in libraries” (LorcanDempsey.net): Metadata practice is evolving. I discuss two important trends here: entification and pluralization. Additional materials The slides as presented are here: kazak.pptx (figshare). This is a presentation about trends in metadata, focusing on two important issues. The first is entification, moving from strings to things. The second is pluralization, as we seek to better represent the diversity of perspectives, experiences and memories. It discusses an OCLC initiative associated… The conference organizers have made a video of the sessions available. Here is the video for day two, which should begin as I begin speaking. Move back to see more of the presentations. As we prepared the video, I did reflect on the future of conferences and conference-going. Clearly, much to work through here, and we are certainly seeing new and engaging online and hybrid experiences. In writing the accompanying blog entry, I finished with this observation: The Pandemic is affecting how we think about work travel and the design of events, although in as yet unclear ways. One pandemic effect, certainly, has been the ability to think about both audiences and speakers differently. It is unlikely that I would have attended this conference had it been face to face; however, I readily agreed to be an online participant. // Two Metadata Directions Lucidworks: How to Increase Ecommerce Conversion Rates with Signals Signals are powerful indicators of what customers want. Here's how brands can use signals with machine learning models to improve conversion. The post How to Increase Ecommerce Conversion Rates with Signals appeared first on Lucidworks. David Rosenthal: Economics Of Evil Revisited Eight years ago I wrote Economics of Evil about the death of Google Reader and Google's habit of leaving its customers (or rather, users) in the lurch. In the comments to the post I started keeping track of accessions to le petit musée des projets Google abandonnés (the little museum of abandoned Google projects). So far I've recorded at least 33 dead products, an average of more than 4 a year. Two years ago Ron Amadeo wrote about the problem this causes in Google’s constant product shutdowns are damaging its brand: We are 91 days into the year, and so far, Google is racking up an unprecedented body count. If we just take the official shutdown dates that have already occurred in 2019, a Google-branded product, feature, or service has died, on average, about every nine days. Below the fold, some commentary on Amadeo's latest report from the killing fields, in which he detects a little remorse. Belatedly, someone at Google seems to have realized that repeatedly suckering people into using one of your products then cutting them off at the knees, in some cases with one week's notice, can reduce their willingness to use your other products. And they are trying to do something about it, as Amadeo writes in Google Cloud offers a model for fixing Google’s product-killing reputation: A Google division with similar issues is Google Cloud Platform, which asks companies and developers to build a product or service powered by Google's cloud infrastructure.
Like the rest of Google, Cloud Platform has a reputation for instability, thanks to quickly deprecating APIs, which require any project hosted on Google's platform to be continuously updated to keep up with the latest changes. Google Cloud wants to address this issue, though, with a new "Enterprise API" designation. What Google means by "Enterprise API" is: Our working principle is that no feature may be removed (or changed in a way that is not backwards compatible) for as long as customers are actively using it. If a deprecation or breaking change is inevitable, then the burden is on us to make the migration as effortless as possible. They then have this caveat:The only exception to this rule is if there are critical security, legal, or intellectual property issues caused by the feature. And go on to explain what should happen: Customers will receive a minimum of one year’s notice of an impending change, during which time the feature will continue to operate without issue. Customers will have access to tools, docs, and other materials to migrate to newer versions with equivalent functionality and performance. We will also work with customers to help them reduce their usage to as close to zero as possible. This sounds good, but does anyone believe if Google encountered "critical security, legal, or intellectual property issues" that meant they needed to break customer applications they'd wait a year before fixing them?Amadeo points out that:Despite being one of the world's largest Internet companies and basically defining what modern cloud infrastructure looks like, Google isn't doing very well in the cloud infrastructure market. Analyst firm Canalys puts Google in a distant third, with 7 percent market share, behind Microsoft Azure (19 percent) and market leader Amazon Web Services (32 percent). Rumor has it (according to a report from The Information) that Google Cloud Platform is facing a 2023 deadline to beat AWS and Microsoft, or it will risk losing funding.The linked story from 2019 actually says: While the company has invested heavily in the business since last year, Google wants its cloud group to outrank those of one or both of its two main rivals by 2023 On Canalys numbers, the "and" target to beat (AWS plus Azure) has happy customers forming 51% of the market. So there is 42% of the market up for grabs. If Google added every single one of them to its 7% they still wouldn't beat a target of "both". Adding six times their customer base in 2 years isn't a realistic target.Even the "or" target of Azure is unrealistic. Since 2019 Google's market share has been static while Azure's has been growing slowly. Catching up in the 2 years remaining would involve adding 170% of Google's current market share. So le petit musée better be planning to enlarge its display space to make room for a really big new exhibit in 2024. Digital Library Federation: Call for Nominations to the NDSA Coordinating Committee NDSA will be electing three members to its Coordinating Committee (CC) this year, with terms starting in January 2022. CC members serve a three year term, participate in a monthly call, and meet at the annual Digital Preservation Conference. The Coordinating Committee provides strategic leadership to the organization in coordination with group co-chairs. NDSA is a diverse community with a critical mission, and we seek candidates to join the CC that bring a variety of cultures and orientations, skills, perspectives and experiences, to bear on leadership initiatives. 
Working on the CC is an opportunity to contribute your leadership for the community as a whole, while collaborating with a wonderful group of dynamic and motivated professionals.  If you are interested in joining the NDSA Coordinating Committee (CC) or want to nominate another member, please complete the nomination form by 11:59pm EDT Friday, August 13, 2021, which asks for the name, e-mail address, brief bio/candidate statement (nominee-approved), and NDSA-affiliated institution of the nominee. We particularly encourage and welcome nominations of people from underrepresented groups and sectors.  As members of the NDSA, we join together to form a consortium of more than 260 partnering organizations, including businesses, government agencies, nonprofit organizations, professional associations and universities, all engaged in the long-term preservation of digital information. Committed to preserving access to our national digital heritage, we each offer our diverse skills, perspectives, experiences, cultures and orientations to achieve what we could not do alone.  The CC is dedicated to ensuring a strategic direction for NDSA, to the advancement of NDSA activities to achieve community goals, and to further communication among digital preservation professionals and NDSA member organizations. The CC is responsible for reviewing and approving NDSA membership applications and publications; updating eligibility standards for membership in the alliance, and other strategic documents; engaging with stakeholders in the community; and working to enroll new members committed to our core mission. More information about the duties and responsibilities of CC members can be found at the NDSA’s Leadership Page. We hope you will give this opportunity serious consideration and we value your continued contributions and leadership in our community. Any questions can be directed to ndsadigipres@gmail.com.   Thank you, Nathan Tallman, Vice Chair On behalf of the NDSA Coordinating Committee The post Call for Nominations to the NDSA Coordinating Committee appeared first on DLF. DuraSpace News: Fedora Migration Paths and Tools Project Update: July 2021 This is the latest in a series of monthly updates on the Fedora Migration Paths and Tools project – please see the previous post for a summary of the work completed up to that point. This project has been generously funded by the IMLS. We completed some final performance tests and optimizations for the University of Virginia pilot. Both the migration to their AWS server and the Fedora 6.0 indexing operation were much slower than anticipated, so the project team tested a number of optimizations, including: Adding more processing threads Increasing the size of the server instance  Using a separate and larger database server  Using locally attached flash storage Fortunately, these improvements made a big difference; for example, ingest speed was increased from 6.8 resources per second to 45.6 resources per second. In general, this means that institutions with specific performance targets can use a combination of parallel processing and increased computational resources. Feedback from this pilot has been incorporated into the migration guide, updates to the migration-utils to improve performance, updates to the aws-deployer tool to provide additional options, and improvements to the migration-validator to handle errors. The Whitman College team has begun their production migration using Islandora Workbench. 
Initial benchmarking has shown that running Workbench from the production server rather than locally on a laptop achieves much better performance, so this is the recommended approach. The team is working collection-by-collection using CSV files and a tracking spreadsheet to keep track of each collection as it is ingested and ready to be tested. They have also developed a Quality Control checklist to make sure everything is working as intended – we anticipate doing detailed checks on the first few collections and spot checks for subsequent collections. As we near the end of the pilot project phase of the grant work, we are focused on documentation for the migration toolkit. We plan to complete a draft of this documentation over the summer, after which this draft will be shared with the broader community for feedback. We will organize meetings in the Fall to provide opportunities for community members to provide additional feedback on the toolkit and make suggestions for improvements. The post Fedora Migration Paths and Tools Project Update: July 2021 appeared first on Duraspace.org. HangingTogether: How well does EAD tag usage support finding aid discovery? In November, we shared information with you about the Building a National Finding Aid Network project (NAFAN). This is a two-year research and demonstration project to build the foundation for a (US) national archival finding aid network. OCLC is engaged as a partner in the project, leading qualitative and quantitative research efforts. This post gives some details on just one of those research strands, evaluation of finding aid data quality. In considering building a nationwide aggregation of finding aids, looking at the potential raw materials that will make up that resource helps us both to scope the network's functionality to the finding aid data and to lay the groundwork for data remediation and expanded network features. We have two main research questions when approaching finding aid data quality: What is the structure and extent of consistency across finding aid data in current aggregations? Can that data support the needs to be identified in the user research phase of the study? If so, how? If not, what are the gaps? About the research aggregation Twelve NAFAN partners made their finding aids available to the project for quantitative analysis, producing a total of over 145,000 documents. The finding aids were provided in the Encoded Archival Description (EAD) format. EAD is an XML-based standard for describing collections of archival materials. As a warning to the reader: this post delves deeply into EAD elements and attributes and assumes at least a passing knowledge of the encoding standard. For those wishing to learn more about the definitions and structure, we recommend the official EAD website or the less official but highly readable and helpful EADiva site. A treemap visualization of finding aid sources in the NAFAN research aggregation.
This treemap visualizes the relative proportion of the finding aid aggregation from the partners: Archival Resources in Wisconsin; Archives West; Arizona Archives Online (AAO); Black Metropolis Research Consortium (BMRC); Chicago Collections Consortium; Connecticut's Archives Online (CAO); Empire Archival Discovery Cooperative (EmpireADC); Online Archives of California (OAC); Philadelphia Area Archival Research Portal (PAARP); Rhode Island Archives and Manuscripts Online (RIAMCO); Texas Archival Resources Online (TARO); and Virginia Heritage. Though a few of the partners provided much of the content, the aggregation is a very good mix from a wide variety of United States locales and institution types. Dimensions for analysis This analysis continues work carried out previously, including a 2013 EAD tag analysis that OCLC worked on with a different aggregation of EAD documents, based on about 120,000 finding aids drawn from OCLC's ArchiveGrid discovery system. You can check out that previous study, "Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems", published in code4lib Journal issue 22. OCLC's 2013 analysis looked at EAD tag and attribute usage from a discovery perspective. For that study, we identified five high-level features that were often present in archival discovery systems. Search: all discovery systems have a keyword search function; many also include the ability to search by a particular field or element. Browse: many discovery systems include the ability to browse finding aids by title, subject, dates, or other facets. Results display: once a user has done a search, the results display will return portions of the finding aid to help with further evaluation. Sort: once a user has done a search, they may have the option to reorder the results. Facet: once a user has done a search, they may have the option to narrow the results to only include results that fall within certain facets. The analysis used that framework of high-level discovery features to select EAD elements and attributes that, if present, could be accessed, indexed, and displayed. This is the categorization of EAD elements and attributes that the study found to be relevant for supporting discovery system features. Dates: unitdate. Extent data: extent. Collection title sources: unittitle, titleproper/@type=filing. Content tags in dsc: corpname, famname, function, genreform, geogname, name, occupation, persname, subject. Content tags in origination: corpname, famname, name, persname. Content tags in controlaccess: corpname, famname, function, geogname, name, occupation, persname, subject. Material type: controlaccess/genreform. Repository: repository. Notes: abstract, bioghist, scopecontent. For example, dates could potentially be utilized as search terms, or leveraged for browsing or sorting. They may also be important for disambiguating similarly named collections in displays. Similarly, material types, represented by form and genre terms, could be important for narrowing a large result using a facet. (Thank you to eadiva for providing the excellent tag library that is linked to from the EAD element names above.) The question then was, how often are these key elements and attributes used? Defining Thresholds for Discovery A table showing the thresholds of EAD tag usage for supporting discovery. We should preface this by saying that it is difficult to predefine thresholds for the level of usage of an element at which it becomes more or less useful for discovery.
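To make the question concrete, here is a minimal sketch, in Python, of the kind of element-usage counting that sits behind such thresholds. It is illustrative only: the directory name and the short element list are placeholders rather than the project's actual tooling, and it simply reports the share of finding aids in which each element appears at least once.

    from pathlib import Path
    from xml.etree import ElementTree as ET

    # A placeholder subset of the discovery-relevant EAD elements discussed above
    DISCOVERY_ELEMENTS = ["unitdate", "unittitle", "extent", "repository",
                          "abstract", "bioghist", "scopecontent", "controlaccess"]

    def element_usage(ead_dir):
        counts = {name: 0 for name in DISCOVERY_ELEMENTS}
        files = list(Path(ead_dir).glob("*.xml"))
        for path in files:
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # skip documents that fail to parse
            # Strip any namespace prefix so tags compare by local name
            present = {element.tag.split("}")[-1] for element in root.iter()}
            for name in DISCOVERY_ELEMENTS:
                if name in present:
                    counts[name] += 1
        total = len(files) or 1
        return {name: count / total for name, count in counts.items()}

    if __name__ == "__main__":
        for name, share in sorted(element_usage("finding_aids").items()):
            print(f"{name:15} {share:.1%}")

Run against a directory of EAD XML files, output along the lines of "unitdate 97.0%" is the raw material from which usage thresholds can then be argued over.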
Is an element that is used 95% of the time still useful, but one that is used 94% of the time not? OCLC's 2013 study developed these thresholds after evaluating the EAD aggregation. The absence of an element does not directly lead to a breakdown in a discovery system. It is more like a gradual decay of its effectiveness. Although we used these levels as a reference point in the 2013 study, we recognized that correlating usage with discovery is an artificial construct. A table comparing EAD tag usage in 2021 and 2013. The above figure compares the usage thresholds from the 2013 study (right) with the same tag analysis applied to the NAFAN corpus in 2021 (left). You will notice that there are some elements that we have added for analysis in the 2021 study, thanks to input from the expanded project team and the advisory board, which is reviewing and providing input into this work. This input has helped to breathe new life into old research. The findings of the 2013 study were decidedly mixed. Some important elements were at the high or complete thresholds. But many elements that are necessary for discovery interfaces were at medium or low use. Though the NAFAN EAD aggregation is a different corpus of data provided by different contributing institutions at a different time, the EAD tag analysis for it hasn't changed the picture very much. A few elements have moved from the high threshold to complete, and a few from medium to high. And we found that there was mostly low-level use of content tags in origination and controlaccess. Apart from that, the 2013 study's appraisal of how well EAD supports the typical features of discovery systems could be considered mostly unchanged. This may be due in part to the relatively static nature of EAD finding aids. Once written and published, some documents may not receive further updates and improvements. It is not uncommon to find EAD documents in this aggregation that were published several years ago and have not been updated since. Looking back on the conclusions of the 2013 study suggests that its cautionary forecast about underutilization of EAD to support discovery has proven to be accurate, while the study's vision of the promise and potential for improving EAD encoding has yet to be fulfilled. If the archival community continues on its current path, then the potential of the EAD format to support researchers or the public in discovery of material will remain underutilized.
Minimally, collection descriptions that are below the thresholds for discovery will hinder researchers' discovery efforts, and maximally they will remain hidden from view. Perhaps with emerging evidence about the corpus of EAD, continued discussion of practice, recognition of a need for greater functionality, and shared tools both to create new EAD documents and improve existing encoding, we can look forward to further increasing the effectiveness and efficiency of EAD encoding and develop a practice of EAD encoding that pushes collection descriptions across the threshold of discovery. (Thresholds for Discovery: EAD Tag Analysis in ArchiveGrid, and Implications for Discovery Systems) More research opportunities Though replicating the 2013 EAD Tag Analysis was an important step to confirm what we previously understood about the content and character of EAD finding aids, it only scratched the surface of what's left to learn. While OCLC's qualitative research is still being carried out and its findings won't be available until later in the project, we can pursue other quantitative research right now to learn more about the NAFAN finding aid aggregation. Here are some of the areas that we're investigating: What is the linking potential of the NAFAN EAD finding aids? What is the completeness and consistency of the description of collections' physical characteristics and genre? Are content element values associated with controlled vocabularies, or can they be? Is institutional contact information in EAD finding aids consistent and reliable? How do EAD finding aids inform researchers about access to, use of, and reuse of materials in the described collections? There are many possible avenues for research, but we want to be truly informed by the focus groups and researcher interviews before investing additional effort. The first area of investigation noted here, about finding aid links to digital content, correlates with early findings from OCLC's NAFAN pop-up survey which show that, for many users, only digitized materials would be of interest. Investigating the linking potential of the aggregated finding aids could help answer several questions, including: What is the average number of external links per finding aid? What EAD elements and attributes are most frequently used for external links? What types of digital objects are linked? How many relative URLs are present that rely on the finding aid being accessed within its local context? What percentage of external links still resolve? OCLC will be investigating these areas and publishing findings over the coming months. Please get in touch with us if you'd like to discuss this work in more detail. The post How well does EAD tag usage support finding aid discovery? appeared first on Hanging Together. Open Knowledge Foundation: Applications for the CoAct Open Calls on Gender Equality (July 1st, 2021 – September 30th, 2021) are open! CoAct is launching a call for proposals, inviting civil society initiatives to apply for our cascading grants of up to 20,000 euros to conduct Citizen Social Science research on the topic of Gender Equality. A maximum of four (4) applicants will be selected across three (3) different open calls. Applications from a broad range of backgrounds are welcome, including feminist, LGBTQ+, non-binary and critical masculinity perspectives. Eligible organisations can apply until September 30th, 2021, 11:59 PM GMT.
All information for submitting applications is available here: https://coactproject.eu/opencalls/ If selected, CoAct will support your work by providing funding for your project (10 months max), alongside dedicated activities, resources and tools: providing a research mentoring program for your team (in collaborative workshops you will be supported to co-design and explore available tools, working together with the CoAct team to achieve your goals), and connecting you to a community of people and initiatives tackling similar challenges and contributing to common aims. You will have the opportunity to discuss your projects with the other grantees and, moreover, will be invited to join CoAct's broader Citizen Social Science network. You should apply if you: are an ongoing Citizen Social Science project looking for support, financial and otherwise, to grow and become sustainable; are a community interested in co-designing research to generate new knowledge about gender equality topics, broadly defined; or are a not-for-profit organization focusing on community building, increasing the visibility of specific communities, increasing civic participation, and interested in exploring the use of Citizen Social Science in your work. Read more about the Open Calls here: https://coactproject.eu/opencalls/ Ed Summers: AltAir Звезда Альтаир We use Airtable quite a bit at $work for building static websites. It provides a very well designed no-code or low-code environment for creating and maintaining databases. It has an easy-to-use, beautifully documented API which makes it simple to use your data in many different settings, and also to update the data programmatically, if that's needed. Airtable is a product, which means there is polished documentation, videos, user support, and helpful people keeping the lights on. But Airtable also has a fiendishly inventive marketing and sales department who are quite artful at designing their pricing scheme with the features that will get you in the door, and the features that will act as pressure points to drive you to start paying them money … and then more money. Of course, it's important to pay for services you use on the web … it helps sustain them, which is good for everyone. But tying a fundamental part of your infrastructure to the whims of a company trying to maximize its profits sometimes has its downsides, which normally manifest over time. Wouldn't it be nice to be able to pay for a no-code database service like Airtable that had more of a platform-cooperative mindset, where the thing being sustained was the software, the hardware and an open participatory organization for managing them? I think this approach has real value, especially in academia and other non-profit and activist organizations, where the focus is not endless growth and profits. I've run across a couple of open source alternatives to Airtable and thought I would just quickly note them down here for my future self in case they are ever useful. Caveat lector: I haven't actually tried either of them yet, so these are just general observations after quickly looking at their websites, documentation and their code repositories. nocodb nocodb is a TypeScript/Vue web application that has been designed to provide an interface for an existing database such as MySQL, PostgreSQL, SQLite, or SQL Server. I suspect you can also use it to create new databases, but the fact that it can be used with multiple database backends distinguishes it from the next example.
The idea is that you will deploy nocodb on your own infrastructure (using Docker or installing it into a NodeJS environment). They also provide a one-click Heroku installer. It has token-based REST and GraphQL APIs for integration with other applications (a generic sketch of this kind of token-based call appears at the end of this post). All the code is covered by a GNU AGPL 3 license. It seems like nocodb is tailored for gradual introduction into an already existing database ecosystem, which is good for many people. baserow baserow is a Python/Django + Vue + PostgreSQL application that provides Airtable-like functionality in a complete application stack. So unlike nocodb, baserow seems to want to manage the database entirely. While this might seem like a limitation at first, I think it's probably a good thing, since PostgreSQL is arguably the best open source relational database out there in terms of features, support, extensibility and scalability. The fact that nocodb supports so many database backends makes me worry that it might not take full advantage of each, and it may be more difficult to scale. Perhaps the nocodb folks see administering and tuning the database as an orthogonal problem to the one they are solving. Having an application that uses one open source database, and uses it well, seems like a plus. But that assumes that there are easy ways to import existing data. While the majority of the baserow code is open source with an MIT Expat license, they do have some code that is designated as Premium with a separate Baserow Premium Edition License that requires you to get a key to deploy. It's interesting that the premium code appears to be open in the GitLab, and that they are relying on people to do right by purchasing a key if they use it. Or I guess it's possible that the runtime requires a valid key to be in place for premium features? Their pricing also has a hosted version if you don't want to deploy the application stack yourself, which is "free for now", implying that it won't be in the future, which makes sense. But it's kind of strange to have to think about the hosting costs and the premium costs together. Having the JSON and XML export be a premium feature seems a bit counter-intuitive, unless it's meant to be a way to quickly extract money as people leave the platform. governance Anyway these are some general quick notes. If I got anything wrong, or you know of other options in this area of open source, no-code databases, please let me know. If I ever get around to trying either of these I'll be sure to update this post. To return to this earlier idea of a platform-coop that supported these kinds of services, I think we don't see that idea present in either nocodb or baserow. It looks like baserow was started by Bram Wiepjes in the Netherlands in 2020, and that it is being set up as a profitable company. nocodb also appears to be a for-profit startup. What would it look like to structure these kinds of software development projects around co-operative governance? Another option is to deploy and sustain these open source technologies as part of a separate co-op, which is actually how I found out about baserow, through the Co-op Cloud Matrix chat. One downside to this approach is that all the benefits of having a participatory decision-making process accrue to the people who are running the infrastructure, and not to the people designing and making the software. Unless of course there is overlap in membership between the co-op and the software development.
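As a practical footnote to the comparison above, and the sketch promised earlier, this is roughly what integrating against a token-based REST API like these looks like from Python. Everything here is a placeholder: the base URL, path, and header name are hypothetical and are not nocodb's or baserow's documented API, so check their docs for the real endpoints and authentication scheme.

    import requests

    BASE_URL = "https://db.example.org/api/v1"   # placeholder, not a real endpoint
    API_TOKEN = "replace-with-a-token"           # issued by whichever service you deploy

    def list_rows(table, limit=25):
        # Hypothetical path and header name; real services document their own.
        response = requests.get(
            f"{BASE_URL}/tables/{table}/rows",
            headers={"Authorization": f"Token {API_TOKEN}"},
            params={"limit": limit},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        print(list_rows("inventory"))

The appeal of either tool is that this kind of call keeps working no matter which governance model ends up sustaining the service behind it.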
HangingTogether: Social interoperability: Getting to know all about you Photo by Mihai Surdu on Unsplash Building and sustaining productive cross-campus partnerships in support of the university research enterprise is both necessary and fraught with challenges. Social interoperability – the creation and maintenance of working relationships across individuals and organizational units that promote collaboration, communication, and mutual understanding – is the key to making these partnerships work. There are strategies and tactics for building social interoperability – can you use them to learn more about an important campus stakeholder and potential partner in research support services? This was the challenge we posed to participants in the third and final session of the joint OCLC-LIBER online workshop Building Strategic Relationships to Advance Open Scholarship at your Institution, based on the findings of the recent OCLC Research report Social Interoperability in Research Support: Cross-Campus Partnerships and the University Research Enterprise. This three-part workshop brought together a dynamic group of international participants to examine the challenges of working across the institution, identify tools for cross-unit relationship building, and develop plans for increasing their own social interoperability. In this post, we share insights and perspectives from the workshop’s final session: “Making your plan for developing cross-functional relationships at your institution.” In the first two sessions, we learned why social interoperability is important, and how it can be developed into a skill set we can use in our work. In the third session, we explored how, as library professionals, we can use our social interoperability skill set to reach out to other parts of the campus. The Library as campus partner The session began with a reminder of the important role the Library plays as a stakeholder and partner in the delivery of research support services, with recognized expertise in areas such as metadata, licensing/vendor negotiations, and bibliometrics/research impact. At the same time, the Library often occupies a distinct space in terms of the perspective it brings to its mission, such as a strong preference for “free” and “open” solutions. Given this unique blend of skills and values, the Library is often viewed as a trusted and “agnostic” partner on campus, as we heard from our interviewees for our Social Interoperability report. But we also heard from our interviewees about several sources of frustration encountered when working with the Library. For example, some campus stakeholders described how, in their experience, the Library did not focus enough on the “bottom line”, or moved too slowly in comparison to the needs and workflows of researchers. During the session, we conducted a quick poll of the workshop participants, asking them to put themselves in the role of a different campus stakeholder and consider how that unit would describe the Library in one word or phrase. The results were revealing: while the top responses included “supportive”, “helpful”, “competent”, and “expert”, “slow” was also a frequent choice, and other responses included “friendly but not fully relevant”, “opaque”, and “reactive not proactive”. The important takeaway was that in building cross-campus relationships, library professionals need to take into account, and in some cases, shift, how the Library is perceived by potential collaborative partners, rather than relying on self-perceptions. 
While unflattering characterizations of the Library may be based on misinformation, unfamiliarity, or differing priorities, they can still impede the development of productive working relationships (see the Social Interoperability report for more on the Library as a campus partner). Learn about your partners Our breakout discussions were motivated by an enormously valuable resource shared by colleagues at Rutgers University-New Brunswick Libraries. Developed as part of a strategic planning initiative, the resource is a questionnaire that the Library used to structure conversations with their stakeholders across campus. The purpose of the questionnaire was to learn about the Library’s stakeholders: their goals, their challenges, their needs. Consequently, almost all of the questions focused on the stakeholder’s priorities and interests; it is only toward the end of the questionnaire that library services are mentioned. This helped elicit the context that the Library needed to align its work with the needs of stakeholders (see the Social Interoperability report for the full questionnaire) . The first discussion placed our participants in the role of discovering information about a hypothetical campus partner, using the Rutgers questionnaire as a guide. This discussion elicited some good advice on how to approach prospective campus partners and learn about them. For example, in considering how to break the ice and learn about colleagues in campus IT, several participants suggested initiating a discussion of general IT-related topics, such as network security or new technologies. Another participant suggested that a careful review of unit web sites would help in gathering information about prospective campus partners. While the pandemic has certainly introduced challenges in connecting to colleagues around campus, it has also sometimes made it easier: as one person pointed out, more people are attending inter-departmental meetings because it is easier to join a Zoom meeting than to be physically present at a particular location on campus. Discovering information about other units on campus is not without challenges, as many of our workshop participants shared in the discussions. For example, several participants reported that it was sometimes difficult to pinpoint who to connect with, or even to identify the hierarchy of the unit. While direct, interpersonal contact usually helped in forming relationships, campus units with high staff turnover – such as the IT unit – made this problematic. And sometimes the information you discover about another unit’s responsibilities and needs may make forming successful partnerships even more daunting – for example, when it is clear that the other unit’s priorities do not easily mesh with those of the Library. By way of illustration, some participants observed that university Communications teams often seem remote from the Library, and do not adequately relay information about the Library and its activities. But as one person noted, it is a challenge for them to communicate everything, and currently, outreach to students and COVID-related information are understandably their priorities. But participants also told many stories of how discovering more about their colleagues around campus helped create actionable opportunities for the Library. Often, it was as simple as discovering how Library skills and capacities matched up to the needs of other units. 
For example, one participant described how units in the area of Academic Affairs valued input from the Library in regard to accreditation processes. Others talked about the fact that mandates aimed at promoting open science have started to have more “teeth”, with compliance receiving greater scrutiny. This creates a stronger demand for Library expertise in areas such as data management plans. Several participants mentioned that they have learned that rising interest in inter-disciplinary, “Grand Challenge” projects has created a need for project support capacities that the Library can provide. And hearing from other campus units and departments about their need to better document productivity and output creates opportunities for Library staff to initiate conversations about tools such as ORCID that help advance that goal. What do your campus partners know about you? In one of the breakout discussions, a participant observed that they did not know very much about what their university Communications team did. But, the participant continued, this probably means that the Communications staff probably did not know much about what the Library does! In the second set of breakout discussions, participants once again utilized the Rutgers script as a frame as they considered what other campus units might say about how the Library and its services contribute toward their work. One participant noted that the script questions were particularly useful for bringing out misconceptions about what the library can and cannot do. In the course of the discussions, several themes emerged from participants’ experiences of how the Library is perceived across campus. First, it was clear that campus stakeholders often do not have a clear picture of the expertise and capacities of today’s academic library. Participants noted, for example, that stakeholders often are unaware that the Library can help with every aspect of the research cycle. Much more effort is needed to raise awareness about the Library’s role in the university research enterprise. One participant observed that at their university, the Library provides good data management support, but it took a lot of work to make researchers see this expertise. Another person related a similar experience, remarking that Library participation in cross-campus projects requires a lot of energy and communication – not least because many campus stakeholders are not fully aware of what the Library can do. Or, as one participant pointed out, stakeholders may utilize Library services without realizing it is the Library that is providing them. Another theme touched on the need to establish a clear boundary around Library services within the broader university service eco-system. One participant remarked that the Library provides many services, but some of them are also offered by other campus units. For example, if a researcher needs data storage services, should they go to campus IT, or to the Library? One participant described a circular process whereby the Library receives a technical question and passes it to the IT unit, which then passes it back to the Library for resolution. Participants also noted that as the Library takes on new, emerging roles beyond its traditional functions, there is a tendency to “step on toes” and awaken territorial instincts in other campus units. But some participants also pointed out synergies that could be leveraged. A good example is that both the Library and the IT unit face budgetary challenges. 
When requests are received for support that neither unit can provide, but for which there is a clear need, they can collaborate to build a case for additional resources to address the gap. Finally, workshop participants noted a shared need to elevate the perception of the Library’s capabilities across campus. Participants shared examples they have encountered of outdated or even indifferent perceptions of the Library and its services: “important but not essential to Research Office day-to-day business”; “the Office of the Vice Provost would not think of many areas that the Library supports in the university research enterprise”; “the Library buys the books”; “seen as useful, but difficult to get them to see things the Library should take the lead on”; “not seen as thought leaders or a source for answers, but as service providers.” Several participants cautioned against the Library being seen strictly in an administrative support role in cross-unit initiatives; one person observed that the Library’s responsibility to manage article processing charges (APCs) reinforces a perception as “book keeper” or “note-taker”. Library staff are often included in projects only after funding is received, rather than being included as a partner as the project is being developed. How to counteract these perceptions? Participants emphasized the need for a “negotiation” process to ease the tension between what is expected from libraries and what libraries can offer. In short, libraries must learn to say “No” when necessary. Other campus units often expect a great deal from libraries, and library staff must strike a difficult balance between doing as much as possible to advance the interests of other units while at the same time preserving clear goals and advocating for Library-related priorities. As one person noted, “there is SO MUCH education to be done” to dispel the notion that libraries are useful only for administrative support. Libraries must break down and re-build these expectations. To do this, library staff need to be more proactive, rather than reactive, in their cross-campus partnerships. More openness across units is also needed, and libraries can set a good example in promoting transparency. And because, as one participant put it, “our services are not always top of mind”, library staff should work with the university Communications team, as well as influential faculty and administrators, “to get our message across.” How do you feel about cross-unit partnerships now? We concluded the workshop by asking participants to select one word to describe their current feelings about the prospect for cross-campus partnerships at their institution, in light of what they learned over the three sessions. We were gratified to see that the top response was “optimistic”! And indeed, with careful attention to the importance and need for social interoperability, and the techniques and practices we discussed to build it in the campus environment, library staff can be optimistic that their campus partnerships will be successful, and that the full value proposition of the Library will be better understood and utilized across the university research enterprise. Special thanks to all of our workshop participants for sharing their insights through lively and enlightening discussions, and to our colleagues at LIBER for working with us to make the workshop a success (a great example of social interoperability in action!) The post Social interoperability: Getting to know all about you appeared first on Hanging Together. 
David Rosenthal: Yet Another DNA Storage Technique An alternative approach to nucleic acid memory, by George D. Dickinson et al. from Boise State University, describes a fundamentally different way to store and retrieve data using DNA strands as the medium. Will Hughes et al. have an accessible summary in DNA ‘Lite-Brite’ is a promising way to archive data for decades or longer: We and our colleagues have developed a way to store data using pegs and pegboards made out of DNA and retrieving the data with a microscope – a molecular version of the Lite-Brite toy. Our prototype stores information in patterns using DNA strands spaced about 10 nanometers apart. Below the fold I look at the details of the technique they call digital Nucleic Acid Memory (dNAM). The traditional way to use DNA as a storage medium is to encode the data in the sequence of bases in a synthesized strand, then use sequencing to retrieve the data. Instead: dNAM uses advancements in super-resolution microscopy (SRM) to access digital data stored in short oligonucleotide strands that are held together for imaging using DNA origami. In dNAM, non-volatile information is digitally encoded into specific combinations of single-stranded DNA, commonly known as staple strands, that can form DNA origami nanostructures when combined with a scaffold strand. When formed into origami, the staple strands are arranged at addressable locations ... that define an indexed matrix of digital information. This site-specific localization of digital information is enabled by designing staple strands with nucleotides that extend from the origami. Writing In dNAM, writing their 20-character message "Data is in our DNA!\n" involved encoding it into 15 16-bit fountain code droplets, then synthesizing two different types of DNA sequences: Origami: There is one origami for each 16 bits of data to be stored. It forms a 6x8 matrix holding a 4-bit index, the 16 bits of droplet data, 20 bits of parity, 4 bits of checksum, and 4 orientation bits. Each of the 48 cells thus contains a unique, message-specific DNA sequence. Staples: There is one staple for each of the 15x48 matrix cells, with one end of the strand matching the matrix cell's sequence, and the other indicating a 0 or a 1 by the presence or absence of a sequence that binds to the fluorescent DNA used for reading. When combined, the staple strands bind to the appropriate cells in the origami, labelling each cell as a 0 or a 1. Reading The key difference between dNAM and traditional DNA storage techniques is that dNAM reads data without sequencing the DNA. Instead, it uses optical microscopy to identify each "peg" (staple strand) in each matrix cell as either a 0 or a 1: The patterns of DNA strands – the pegs – light up when fluorescently labeled DNA bind to them. Because the fluorescent strands are short, they rapidly bind and unbind. This causes them to blink, making it easier to separate one peg from another and read the stored information. The difficulty in doing so is that the pegs are on a 10 nanometer grid: Because the DNA pegs are positioned closer than half the wavelength of visible light, we used super-resolution microscopy, which circumvents the diffraction limit of light. The technique is called "DNA-Points Accumulation for Imaging in Nanoscale Topography (DNA-PAINT)". The process to recover the 20-character message was: 40,000 frames from a single field of view were recorded using DNA-PAINT (~4500 origami identified in 2982 µm²).
The super-resolution images of the hybridized imager strands were then reconstructed from blinking events identified in the recording to map the positions of the data domains on each origami ... Using a custom localization processing algorithm, the signals were translated to a 6 × 8 grid and converted back to a 48-bit binary string — which was passed to the decoding algorithm for error correction, droplet recovery, and message reconstruction ... The process enabled successful recovery of the dNAM-encoded message from a single super-resolution recording. Analysis The first thing to note is that whereas traditional DNA storage techniques are volumetric, dNAM, like hard disk or tape, is areal. It will therefore be unable to match the extraordinary data density potentially achievable using the traditional approach. dNAM claims: After accounting for the bits used by the algorithms, our prototype was able to read data at a density of 330 gigabits per square centimeter. Current hard disks have an areal density of 1.3 Tbit/inch², or about 200 Gbit/cm², so for a prototype this is good but not revolutionary. The areal density is set by the 10 nm grid spacing, so it may not be possible to greatly reduce it. Hard disk vendors have demonstrated 400 Gbit/cm² and have roadmaps to around 800 Gbit/cm². dNAM's writing process seems more complex than the traditional approach, so it is unlikely to be faster or cheaper. The read process is likely to be both faster and cheaper, because DNA-PAINT images a large number of origami in parallel, whereas sequencing is sequential (duh!). But, as I have written, the big barrier to adoption of DNA storage is the low bandwidth and high cost of writing the data. Eric Lease Morgan: Searching CORD-19 at the Distant Reader This blog posting documents the query syntax for an index of scientific journal articles called CORD-19. CORD-19 is a data set of scientific journal articles on the topic of COVID-19. As of this writing, it includes more than 750,000 items. This data set has been harvested, pre-processed, indexed, and made available as a part of the Distant Reader. Access to the index is freely available to anybody and everybody. The index is rooted in a technology called Solr, a very popular indexing tool. The index supports simple searching, phrase searching, wildcard searches, fielded searching, Boolean logic, and nested queries. Each of these techniques is described below: simple searches – Enter any words you desire, and you will most likely get results. In this regard, it is difficult to break the search engine. phrase searches – Enclose query terms in double-quote marks to search the query as a phrase. Examples include: "waste water", "circulating disease", and "acute respiratory syndrome". wildcard searches – Append an asterisk (*) to any non-phrase query to perform a stemming operation on the given query. For example, the query virus* will return results including the words virus and viruses. fielded searches – The index has many different fields. The most important include: authors, title, year, journal, abstract, and keywords. To limit a query to a specific field, prefix the query with the name of the field and a colon (:). Examples include: title:disease, abstract:"cardiovascular disease", or year:2020. Of special note is the keywords field. Keywords are sets of statistically significant and computer-selected terms akin to traditional library subject headings. The use of the keywords field is a very efficient way to create a small set of very relevant articles.
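A minimal sketch of what such a fielded search looks like from Python, assuming a hypothetical Solr endpoint (the URL below is a placeholder, not the Distant Reader's published address), might be:

    import requests

    SOLR_URL = "https://example.org/solr/cord19/select"  # placeholder endpoint

    def search(query, rows=10):
        # Standard Solr select parameters: q (query), rows, fl (field list), wt (format)
        params = {"q": query, "rows": rows, "fl": "title,year,journal", "wt": "json"}
        response = requests.get(SOLR_URL, params=params, timeout=30)
        response.raise_for_status()
        return response.json()["response"]["docs"]

    for doc in search("keywords:ribosome"):
        print(doc.get("year"), doc.get("title"))

The string passed as q uses exactly the syntax documented in this posting, so any of the example queries can be dropped in unchanged.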
Examples include: keywords:mrna, keywords:ribosome, or keywords:China. Boolean logic – Queries can be combined with three Boolean operators: 1) AND, 2) OR, or 3) NOT. The use of AND creates the intersection of two queries. The use of OR creates the union of two queries. The use of NOT creates the negation of the second query. The Boolean operators are case-sensitive. Examples include: covid AND title:SARS, abstract:cat* OR abstract:dog*, and abstract:cat* NOT abstract:dog*. nested queries – Boolean logic queries can be nested to return more sophisticated sets of articles; nesting allows you to override the way rudimentary Boolean operations get combined. Use matching parentheses (()) to create nested queries. An example includes ((covid AND title:SARS) OR abstract:cat* OR abstract:dog*) NOT year:2020. Of all the different types of queries, nested queries will probably give you the most grief. Tara Robertson: Strategic and effective storytelling with data I was delighted to speak at Data Science by Design’s Creator Conference. DSxD is a community of researchers, educators, artists, and computer scientists whom the conference organizers described as curious, dynamic, creative and interdisciplinary. I chose to talk about the challenge of communicating about diversity metrics in a way that informs and inspires your audience to want to push for change, using Mozilla’s last external diversity disclosure as an example. Here are 3 important things to keep in mind when storytelling using data: Understand who your audience is. Share the context of the data with your audience. Think about how you want your audience to feel (and what you want them to do) after seeing the data. Audience As this was an external disclosure, it is obvious that one audience was people outside Mozilla. Transparency about DEI metrics is table stakes for companies that say they care about these things. It’s also an important part of a company’s employer brand, especially for younger workers. When I’ve been interested in a job, I check to see if companies are sharing their diversity metrics, what photos they use to illustrate who works there, what diversity in senior leadership looks like and how they’re telling the story of what their culture is like. When I see stories of free beer Fridays, ping pong tables and “a work hard play hard” culture, I am much less interested in applying. I don’t drink alcohol, I don’t like ping pong, and I value work-life balance. Equally important to me was the internal audience at Mozilla. We had done a lot of internal systems work including redesigning our hiring process, removing meritocracy from our governance and leadership structures, and improving accessibility internally by live captioning all company meetings and being more intentional about accessibility at events. I wanted people to see that all of these projects laddered up to measurable change. I wanted them to see progress and feel proud of all of our hard work. Context A few years ago this is something I would have said: By the end of 2019, representation of women in technical roles was 21.6%. This would prompt many questions from people, including: Is this any good? How do we compare to other companies? How do we compare to the labor market? What does our pipeline look like? Are we getting better or worse? What do you mean by technical role? Is a data analyst a technical role? What about a data scientist? Context is so important! I decided to add an explainer video to the disclosure to help people understand the context.
In addition, the video starts with the big picture context on why we were invested in D&I at Mozilla: from the individual experience of feeling like you belong to being directly connected to the mission "to make the internet open and accessible for all" as well as the business case on innovation and performance. Emotion Taking a data-driven approach is necessary in an engineering organization–most people want to see and understand the numbers. For the last diversity disclosure I prepared, I saw the opportunity to try and tell a story that could connect to people's heads and hearts. I wanted them to feel something, whether that was pride at the progress we'd made, or frustration that we weren't making change quickly enough. My ideal outcome was to pique people's curiosity and have them ask "what can I do to make Mozilla a diverse and inclusive place?". My worst-case scenario was that people would hear this update and think "meh, whatever". Here's the 3-minute video: Drew Merit is the illustrator who brought this idea to life. I'd love examples from your work, or examples that you've seen out in the wild where people have used data to tell a story that inspires the audience to take action. The post Strategic and effective storytelling with data appeared first on Tara Robertson Consulting. Ed Summers: Untitled Several years ago someone in our neighborhood was moving out of the area and was giving away their old piano. We figured out a way to get it to our house, and it has sat (still untuned) in a corner of our living room ever since. The kids have sporadically taken piano lessons, and the various teachers who have heard it have all been polite not to comment on its current state. Mostly it just sits there, part table, part decoration, until someone stops by to play a little tune. It's interesting to hear the kids develop little signature riffs that they play while walking by the piano in the morning, or sometimes before going to bed. Here's a short untitled little tune that Maeve regularly plays, almost like a little prayer, or memory: Lucidworks: How to Capture Chatbots' Untapped Potential Shoppers are turning to chatbots for more than customer service. Here's how to make your chatbot the intelligent, well-rounded conversational app that consumers expect. The post How to Capture Chatbots' Untapped Potential appeared first on Lucidworks. HangingTogether: Reimagine Descriptive Workflows: meeting the challenges of inclusive description in shared infrastructure In a previous blog post, I told you about our Reimagine Descriptive Workflows project, and the path we took to get there. In that post, I shared the three objectives we have in this project. Convene a conversation of community stakeholders about how to address the systemic issues of bias and racial inequity within our current collection description infrastructure. Share with libraries the need to build more inclusive and equitable library collections. Develop a community agenda to help clarify issues for those who do knowledge work in libraries, archives, and museums; prioritize areas for attention from these institutions; and provide guidance for national agencies and suppliers. Coming together In this post, I'm going to fill you in on the Reimagine Descriptive Workflows convening we held in June. Our virtual meeting took place in June (22-24 in North America, 23-25 in Australia & New Zealand).
Fifty-nine people from the US, Canada, Australia, and New Zealand attended the meeting, which was designed and co-facilitated by Shift Collective. Prior to the convening, the project team met twice with the advisory group, who helped shape the following goals for the event: Create a safe space to share and connect honestly as humans; Lay the foundations for relationship building and repair; Build a basis for reciprocal relationships between communities and centers of power; Inspire radical thinking to rebuild a more just metadata infrastructure; and Start building a concrete roadmap for change in the sector and keep the conversation going. The project team identified potential participants through a consultative process and via self-nomination. We prioritized attendance for those who had demonstrated leadership working in the area of "just descriptions." We also prioritized the attendance of BIPOC colleagues, as well as others with lived experiences as members of underrepresented groups. All participants were offered a stipend to acknowledge and partially compensate them for the valuable time, labor, and expertise they would bring to the event. The Reimagine Descriptive Workflows meeting was held, as so many are in these times, via Zoom. Recognizing that virtual meeting fatigue is real (and that our group included participants from the middle of Australia all the way to the east coast of North America), we met for between two and three hours each day. For meeting organizers this presented a challenge – how to structure the time together so that people could connect with one another as humans and build connections and trust, but also produce concrete outputs that would help move the conversation forward. In order to help foster connections and build community, Asante Salaam (who I think of as the Shift team's Minister of Culture!) helped to create a unique set of "cultural infusions" for participants, bringing in artists, musicians, a chef, and a poet and fostering conversations to give us an encounter with local flavor and culture from just some of the communities we were connecting with. It was not the same as being able to share a meal or gallery walk with others, but, for me, her efforts created a communal experience that supported the opportunity to connect with others outside of the official convening agenda. To help establish the space we would share for three days, the meeting hosts put forward the following Agreements: Share the space, step forward/step back; Listen and share bravely; Listen for understanding; Sense and speak your feelings; Use "I" statements; Discomfort is not the same as harm; No Alphabet Soup (don't use acronyms and insider language without explaining it); Be kind to yourself and others; Take care of your needs. Although we were together as the full group at the beginning and end of each day, and for our "cultural infusions," most time was spent in smaller groups of five to six people, each supported by a guide. The guides took notes on behalf of the group and offered timekeeping support and gentle moderation when needed. Notes were kept in Miro (an ever-expanding online whiteboard and collaboration space). Here is an example of one of the discussion groups' Miro boards at the conclusion of the convening. Thanks to Shift team member Tayo Medupin, who put a lot of effort and artistic touches into the design of these boards – it made me feel like I was in a distinctive space as opposed to a characterless virtual room.
Screen capture: notes from one discussion group Composting, weeding, and seeding The topic for the first day was “Composting: What is driving us forward?” During this day we worked through prompts such as… Why is taking this journey together meaningful? Where is our abundance, and what assets should we bring with us? What stories and experiences of positive change will we build upon? What must be acknowledged? On day two, the topic was “Weeding: What is holding us back?” Here, participants were given time to draw a map or diagram of what a just, anti-racist and equitable descriptive workflow would look like. Prompts included calling out systemic, technical or procedural, social, cultural or personal blockers that might exist to implementing that workflow. Between the second and third days, Tayo Medupin, together with other members of the Shift team, worked her Miro magic, collecting notes from the various discussion groups and mapping them to eleven Design Challenges. On the final day of the convening the small groups explored the topic “Seeding: Opportunities for change?” and adopted a Design Challenge, using the time to explore questions related to the challenge. Because of the limited amount of time we had to spend together, the small groups were only able to dig into a few of these. Reimagine Descriptive Workflows design challenges [Note: These are in draft form and have received little review. We are sharing to give a sense of meeting outcomes.] Screen capture: notes from discussions mapped to design challenges ARE WE DONE YET? Insight: We’re trying to catalog and describe a world which is dynamic, fluid, complex and evolving over time in a cataloging culture that rewards the singular, definitive, and static. Opportunity: How might we create the conditions for / support a move towards a cataloging culture that embraces the long-term view, valuing and rewarding evolution, deepening, enrichment and progress over the concept of ‘complete’? STOP AND LEARN (CONNECTING WITH COMMUNITIES) Insight: We’re trying to slow down and involve communities in our workflows in equitable ways within a cataloging culture that pushes us to speed up and to spend and value time / resource in ways that can be at odds with slowing down and equitable collaboration. Opportunity: How might we create the conditions for / support a move towards a cataloging culture that demonstrably values community engagement by making it accepted and even expected to slow down and invest our time and money in this way? IN IT FOR THE LONG HAUL Insight: We have been and will be trying to create just metadata description across multiple generations. We are currently riding a wave of socio-political interest and prioritization that may or may not last. Opportunity: How might we create the conditions for / support the foundations for a resilient (anti-fragile) system of actors and activity pushing towards just metadata description that will be able to survive the generation to come? CONTEXT Insight: We are trying to redress hundreds of years of white supremacist colonial describing at scale in a system that is judged and valued on the legacy descriptions and language we can still see right now. Opportunity: How might we create the conditions for / support a move towards a mutuality of understanding about where we are in the journey and what road is left ahead? 
COMMON DEFINITIONS Insight: We are trying to work towards a just, equitable, anti-racist, anti-oppressive approach, but are we working within a common understanding of what this means and should/could look like in the sector? Opportunity: How might we create the conditions for / support the creation of shared visions and definitions of 'good' held by those working towards just description? CHANGE CULTURE Insight: We're often trying to make changes within organizational structures and cultures that can feel resistant or challenging to change. Opportunity: How might we create the conditions for / support individuals, teams and collectives to help shape and reshape the cultures of our core institutions to ready them for this long and hard period of change? CONNECTED ABUNDANCE Insight: We're trying to change a huge legacy system often in our silos, in isolation, experiencing scarcity and without the clout of a network of others also making strides in the fight. Opportunity: How might we create the conditions / support the growth of a thriving and resilient network of people, groups and organizations sharing the energy, bravery, resource, ideas, information and rest needed for the sector to transform? POWER TO CHANGE Insight: We're trying to change a huge legacy system in our own ways but many of us in our work, teams, institutions and sector do not feel we have the power and agency to make the necessary change. Opportunity: How might we create the conditions / support the growth of a sector where everyone feels the power and agency to drive forward the necessary change? LIBERATING THE LIBERATORS Insight: There are pockets of the future in the present in smaller institutions and in individuals who are pioneering just, anti-oppressive approaches, but they are often hampered by scale, visibility, recognition and reward. Opportunity: How might we create the conditions / support the growth and progress of our system liberators to help them to create and scale the changes and cultures we need to transform us? THIRD HORIZON Insight: We're trying to create just, equitable, anti-racist and anti-oppressive descriptions within a structure and worldview of describing which is conceptually unjust, inequitable, racist and oppressive. Opportunity: How might we create the conditions for / support a radical rethink of the very concept of cataloging and metadata description, to lay the foundations for an approach that will better serve us for the next 200 years? FEEDBACK CULTURE Insight: We're trying to create just metadata description in a culture that doesn't currently prioritize, demand, embrace or leave space for external feedback. Opportunity: How might we create the conditions for / support a move towards a cataloging culture that demands, prioritizes, and creates room for external / community feedback? Thanks and gratitude An important output of the Reimagine Descriptive Workflows project was the construction of a novel online convening that helped to support a brave space for productive and honest conversations about the challenges and solutions around inclusive and anti-racist description. This convening was the mudsill, setting the stage for everything to come. For the seven hours of meeting time, many, many more hours were put into the planning. First and foremost, we want to thank our advisory group, which has really been at the heart of this project.
We are grateful to this amazing group which not only brings their substantial professional perspectives but also their network connections and their lived experiences in this space. This group devoted heart and dedication, doing this work on top of their very busy professional and personal lives. Stacy Allison-Cassin, Jennifer Baxmeyer, Dorothy Berry, Kimberley Bugg, Camille Callison, Lillian Chavez, Trevor A. Dawes, Jarret Martin Drake, Bergis Jules, Cellia Joe-Olsen, Katrina Tamaira, Damien Webb. We were gratified that nearly every person who was invited to the meeting not only accepted our invitation but came to the meeting and shared experiences and ideas. Convening attendees added so much by contributing, preparing, and being present. They made this so much more than another Zoom meeting. Audrey Altman, Jill Annitto, Heidy Berthoud, Kelly Bolding, Stephanie Bredbenner, Itza Carbajal, May Chan, Alissa Cherry, Sarah Dupont, Maria Estorino, Sharon Farnel, Lisa Gavell, Marti Heyman, Jay Holloway, Jasmine Jones, Michelle Light, Sharon Leon, Koa Luke, Christina Manzella, Mark Matienzo, Rachel Merrick, Shaneé Yvette Murrain, Lea Osborne, Ashwinee Pendharkar, Treshani Perera, Nathan Putnam, Keila Zayas Ruiz, Holly Smith, Gina Solares, Michael Stewart, Katrina Tamaira, Diane Vizine-Goetz, Bri Watson, Beacher Wiggins, and Pamela Wright. Many thanks also to the team at Shift Collective that helped to design and facilitate the meetings: Gerry Himmelreich, Jennifer Himmelreich, Lynette Johnson, Tayo Medupin, Asante Salaam, and Jon Voss. An OCLC team also contributed to the planning and implementation: Rachel Frick, Bettina Huhn, Nancy Lensenmayer, Mercy Procaccini, Merrilee Proffitt, and Chela Scott Weber. Finally, a big thank you to the Andrew W. Mellon Foundation for co-investing alongside OCLC. This seed funding made this convening possible. Next steps The eleven Design Challenges barely scratch the surface of everything that was covered at the meeting. The project team still has hours of transcripts and other meeting outputs to dig through. We’ll be using those outputs to construct a draft Community Agenda, as we promised at the outset of this project. We will make that draft available for broad community comment before publishing. We will also be using the Community Agenda to structure conversations with library leaders and other stakeholders – we believe it is important in socializing this work to get a sense of how those with power and access to purse strings see their role in implementing this work. And, of course we will be doing work internal to OCLC to consider our own role in the vision that this community has created. As we consider our next steps, we are taking seriously our responsibilities as stewards of this conversation. Although preliminary feedback from the meeting was overwhelmingly positive, many attendees expressed a yearning to be able to connect, or continue to connect and learn from one another. We are considering how best to nurture that seed so that it can grow. Thanks to Marti Heyman, Andrew Pace, Mercy Procaccini, and Chela Weber who reviewed and improved this blog post. The post Reimagine Descriptive Workflows: meeting the challenges of inclusive description in shared infrastructure appeared first on Hanging Together. Lucidworks: Activate Conference 2021: Call for Submissions The annual Activate Conference continues virtually in 2021, and submissions for speakers are now open. The post Activate Conference 2021: Call for Submissions appeared first on Lucidworks. 
In the Library, With the Lead Pipe: Dismantling the Evaluation Framework (Atharva Tulsi, Unsplash, https://unsplash.com/photos/RVpCAtjhyuA) By Alaina C. Bull, Margy MacMillan, and Alison J. Head In brief For almost 20 years, instruction librarians have relied on variations of two models, the CRAAP Test and SIFT, to teach students how to evaluate printed and web-based materials. Dramatic changes to the information ecosystem, however, present new challenges amid a flood of misinformation where algorithms lie beneath the surface of popular and library platforms collecting clicks and shaping content. When applied to increasingly connected networks, these existing evaluation heuristics have limited value. Drawing on our combined experience at community colleges and universities in the U.S. and Canada, and with Project Information Literacy (PIL), a national research institute studying college students’ information practices for the past decade, this paper presents a new evaluative approach for teaching students to see information as the agent, rather than themselves. Opportunities and strategies are identified for evaluating the veracity of sources, first as students, leveraging the expertise they bring with them into the classroom, and then as lifelong learners in search of information they can trust and rely on. 1. Introduction Arriving at deeply considered answers to important questions is an increasingly difficult task. It often requires time, effort, discernment, and a willingness to dig below the surface of Google-ready answers. Careful investigation of content is needed more than ever in a world where information is in limitless supply but often tainted by misinformation, while insidious algorithms track and shape content that users see on their screens. Teaching college students evaluative strategies essential for academic success and in their daily lives is one of the greatest challenges of information literacy instruction today. In the last decade, information evaluation — the ability to ferret out the reliability, validity, or accuracy of sources — has changed substantively in both teaching practice and meaning. The halcyon days of teaching students the CRAAP Test1, a handy checklist for determining the credibility of digital resources, are over2; and, in many cases, SIFT3, another reputation heuristic, is now in use on numerous campuses. At the same time, evaluative strategies have become more nuanced and complex as librarians continue to debate how to best teach these critically important skills in changing times.4  In this article, we introduce the idea of proactivity as an approach that instruction librarians can use for re-imagining evaluation. We explore new ways of encouraging students to question how information works, how information finds them, and how they can draw on their own strengths and experiences to develop skills for determining credibility, usefulness, and trust of sources in response to an information ecosystem rife with deception and misinformation. Ultimately, we discuss how a proactive approach empowers students to become experts in their own right as they search for reliable information they can trust. 2. A short history of two models for teaching evaluation Mention “information literacy instruction” and most academic librarians and faculty think of evaluation frameworks or heuristics that have been used and adapted for nearly two decades. 
The most widely known are the CRAAP method, and more recently, SIFT, both designed to determine the validity and reliability of claims and sources. CRAAP debuted in 20045 when several academic librarians developed an easy to use assessment framework for helping students and instructors evaluate information for academic papers. CRAAP, a catchy acronym for Currency, Relevancy, Accuracy, Authority, Purpose, walks students through the criteria for assessing found content. For librarians, this approach to evaluation is a manifestation of the Information Literacy Competency Standards for Higher Education developed by the ACRL and especially an outcome of Standard 3.2: “Examines and compares information from various sources in order to evaluate reliability, validity, accuracy, authority, timeliness, and point of view or bias.”6 When the CRAAP method was first deployed nearly 20 years ago, the world was still making the transition from Web 1.0 to Web 2.0. Most online content was meant to be consumed, not interacted with, altered, changed, and shared. CRAAP was developed in a time when you found information, before the dramatic shift to information finding you. As monolithic players like Google and Facebook began using tracking software on their platforms in 2008 and selling access to this information in 2012, web evaluation became a very different process. In a role reversal, media and retail platforms, such as Amazon, had begun to evaluate their users to determine what information they should receive, rather than users evaluating what information they found. Since 2015, criticism has mounted about the CRAAP test, despite its continued and widespread use on campuses. Checklists like CRAAP are meant to reduce cognitive overload, but they can actually increase it, leading students to make poor decisions about the credibility of sources, especially in densely interconnected networks.7 As one critic has summed it up: “CRAAP isn’t about critical thinking – it’s about oversimplified binaries.”8 We agree: CRAAP was designed for a fairly narrow range of situations, where students might have little background knowledge to assist in judging claims and often had to apply constraints of format, date, or other instructor-imposed requirements; but these bore little resemblance to everyday interactions with information, even then. When Mike Caulfield published the SIFT model in 2019, it gave instruction librarians a  progressive alternative to the CRAAP test. Caulfield described his evaluation methods as a “networked reputation heuristic,”9 developed in response to the spread of misinformation and disinformation in the post-truth era. The four “moves” he identified — Stop, Investigate, Find, Trace — are meant to help people recontextualize information through placing a particular work and its claims within the larger realm of content about a topic. SIFT offers major improvements over CRAAP in speed, simplicity, and applicability to a wider scope of print and online publications, platforms, and purposes. Recently, researchers have identified the benefits of using this approach,10 and, in particular, the lateral reading strategies it incorporates. SIFT encourages students to base evaluation on cues that go beyond the intrinsic qualities of the article and to use comparisons across media sources to understand the trustworthiness of an article. 
This is what Justin Reich,11 Director of the MIT Teaching Systems Lab, noted in a 2020 Project Information Literacy (PIL) interview, calling SIFT a useful “first step,” since it may assist students in acquiring the background knowledge they need to evaluate the next piece of information they encounter on the topic.  Crucially, SIFT also includes the context of the information needed as part of evaluation – some situations require a higher level of verification than others. The actions SIFT recommends are more closely aligned with the kind of checking students are already using to detect bias12 and decide what to believe and how researchers themselves judge the quality of information.13 And while it is much better suited to today’s context, where misinformation abounds and algorithms proliferate, SIFT is still based on students encountering individual information objects, without necessarily understanding them as part of a system.  Our proposed next step, what we call proactive evaluation, would allow them not only to evaluate what they’re seeing but consider why they’re seeing what they do and what might be missing. SIFT, like CRAAP, is based on a reactive approach: the individual is an agent, acting upon information objects they find. In today’s information landscape, we think it is more useful to invert this relationship and consider the information object as the agent that is acting on the individual it finds.  3.  Information with agency Thinking of information as having agency allows us to re-examine the information environment we think we know. By the time they get to college, today’s students are embedded in the information infrastructure: a social phenomenon of interconnected sources, creators, processes, filters, stories, formats, platforms, motivations, channels, and audiences. Their profiles and behaviors affect not only what they see and share but also the relative prominence of stories, images, and articles in others’ feeds and search results. Information enters, flows through, and ricochets around the systems they inhabit – fueled, funded, and filtered by data gathered from every interaction. Research from PIL,14 and elsewhere,15 indicates that students who see algorithmic personalization at work in their everyday information activities already perceive information as having agency, specifically, the  ability to find them, follow them across platforms, and keep them in filter bubbles. They understand the bargain they are required to make with corporations like Amazon, Alphabet, and Facebook where they exchange personal data for participation in communities, transactions, or search efficiency. When PIL interviewed 103 undergraduates at eight U.S. 
colleges and universities in 2019 for the algorithm study, one student at a liberal arts college described worries we heard from others about the broader social impact of these systems: “I’m more concerned about the large-scale trend of predicting what we want, but then also predicting what we want in ways that push a lot of people towards the same cultural and political endpoint.”16 This student’s concern relates to the effects of algorithmic personalization and highlights student awareness of deliberate efforts to affect and, in many cases, infect the system.17 Subverting the flow of information for fun and profit has become all too common practice for trolls, governments, corporations, and other interest groups.18 The tactics we’ve taught students for evaluating items one at a time provide slim defenses against the networked efforts of organizations that flood feeds, timelines, and search results. While SIFT at least considers information as part of an ecosystem, we still need to help students go beyond evaluating individual information objects and understand the systems that intervene during the search processes, sending results with the agency to nudge, if not shove, users in certain directions.  That is why it is time to consider a new approach to the teaching of source evaluation in order to keep up with the volatile information ecosystem. Allowing for information to have agency, i.e. acknowledging information as active, targeted, and capable of influencing action, fundamentally alters the position of the student in the act of evaluation and demands a different approach from instruction librarians. We call this approach proactive evaluation. 4. Proactive evaluation What happens if we shift our paradigm from assuming that students are agents in the information-student interaction to assuming that the information source is the agent? This change in perspective will dramatically reframe our instruction in important ways. This perspective may initially seem to disempower the information literacy student and instructor, but given widespread disinformation in this post-truth era, this reversal might keep us, as instructors, grounded in our understanding of information literacy. Once we shift the understanding of who is acting upon whom, we can shift our approaches and techniques to reflect this perspective. This change in thinking allows us to move from reactive evaluation, that is, “Here is what I found, what do I think of it?” to proactive evaluation, “Because I understand where this information came from and why I’m seeing it, I can trust it for this kind of information, and for this purpose.” What does a proactive approach look like? Table 1 presents comparisons between reactive and proactive approaches to information literacy as a starting point for thinking about this shift in thinking. This typology acknowledges that college and university students come into our classrooms with a deep and wide knowledge of the information landscapes in which they exist.   
Table 1. A Model for Transitioning from Reactive to Proactive Evaluation. Each row contrasts the reactive view with the proactive view: Understanding of information – Reactive: individual objects you find; Proactive: networked objects that find you. Understanding of evaluation – Reactive: intrinsic (to the object); Proactive: contextual (within the network). Agency – Reactive: the user is the agent, or the information is the agent; Proactive: both the user and the information have agency in a dynamic relationship. How/what we teach – Reactive: closed yes/no questions with defined answers; Proactive: open questions. Reactive: binaries (good/bad, scholarly/popular); Proactive: contextual continua (useful for topic x in circumstance y if complemented by z). Reactive: student as perpetual novice (evaluates from scratch every time); Proactive: student as developing expert with existing knowledge, who brings expertise about information, subject, sources, processes. Reactive: evaluate individual objects with novice tools and surface heuristics; Proactive: evaluate based on network context and connections, and build networks of trusted knowledge/sources. Reactive: CRAAP, SIFT; Proactive: into the unknown. As this typology suggests, our thinking rejects the "banking model of education" where students are empty vessels that educators must fill.19 To illustrate this point, PIL's 2020 algorithm study has confirmed what we have long suspected: many students are already using evasive strategies to circumvent algorithmic tracking and bias. Their tactics, learned from friends and family, not their instructors, range from creating throwaway email accounts to using VPNs and ad-blocking apps to guard their personal data from algorithms.20 Students know that information is constantly trying to find them, identify them, label them, and sway them. And they may know this better than the faculty that teach them.21 Applying this to information literacy instruction means acknowledging that students approach information skeptically, and at least some students arrive in the classroom with defensive practices for safeguarding their privacy and mitigating invasive, biased information as they navigate the web and search for information. To build on this premise, we should be asking students to apply their defensive strategies to classroom-based tasks: "If this information showed up in your news stream, what tactics would you use to decide if you wanted to pass it along?" "What do you look for to know if this is valid or useful information?" Instead of asking yes/no questions, e.g., "Is it written by an expert?" or "Is it current?", we should shift our assessment questions to an open-ended inquiry with students. An example of how this could work would be asking the class what they do when they encounter a new piece of information in their own information landscape, such as a news story. How would students go about deciding if they would reshare it? What are their motivations for sharing a news story? In PIL's news study, for instance, more than half of the almost 6,000 students surveyed (52%) said their reason for sharing news on social media was to let friends and followers know about something they should be aware of, while more than two fifths (44%) said sharing news gives them a voice about a larger political or social cause.22 Does the same drive hold true for students in this classroom example? Librarians using a proactive approach like this one could hold a classroom discussion to see if their students also see themselves as stewards of what is important to know, while having a voice about larger causes in the world.
A proactive approach also allows students to bring their prior networked knowledge into the discussion, rather than looking at a single point of information in isolation when directed by an instruction librarian. Asking students to make their tacit processes more explicit will also help them see the information networks they have already built more clearly. They may be using other factors in their decision-making, like who recommended a source or the context in which the information will be used. These evaluation points are also used by researchers when assessing the credibility of information.23 Providing opportunities for students to reflect on and articulate their interactions with information in the subject areas where they feel confident may allow them to transfer skills more easily to new, less familiar, academic domains. Students sharing these kinds of spontaneous reflections can leverage the social aspect of information skills. PIL studies have shown repeatedly that students lean on each other when they evaluate content for academic, employment, and everyday purposes; when necessary they also look to experts, including their instructors, to suggest or validate resources. Evaluation is fundamentally a social practice, but the existing heuristics don’t approach it this way. Reliance on other people as part of trusted information networks is rarely even acknowledged, let alone explicitly taught in formal instruction, as we tend to focus on the stereotype of the solitary scholar. Gaining understanding of their own information networks, students can learn to see the operations of other networks, within disciplines, news, and other commercial media. If they are aware of the interconnectedness of information, they can use those connections to evaluate content and develop their mental Rolodexes of trusted sources.24 Understanding which sources are trustworthy for which kinds of information in which contexts is foundational knowledge for both academic work and civic engagement. Building on SIFT strategies, it’s possible for students to accumulate knowledge about sources by validating them with tools like Wikipedia. Comparing and corroborating may illuminate the impact of algorithms and other systems that make up the information infrastructure.25 Developing this kind of map of their network of trusted sources can help them search and verify more strategically within that network, whether they’re in school or not. As they come to understand themselves as part of the information infrastructure, students may be able to reclaim some agency from the platforms that constrain and control the information they see. While they may not ever be able to fully escape mass personalization, looking more closely at its effects may increase awareness of when and how search results and news feeds are being manipulated. Students need to understand why they see the information that streams at them, the news that comes into their social media feeds, the results that show up at the top of a search, and what they can do to balance out the agency equation and regain some control. Admittedly, this form of instruction is clearly more difficult to implement than turnkey checklists and frameworks. It is much harder to fit into the precious time of a one-shot. It requires trust in the students, trust in their prior knowledge, and trust in their sense-making skills. This change in perspective about how we teach evaluation is not a magic bullet for fixing our flawed instruction practices. 
But we see proactive evaluation as an important step for moving our profession forward in teaching students how to navigate an ever-changing information landscape. This proactive model can be used in conjunction with, or independent of, SIFT to create a more complex information literacy. Reactive evaluation considers found information objects in isolation, based on intrinsic qualities, regardless of the user or intended use. In a proactive approach, the user considers the source while evaluating information contextually, through its relationships to other sources and to the user. Over time, a user can construct their own matrix of trusted sources. It’s similar to getting to know a new city; a newcomer’s mental map gradually develops overlays of shortcuts, the safe and not-so-safe zones, and the likely places to find what they need in a given situation. Eventually, they learn where to go for what, a critical thinking skill they can take with them through the rest of their education and everyday lives and apply with confidence long after graduation. 5. Into the unknown Reactive approaches to evaluation are not sufficient to equip students to navigate the current and evolving information landscape. What we have proposed in this paper is an alternative, what we call a proactive approach, to information evaluation that moves away from finite and simple source evaluation questions to open-ended and networked questions. While a proactive approach may feel unfamiliar and overwhelming at first, it moves away from the known to the unknown to create a more information-literate generation of students and lifelong learners.   But what if this approach is actually not as unfamiliar as it may seem? The current ACRL framework paints a picture of the “information-literate student” that speaks to a pedagogy that cultivates a complex and nuanced understanding of the information creation process and landscape. For example, in the “Scholarship as Conversation” frame, these dispositions include “recognize that scholarly conversations take place in various venues,” and “value user-generated content and evaluate contributions made by others.”26  Both dispositions require a nuanced understanding of the socialness of scholarship and imply evaluation within a social context. And while heuristics that rely on finite and binary responses are easy to teach, they create more problems than they solve. Focusing on the network processes that deliver the information in front of us, instead of focusing on these finite questions, allows for a different kind of knowing.  The next question for instructors to tackle is what this proactive approach looks like in the classroom. In our field, discussions of “guide on the side” and “sage on the stage” are popular, but what we are actually advocating in this article isn’t a guide or a sage, as both assume a power structure and expertise that is incomplete and outdated. In the classroom, we advocate a shift from guiding or lecturing to conversation. We do not have a set of desired answers that we are hoping to coax out of the students: Which of these sources is valid? Who authored this source, and are they an expert? Rather, a proactive approach encourages students to engage and interact with their ideas and previous experiences around information agency, the socialness of the information, and how they evaluate non-academic sources. This will allow students to bring their deep expertise into the classroom. 
We have alluded to open-ended questions as part of the proactive approach, but this is more accurately described as an open dialogue. This type of instruction is difficult in the one-shot structure, as it relies on trust. An unsuccessful session looks like your worst instruction experience, with the students staring blankly at you and not engaging, leaving lots of empty space and the strong desire to revert to lecturing on database structures. A successful session will feel like an intellectual conversation where you as the "teacher" learn as much as you impart, and the conversation with students is free-flowing and engaging. Returning to the earlier example of asking how a student would choose whether or not to reshare a news story, this type of dialogue could include conversations about what they already know about the news source, what they know about the person or account that initially shared it, how they might go about reading laterally, what their instincts say, how this does or does not fit with their prior knowledge on the subject, and their related reactions. Over the course of the discussion, it will become clear which areas of information literacy and assessment need more dialogue and which areas the students are already skilled and comfortable in. The kind of information literacy instruction that assumes agency rests solely with the user, who finds and then evaluates individual information objects, is no longer valid now that information seeks out the user through networked connections. This reversal of the power dynamic underlies many of the gaps between how evaluation is taught in academic settings and how it occurs in everyday life. The approach we advocate balances out these extremes and helps students recognize and regain some of their agency. By understanding how information infrastructures work and their roles within them, students can adapt the tactics that many of them are already using to become more conscious actors. 6. Looking Ahead In this article, we have discussed an alternative to current evaluation approaches that is closely tied to the issue of trust: trusting our students to bring their own experiences and expertise to the information literacy classroom. But our work doesn't end there. Our approach also requires us to trust ourselves as instructors. We will need to trust that we do in fact understand the continuously changing information landscape well enough to engage with open-ended, complex questions, rather than a prescribed step-by-step model. We must continue to inform ourselves and reevaluate information systems — the architectures, infrastructures, and fundamental belief systems — so we can determine what is trustworthy. We have to let go of simple solutions to teach about researching complex, messy problems. For college students in America today, knowing how to evaluate news and information is not only essential for academic success but urgently needed for making sound choices during tumultuous times. We must embrace that instruction, and information evaluation, are going to be ugly, hard, and confusing for us to tackle but worth it in the end to remain relevant and useful to the students we teach. Acknowledgements We are grateful to Barbara Fister, Contributing Editor of the "PIL Provocation Series" at Project Information Literacy (PIL), for making incisive suggestions for improving this paper, and Steven Braun, Senior Researcher in Information Design at PIL, for designing Table 1.
The article has greatly benefited from the reviewers assigned by In the Library with the Lead Pipe: Ian Beilin, Ikumi Crocoll, and Jessica Kiebler. References  “Framework for Information Literacy for Higher Education.” 2016. Association of College and Research Libraries. January 16. https://www.ala.org/acrl/standards/ilframework. “Information Literacy Competency Standards for Higher Education.” 2000. Association of College and Research Libraries. January 18. http://www.acrl.org/ ala/mgrps/divs/acrl/standards/standards.pdf. Bengani, Priyanjana. “As Election Looms, a Network of Mysterious ‘Pink Slime’ Local News Outlets Nearly Triples in Size.” Columbia Journalism Review, August 4, 2020. https://www.cjr.org/analysis/as-election-looms-a-network-of-mysterious-pink-slime-local-news-outlets-nearly-triples-in-size.php. Blakeslee, Sarah. “The CRAAP Test.” LOEX Quarterly 31, no. 3 (2004). https://commons.emich.edu/loexquarterly/vol31/iss3/4. Breakstone, Joel, Mark Smith, Priscilla Connors, Teresa Ortega, Darby Kerr, and Sam Wineburg. “Lateral Reading: College Students Learn to Critically Evaluate Internet Sources in an Online Course.” The Harvard Kennedy School Misinformation Review 2, no. 1 (2021): 1–17. https://doi.org/10.37016/mr-2020-56. Brodsky, Jessica E., Patricia J. Brooks, Donna Scimeca, Ralitsa Todorova, Peter Galati, Michael Batson, Robert Grosso, Michael Matthews, Victor Miller, and Michael Caulfield. “Improving College Students’ Fact-Checking Strategies through Lateral Reading Instruction in a General Education Civics Course.” Cognitive Research: Principles and Implications 6 (2021). https://doi.org/10.1186/s41235-021-00291-4. Caulfield, Mike. “A Short History of CRAAP.” Blog. Hapgood (blog), September 14, 2018. https://hapgood.us/2018/09/14/a-short-history-of-craap/. ———. Truth is in the network. Email, May 31, 2019. https://projectinfolit.org/smart-talk-interviews/truth-is-in-the-network/. ———. Web Literacy for Student Fact-Checkers, 2017. https://webliteracy.pressbooks.com/. Dubé, Jacob. “No Escape: The Neverending Online Threats to Female Journalists.” Ryerson Review of Journalism, no. Spring 2018 (May 28, 2018). https://rrj.ca/no-escape-the-neverending-online-threats-to-female-journalists/. Fister, Barbara. “The Information Literacy Standards/Framework Debate.” Inside Higher Ed, Library Babel Fish, January 22, 2015. https://www.insidehighered.com/blogs/library-babel-fish/information-literacy-standardsframework-debate. Foster, Nancy Fried. “The Librarian-Student-Faculty Triangle: Conflicting Research Strategies?” Library Assessment Conference, 2010. https://urresearch.rochester.edu/researcherFileDownload.action?researcherFileId=71. Freire, Paulo. “The Banking Model of Education.” In Critical Issues in Education: An Anthology of Readings, 105–17. Sage, 1970. Haider, Jutta, and Olof Sundin. “Information Literacy Challenges in Digital Culture: Conflicting Engagements of Trust and Doubt.” Information, Communication and Society, 2020. https://doi.org/10.1080/1369118X.2020.1851389. Head, Alison J., Barbara Fister, and Margy MacMillan. “Information Literacy in the Age of Algorithms.” Project Information Literacy Research Institute, January 15, 2020. https://projectinfolit.org/publications/algorithm-study. Head, Alison J., John Wihbey, P. Takis Metaxas, Margy MacMillan, and Dan Cohen. “How Students Engage with News: Five Takeaways for Educators, Journalists, and Librarians.” Project Information Literacy Research Institute, October 16, 2018. 
https://projectinfolit.org/pubs/news-study/pil_news-study_2018-10-16.pdf. Maass, Dave, Aaron Mackey, and Camille Fischer. “The Follies 2018.” Electronic Frontier Foundation, March 11, 2018. https://www.eff.org/deeplinks/2018/03/foilies-2018. Meola, Marc. “Chucking the Checklist: A Contextual Approach to Teaching Undergraduates Web-Site Evaluation.” Libraries and the Academy 4, no. 3 (2004): 331–44. https://doi.org/10.1353/pla.2004.0055. Reich, Justin. Tinkering Toward Networked Learning: What Tech Can and Can’t Do for Education. December 2020. https://projectinfolit.org/smart-talk-interviews/tinkering-toward-networked-learning-what-tech-can-and-cant-do-for-education/. Seeber, Kevin. “Wiretaps and CRAAP.” Blog. Kevin Seeber (blog), March 18, 2017. http://kevinseeber.com/blog/wiretaps-and-craap/. The CRAAP Test (Currency, Relevance, Authority, Accuracy, Purpose) is a reliability heuristic designed by Sarah Blakeslee and her librarian colleagues at Chico State University. See: Sarah Blakeslee, “The CRAAP Test,” LOEX Quarterly 31 no. 3 (2004): https://commons.emich.edu/loexquarterly/vol31/iss3/4Kevin Seeber, “Wiretaps and CRAAP,” Kevin Seeber [Blog], (March 18, 2017): http://kevinseeber.com/blog/wiretaps-and-craap/Mike Caulfield, “The Truth is in the Network” [email interview by Barbara Fister], Project Information Literacy, Smart Talk Interview, no. 31, (December 1, 2020)https://projectinfolit.org/smart-talk-interviews/truth-is-in-the-network/Barbara Fister, “The Information Literacy Standards/Framework Debate,” Library Babel Fish column, Inside Higher Education, (January 22, 2015): https://www.insidehighered.com/blogs/library-babel-fish/information-literacy-standardsframework-debateSarah Blakeslee, “The CRAAP test,” op. cit. https://commons.emich.edu/loexquarterly/vol31/iss3/4Association of College and Research Libraries, Information Literacy Competency Standards for Higher Education, (2000), https://alair.ala.org/handle/11213/7668  Note: These standards were rescinded in 2016.Mike Caulfield, “A Short History of CRAAP,” Hapgood, (June 14, 2018): https://hapgood.us/2018/09/14/a-short-history-of-craap/Kevin Seeber (March 18, 2017), “Wiretaps and CRAAP,” op. cit.Mike Caulfield, “The Truth is in the Network,” op. cit. Caulfield developed SIFT from earlier version of this heuristic, “four moves and a habit,” described in his 2017 OER book Web Literacy for Student Fact-Checkers, (December 1, 2020) https://webliteracy.pressbooks.comJessica E. Brodsky, Patricia J. Brooks, Donna Scimeca, Ralitsa Todorova, Peter Galati, Michael Batson, Robert Grosso, Michael Matthews, Victor Miller, and Michael Caulfield , “Improving College Students’ Fact-Checking Strategies Through Lateral Reading Instruction in a General Education Civics Course,” Cognitive Research: Principles and Implications, 6(1) (2021), 1-18, https://doi.org/10.1186/s41235-021-00291-4; Joel Breakstone, Mark Smith, Priscilla Connors, Teresa Ortega, Darby Kerr, and Sam Wineburg, “Lateral Reading: College Students Learn to Critically Evaluate Internet Sources in an Online Course,” The Harvard Kennedy School Misinformation Review, 2(1), (2021) 1-17, https://doi.org/10.37016/mr-2020-56Justin Reich, “Tinkering Toward Networked Learning: What Tech Can and Can’t Do for Education” [email interview by Barbara Fister], Project Information Literacy, Smart Talk Interview, no. 33, (December 2020): https://projectinfolit.org/smart-talk-interviews/tinkering-toward-networked-learning-what-tech-can-and-cant-do-for-education/Alison J. Head, John Wihbey, P. 
Takis Metaxas, Margy MacMillan, and Dan Cohen, How Students Engage with News: Five Takeaways for Educators, Journalists, and Librarians, Project Information Literacy Research Institute, (October 16, 2018), pp. 24-28, https://projectinfolit.org/pubs/news-study/pil_news-study_2018-10-16.pdf Nancy Fried Foster , “The Librarian‐Student‐Faculty Triangle: Conflicting Research Strategies?.” 2010 Library Assessment Conference,(2010): https://urresearch.rochester.edu/researcherFileDownload.action?researcherFileId=71Alison J. Head, Barbara Fister, and Margy MacMillan, Information Literacy in the Age of Algorithms, Project Information Literacy Research Institute, (January 15, 2020):https://projectinfolit.org/publications/algorithm-studyJutta Haider and Olof Sundin (2020), “Information Literacy Challenges in Digital Culture: Conflicting Engagements of Trust and Doubt,” Information, Communication and Society, ahead-of-print, https://doi.org/10.1080/1369118X.2020.1851389Alison J. Head, Barbara Fister, and Margy MacMillan (January 15, 2020), op. cit.Alison J. Head, Barbara Fister, and Margy MacMillan, (January 15, 2020), op. cit., 5-8.See for example, Dave Mass, Aaron Mackey, and Camille Fischer, “The Foilies, 2018,” Electronic Frontier Foundation,(March 11, 2018): https://www.eff.org/deeplinks/2018/03/foilies-2018; Jacob Dubé, “No Escape: The Neverending Online Threats to Female Journalists,” Ryerson Review of Journalism, (May 28, 2018): https://rrj.ca/no-escape-the-neverending-online-threats-to-female-journalists/; Priyanjana Bengani, “As Election Looms, a Network of Mysterious ‘Pink Slime’ Local News Outlets Nearly Triples in Size,” Columbia Journalism Review,(August 4, 2020): https://www.cjr.org/analysis/as-election-looms-a-network-of-mysterious-pink-slime-local-news-outlets-nearly-triples-in-size.php Paulo Freire, “The Banking Model of Education,” In Provenzo, Eugene F. (ed.). Critical Issues in Education: An Anthology of Readings, Sage, (1970), 105-117.Alison J. Head, Barbara Fister, and Margy MacMillan (January 15, 2020), op.cit., 16-19.Alison J. Head, Barbara Fister, and Margy MacMillan (January 15, 2020), op.cit., 22-25.Alison J. Head, John Wihbey, P. Takis Metaxas, Margy MacMillan, and Dan Cohen (October 16, 2018), op.cit., 20Nancy Fried Foster (2010), op. cit.Barbara Fister, “Lizard People in the Libraries,” PIL Provocation Series, No. 1, Project Information Literacy Research Institute,(February 3, 2021):  https://projectinfolit.org/pubs/provocation-series/essays/lizard-people-in-the-library.html Marc Meola , “Chucking the Checklist: A Contextual Approach to Teaching Undergraduates Web-site Evaluation,” portal: Libraries and the Academy 4, no.3 (2004): 331-344, https://doi.org/10.1353/pla.2004.0055, p.338Association of College and Research Libraries , Framework for Information Literacy for Higher Education (2016) http://www.ala.org/acrl/standards/ilframework Open Knowledge Foundation: Welcome Livemark – the New Frictionless Data Tool We are very excited to announce that a new tool has been added to the Frictionless Data toolkit: Livemark. What is Frictionless? Frictionless Data is a set of specifications for data and metadata interoperability, accompanied by a collection of software libraries that implement these specifications, and a range of best practices for data management. The project is funded by the Sloan Foundation and Open Data Institute. Learn more about Frictionless data here. What is Livemark? 
Livemark is a great tool that allows you to publish data articles very easily, giving you the possibility to see your data live on a working website in the blink of an eye. How does it work? Livemark is a Python library generating a static page that extends Markdown with interactive charts, tables, scripts, and much, much more. You can use the Frictionless framework as a frictionless variable to work with your tabular data in Livemark. Livemark offers a series of useful features, like automatically generating a table of contents and providing a scroll-to-top button when you scroll down your document. You can also customise the layout of your newly created webpage. How can you get started? Livemark is very easy to use. We invite you to watch this great demo by developer Evgeny Karev: You can also have a look at the documentation on GitHub. What do you think? If you create a site using Livemark, please let us know! Frictionless Data is an open source project; therefore, we encourage you to give us feedback. Let us know your thoughts, suggestions, or issues by joining us in our community chat on Discord or by opening an issue in the GitHub repo. Samvera: Take a Virtual Tour of Samvera Repositories The Samvera Repository Online Tour provides an overview of a range of Samvera-based digital repositories and their collections at institutions across the US and UK. Click through to explore how Samvera technologies allow institutions and organizations to organize and provide access to diverse materials — from oral histories and historic photographs, to student projects and research data. The tour was planned by the Samvera Marketing Working Group and created using StoryMapJS by Lafayette College Libraries student workers Grayce Walker, Deja Jackson, and Khaknazar Shyntassov. A huge thanks to them, as well as to Charlotte Nunes at Lafayette College Libraries Digital Scholarship Services for coordinating the work. If you'd like to include your Samvera-based repository on the tour, simply fill out this form. The post Take a Virtual Tour of Samvera Repositories appeared first on Samvera. David Rosenthal: Alternatives To Proof-of-Work The designers of peer-to-peer consensus protocols such as those underlying cryptocurrencies face three distinct problems. They need to prevent: Being swamped by a multitude of Sybil peers under the control of an attacker. This requires making peer participation expensive, such as by Proof-of-Work (PoW). PoW is problematic because it has a catastrophic carbon footprint. A rational majority of peers from conspiring to obtain inappropriate benefits. This is thought to be achieved by decentralization, that is, a network of so many peers acting independently that a conspiracy among a majority of them is highly improbable. Decentralization is problematic because in practice all successful cryptocurrencies are effectively centralized. A rational minority of peers from conspiring to obtain inappropriate benefits. This requirement is called incentive compatibility.
This is problematic because it requires very careful design of the protocol.In the rather long post below the fold I focus on some potential alternatives to PoW, inspired by Jeremiah Wagstaff's Subspace: A Solution to the Farmer’s Dilemma, the white paper for a new blockchain technology.Careful design of the economic mechanisms of the protocol can in theory ensure incentive compatibility, or as Ittay Eyal and Emin Gun Sirer express it:the best strategy of a rational minority pool is to be honest, and a minority of colluding miners cannot earn disproportionate benefits by deviating from the protocol They showed in 2013 that the Bitcoin protocol was not incentive-compatible, but this is in principle amenable to a technical fix. Unfortunately, ensuring decentralization is a much harder problem.DecentralizationVitalik Buterin, co-founder of Ethereum, wrote in The Meaning of Decentralization:In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently. The Internet's basic protocols, TCP/IP, DNS, SMTP, HTTP are all decentralized, and yet the actual Internet is heavily centralized around a few large companies. Centralization is an emergent behavior, driven not by technical but by economic forces. W. Brian Arthur described these forces before the Web took off in his 1994 book Increasing Returns and Path Dependence in the Economy.Similarly, the blockchain protocols are decentralized but ever since 2014 the Bitcoin blockchain has been centralized around 3-4 large mining pools. Buterin wrote:can we really say that the uncoordinated choice model is realistic when 90% of the Bitcoin network’s mining power is well-coordinated enough to show up together at the same conference? This is perhaps the greatest among the multiple failures of Satoshi Nakamoto's goals for Bitcoin. The economic forces driving this centralization are the same as those that centralized other Internet protocols. I explored how they act to centralize P2P systems in 2014's Economies of Scale in Peer-to-Peer Networks. I argued that an incentive-compatible protocol wasn't adequate to prevent centralization. The simplistic version of the argument was:The income to a participant in an incentive-compatible P2P network should be linear in their contribution of resources to the network.The costs a participant incurs by contributing resources to the network will be less than linear in their resource contribution, because of the economies of scale.Thus the proportional profit margin a participant obtains will increase with increasing resource contribution.Thus the effects described in Brian Arthur's Increasing Returns and Path Dependence in the Economy will apply, and the network will be dominated by a few, perhaps just one, large participant.And I wrote:The advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Thus it seems that P2P networks which have the characteristics needed to succeed (by being widely adopted) also inevitably carry the seeds of their own failure (by becoming effectively centralized). Bitcoin is an example of this. My description of the fundamental problem was:The network has to arrange not just that the reward grows more slowly than the contribution, but that it grows more slowly than the cost of the contribution to any participant. 
If there is even one participant whose rewards outpace their costs, Brian Arthur's analysis shows they will end up dominating the network. Herein lies the rub. The network does not know what an individual participant's costs, or even the average participant's costs, are and how they grow as the participant scales up their contribution.So the network would have to err on the safe side, and make rewards grow very slowly with contribution, at least above a certain minimum size. Doing so would mean few if any participants above the minimum contribution, making growth dependent entirely on recruiting new participants. This would be hard because their gains from participation would be limited to the minimum reward. It is clear that mass participation in the Bitcoin network was fuelled by the (unsustainable) prospect of large gains for a small investment. The result of limiting reward growth would be a blockchain with limited expenditure on mining which, as we see with the endemic 51% attacks against alt-coins, would not be secure. But without such limits, economies of scale mean that the blockchain would be dominated by a few large mining pools, so would not be decentralized and would be vulnerable to insider attacks. Note that in June 2014 the GHash.io mining pool alone had more than 51% of the Bitcoin mining power.But the major current problem for Bitcoin, Ethereum and cryptocurrencies in general is not vulnerability to 51% attacks. Participants in these "trustless" systems trust that the mining pools are invested in their security and will not conspire to misbehave. Events have shown that this trust is misplaced as applied to smaller alt-coins. Trustlessness was one of Nakamoto's goals, another of the failures. But as regards the major cryptocurrencies this trust is plausible; everyone is making enough golden eggs to preserve the life of the goose.Alternatives to Proof-of-WorkThe major current problem for cryptocurrencies is that their catastrophic carbon footprint has attracted attention. David Gerard writes:The bit where proof-of-work mining uses a country’s worth of electricity to run the most inefficient payment system in human history is finally coming to public attention, and is probably Bitcoin’s biggest public relations problem. Normal people think of Bitcoin as this dumb nerd money that nerds rip each other off with — but when they hear about proof-of-work, they get angry. Externalities turn out to matter. Yang Xiao et al's A Survey of Distributed Consensus Protocols for Blockchain Networks is very useful. They:identify five core components of a blockchain consensus protocol, namely, block proposal, block validation, information propagation, block finalization, and incentive mechanism. A wide spectrum of blockchain consensus protocols are then carefully reviewed accompanied by algorithmic abstractions and vulnerability analyses. The surveyed consensus protocols are analyzed using the five-component framework and compared with respect to different performance metrics. Their "wide spectrum" is comprehensive as regards the variety of PoW protocols, and as regards the varieties of Proof-of-Stake (PoS) protocols that are the leading alternatives to PoW. 
Their coverage of other consensus protocols is less thorough, and as regards the various protocols that defend against Sybil attacks by wasting storage instead of computation it is minimal.The main approach to replacing PoW with something equally good at preventing Sybil attacks but less good at cooking the planet has been PoS, but a recent entrant using Proof-of-Time-and-Space (I'll use PoTaS since the acronyms others use are confusing) to waste storage has attracted considerable attention. I will discuss PoS in general terms and two specific systems, Chia (PoTaS) and Subspace (a hybrid of PoTaS and PoS).Proof-of-StakeIn PoW as implemented by Nakamoto, the probability of a winning the next block is proportional to the number of otherwise useless hashes computed — Nakamoto thought by individual CPUs but now by giant mining pools driven by warehouses full of mining ASICs. The idea of PoS is that the resource being wasted to deter Sybil attacks is the cryptocurrency itself. In order to mount a 51% attack the attacker would have to control more of the cryptocurrency that the loyal peers. In vanilla PoS the probability of winning the next block is proportional to the amount of the cryptocurrency "staked", i.e. effectively escrowed and placed at risk of being "slashed" if the majority concludes that the peer has misbehaved. It appears to have been first proposed in 2011 by Bitcointalk user QuantumMechanic.The first cryptocurrency to use PoS, albeit as a hybrid with PoW, was Peercoin in 2012. There have been a number of pure PoS cryptocurrencies since, including Cardano from 2015 and Algorand from 2017 but none have been very successful.Ethereum, the second most important cryptocurrency, understood the need to replace PoW in 2013 and started work in 2014. But as Vitalik Buterin then wrote:Over the last few months we have become more and more convinced that some inclusion of proof of stake is a necessary component for long-term sustainability; however, actually implementing a proof of stake algorithm that is effective is proving to be surprisingly complex.The fact that Ethereum includes a Turing-complete contracting system complicates things further, as it makes certain kinds of collusion much easier without requiring trust, and creates a large pool of stake in the hands of decentralized entities that have the incentive to vote with the stake to collect rewards, but which are too stupid to tell good blockchains from bad. Buterin was right about making "certain kinds of collusion much easier without requiring trust". In On-Chain Vote Buying and the Rise of Dark DAOs Philip Daian and co-authors show that "smart contracts" provide for untraceable on-chain collusion in which the parties are mutually pseudonymous. It is obviously much harder to prevent bad behavior in a Turing-complete environment. Seven years later Ethereum is still working on the transition, which they currently don't expect to be complete for another 18 months:Shocked to see that the timeline for Ethereum moving to ETH2 and getting off proof-of-work mining has been put back to late 2022 … about 18 months from now. This is mostly from delays in getting sharding to work properly. Vitalik Buterin says that this is because the Ethereum team isn’t working well together. [Tokenist] Skepticism about the schedule for ETH2 is well-warranted, as Julia Magas writes in When will Ethereum 2.0 fully launch? 
Roadmap promises speed, but history says otherwise:Looking at how fast the relevant updates were implemented in the previous versions of Ethereum roadmaps, it turns out that the planned and real release dates are about a year apart, at the very minimum. Are there other reasons why PoS is so hard to implement safely? Bram Cohen's talk at Stanford included a critique of PoS:Its threat model is weaker than Proof of Work.Just as Proof of Work is in practice centralized around large mining pools, Proof of Stake is centralized around large currency holdings (which were probably acquired much more cheaply than large mining installations).The choice of a quorum size is problematic. "Too small and it's attackable. Too large and nothing happens." And "Unfortunately, those values are likely to be on the wrong side of each other in practice."Incentivizing peers to put their holdings at stake creates a class of attacks in which peers "exaggerate one's own bonding and blocking it from others."Slashing introduces a class of attacks in which peers cause others to be fraudulently slashed.The incentives need to be strong enough to overcome the risks of slashing, and of keeping their signing keys accessible and thus at risk of compromise."Defending against those attacks can lead to situations where the system gets wedged because a split happened and nobody wants to take one for the team"Cohen seriously under-played PoS's centralization problem. It isn't just that the Gini coefficients of cryptocurrencies are extremely high, but that this is a self-reinforcing problem. Because the rewards for mining new blocks, and the fees for including transactions in blocks, flow to the HODL-ers in proportion to their HODL-ings, whatever Gini coefficient the systems starts out with will always increase. As I wrote, cryptocurrencies are:a mechanism for transferring wealth from later adopters, called suckers, to early adopters, called geniuses. PoS makes this "ratchet" mechanism much stronger than PoW, and thus renders them much more vulnerable to insider 51% attacks. I discussed one such high-profile attack by Justin Sun of Tron on the Steemit blockchain in Proof-of-Stake In Practice :One week later, on March 2nd, Tron arranged for exchanges, including Huobi, Binance and Poloniex, to stake tokens they held on behalf of their customers in a 51% attack:According to the list of accounts powered up on March. 2, the three exchanges collectively put in over 42 million STEEM Power (SP).With an overwhelming amount of stake, the Steemit team was then able to unilaterally implement hard fork 22.5 to regain their stake and vote out all top 20 community witnesses – server operators responsible for block production – using account @dev365 as a proxy. In the current list of Steem witnesses, Steemit and TRON’s own witnesses took up the first 20 slots. Although this attack didn't provide Tron with an immediate monetary reward, the long term value of retaining effective control of the blockchain was vastly greater than the cost of staking the tokens. I've been pointing out that the high Gini coefficients of cryptocurrencies means Proof-of-Stake centralizes control of the blockchain in the hands of the whales since 2017's Why Decentralize? quoted Vitalik Buterin pointing out that a realistic scenario was:In a proof of stake blockchain, 70% of the coins at stake are held at one exchange. Or in this case three exchanges cooperating. 
Note that economic analyses of PoS, such as More (or less) economic limits of the blockchain by Joshua Gans and Neil Gandal, assume economically rational actors care about the iliquidity of staked coins and the foregone interest. But true believers in "number go up" have a long-term perspective similar to Sun's. The eventual progress of their coin "to the moon!" means that temporary, short-term costs are irrelevant to long-term HODL-ers.Jude C. Nelson amplifies the centralization point:PoW is open-membership, because the means of coin production are not tied to owning coins already. All you need to contribute is computing power, and you can start earning coins at a profit.PoS is closed-membership with a veneer of open-membership, because the means of coin production are tied to owning a coin already. What this means in practice is that no rational coin-owner is going to sell you coins at a fast enough rate that you'll be able to increase your means of coin production. Put another way, the price you'd pay for the increased means of coin production will meet or exceed the total expected revenue created by staking those coins over their lifetime. So unless you know something the seller doesn't, you won't be able to profit by buying your way into staking.Overall, this makes PoS less resilient and less egalitarian than PoW. While both require an up-front capital expenditure, the expenditure for PoS coin-production will meet or exceed the total expected revenue of those coins at the point of sale. So, the system is only as resilient as the nodes run by the people who bought in initially, and the only way to join later is to buy coins from people who want to exit (which would only be viable if these folks believed the coins are worth less than what you're buying them for, which doesn't bode well for you as the buyer). Nelson continues:PoW requires less proactive trust and coordination between community members than PoS -- and thus is better able to recover from both liveness and safety failures -- precisely because it both (1) provides a computational method for ranking fork quality, and (2) allows anyone to participate in producing a fork at any time. If the canonical chain is 51%-attacked, and the attack eventually subsides, then the canonical chain can eventually be re-established in-band by honest miners simply continuing to work on the non-attacker chain. In PoS, block-producers have no such protocol -- such a protocol cannot exist because to the rest of the network, it looks like the honest nodes have been slashed for being dishonest. Any recovery procedure necessarily includes block-producers having to go around and convince people out-of-band that they were totally not dishonest, and were slashed due to a "hack" (and, since there's lots of money on the line, who knows if they're being honest about this?). PoS conforms to Mark 4:25:For he that hath, to him shall be given: and he that hath not, from him shall be taken even that which he hath. In Section VI(E) Yang Xiao et al identify the following types of vulnerability in PoS systems:Costless simulation:literally means any player can simulate any segment of blockchain history at the cost of no real work but speculation, as PoS does not incur intensive computation while the blockchain records all staking history. This may give attackers shortcuts to fabricate an alternative blockchain. It is the basis for attacks 2 through 5. 
Nothing at stakeUnlike a PoW miner, a PoS minter needs little extra effort to validate transactions and generate blocks on multiple competing chains simultaneously. This “multi-bet” strategy makes economical sense to PoS nodes because by doing so they can avoid the opportunity cost of sticking to any single chain. Consequently if a significantly fraction of nodes perform the “multi-bet” strategy, an attacker holding far less than 50% of tokens can mount a successful double spending attack. The defense against this attack is usually "slashing", forfeiting the stake of miners detected on multiple competing chains. But slashing, as Cohen and Nelson point out, is in itself a consensus problem. Posterior corruptionThe key enabler of posterior corruption is the public availability of staking history on the blockchain, which includes stakeholder addresses and staking amounts. An attacker can attempt to corrupt the stakeholders who once possessed substantial stakes but little at present by promising them rewards after growing an alternative chain with altered transaction history (we call it a “malicious chain”). When there are enough stakeholders corrupted, the colluding group (attacker and corrupted once-rich stakeholders) could own a significant portion of tokens (possibly more than 50%) at some point in history, from which they are able to grow an malicious chain that will eventually surpass the current main chain. The defense is key-evolving cryptography, which ensures that the past signatures cannot be forged by the future private keys. Long-range attack as introduced by Buterin:foresees that a small group of colluding attackers can regrow a longer valid chain that starts not long after the genesis block. Because there were likely only a few stakeholders and a lack of competition at the nascent stage of the blockchain, the attackers can grow the malicious chain very fast and redo all the PoS blocks (i.e. by costless simulation) while claiming all the historical block rewards. Evangelos Deirmentzoglou et al's A Survey on Long-Range Attacks for Proof of Stake Protocols provides a useful review of these attacks. Even if there are no block rewards, only fees, a variant long-range attack is possible as described in Stake-Bleeding Attacks on Proof-of-Stake Blockchains by Peter Gazi et al, and by Shijie Zhang and Jong-Hyouk Lee in Eclipse-based Stake-Bleeding Attacks in PoS Blockchain Systems.Stake-grinding attackunlike PoW in which pseudo-randomness is guaranteed by the brute-force use of a cryptographic hash function, PoS’s pseudo-randomness is influenced by extra blockchain information—the staking history. Malicious PoS minters may take advantage of costless simulation and other staking-related mechanisms to bias the randomness of PoS in their own favor, thus achieving higher winning probabilities compared to their stake amounts Centralization risk as discussed above:In PoS the minters can lawfully reinvest their profits into staking perpetually, which allows the one with a large sum of unused tokens become wealthier and eventually reach a monopoly status. When a player owns more than 50% of tokens in circulation, the consensus process will be dominated by this player and the system integrity will not be guaranteed. 
There are a number of papers on this problem, including Staking Pool Centralization in Proof-of-Stake Blockchain Network by Ping He et al, Compounding of wealth in proof-of-stake cryptocurrencies by Giulia Fanti et al, and Stake shift in major cryptocurrencies: An empirical study by Rainer Stütz et al. But to my mind none of them suggest a realistic mitigation.These are not the only problems from which PoS suffers. Two more are:Checkpointing. Long-range and related attacks are capable of rewriting almost the entire chain. To mitigate this, PoS systems can arrange for consensus on checkpoints, blocks which are subsequently regarded as canonical forcing any rewriting to start no earlier than the following block. Winkle – Decentralised Checkpointing for Proof-of-Stake is:a decentralised checkpointing mechanism operated by coin holders, whose keys are harder to compromise than validators’ as they are more numerous. By analogy, in Bitcoin, taking control of one-third of the total supply of money would require at least 889 keys, whereas only 4 mining pools control more than half of the hash power It is important that consensus on checkpoints is achieved through a different mechanism than consensus on blocks. To over-simplify, Winkle piggy-backs votes for checkpoints on transactions; a transaction votes for a block with the number of coins remaining in the sending account, and with the number sent to the receiving account. A checkpoint is final once a set proportion of the coins have voted for it. For the details, see Winkle: Foiling Long-Range Attacks in Proof-of-Stake Systems by Sarah Azouvi et al. Lending. In Competitive equilibria between staking and on-chain lending, Tarun Chitra demonstrates that it is:possible for on-chain lending smart contracts to cannibalize network security in PoS systems. When the yield provided by these contracts is more attractive than the inflation rate provided from staking, stakers will tend to remove their staked tokens and lend them out, thus reducing network security. ... Our results illustrate that rational, non-adversarial actors can dramatically reduce PoS network security if block rewards are not calibrated appropriately above the expected yields of on-chain lending. I believe this is part of a fundamental problem for PoS. The token used to prevent a single attacker appearing as a multitude of independent peers can be lent, and thus the attacker can borrow a temporary majority of the stake cheaply, for only a short-term interest payment. Preventing this increases implementation complexity significantly.In summary, despite PoS' potential for greatly reducing PoW's environmental impact and cost of defending against Sybil attacks, it has a major disadvantage. It is significantly more complex and thus its attack surface is much larger, especially when combined with a Turing-complete execution environment such as Ethereum's. It therefore needs more defense mechanisms, which increase complexity further. Buterin and the Ethereum developers realize the complexity of the implementation task they face, which is why their responsible approach is taking so long. Currently Ethereum is the only realistic candidate to displace Bitcoin, and thus reduce cryptocurrencies' carbon footprint, so the difficulty of an industrial-strength implementation of PoS for Ethereum 2.0 is a major problem.Proof-of-Space-and-TimeBack in 2018 I wrote about Bram Cohen's PoTaS system, Chia, in Proofs of Space and Chia Network. 
Instead of wasting computation to prevent Sybil attacks, Chia wastes storage. Chia's "space farmers" create and store "plots" consisting of large amounts of otherwise useless data. The technical details are described in Chia Consensus. They are comprehensive and impressively well thought out.Because, like Bitcoin, Chia is wasting a real resource to defend against Sybil attacks it lacks many of PoS' vulnerabilities. Nevertheless, the Chia protocol is significantly more complex than Bitcoin and thus likely to possess additional vulnerabilities. For example, whereas in Bitcoin there is only one role for participants, mining, the Chia protocol involves three roles:Farmer, "Farmers are nodes which participate in the consensus algorithm by storing plots and checking them for proofs of space."Timelord, "Timelords are nodes which participate in the consensus algorithm by creating proofs of time".Full node, which involves "broadcasting proofs of space and time, creating blocks, maintaining a mempool of pending transactions, storing the historical blockchain, and uploading blocks to other full nodes as well as wallets (light clients)."Figure 11Another added complexity is that the Chia protocol maintains three chains (Challenge, Reward and Foliage), plus an evanescent chain during each "slot" (think Bitcoin's block time), as shown in the document's Figure 11. The document therefore includes a range of attacks and their mitigations which are of considerable technical interest.Cohen's praiseworthy objective for Chia was to avoid the massive power waste of PoW because:"You have this thing where mass storage medium you can set a bit and leave it there until the end of time and its not costing you any more power. DRAM is costing you power when its just sitting there doing nothing". Alas, Cohen was exaggerating:A state-of-the-art disk drive, such as Seagate's 12TB BarraCuda Pro, consumes about 1W spun-down in standby mode, about 5W spun-up idle and about 9W doing random 4K reads. Which is what it would be doing much of the time while "space farming". Clearly, PoTaS uses energy, just much less than PoW. Reporting on Cohen's 2018 talk at Stanford I summarized:Cohen's vision is of a PoSp/VDF network comprising large numbers of desktop PCs, continuously connected and powered up, each with one, or at most a few, half-empty hard drives. The drives would have been purchased at retail a few years ago. My main criticism in those posts was Cohen's naiveté about storage technology, the storage market and economies of scale:There would appear to be three possible kinds of participants in a pool:Individuals using the spare space in their desktop PC's disk. The storage for the Proof of Space is effectively "free", but unless these miners joined pools, they would be unlikely to get a reward in the life of the disk. Individuals buying systems with CPU, RAM and disk solely for mining. The disruption to the user's experience is gone, but now the whole cost of mining has to be covered by the rewards. To smooth out their income, these miners would join pools. Investors in data-center scale mining pools. Economies of scale would mean that these participants would see better profits for less hassle than the individuals buying systems, so these investor pools would come to dominate the network, replicating the Bitcoin pool centralization. Thus if Chia's network were to become successful, mining would be dominated by a few large pools. 
Each pool would run a VDF server to which the pool's participants would submit their Proofs of Space, so that the pool manager could verify their contribution to the pool.The emergence of pools, and dominance of a small number of pools, has nothing to do with the particular consensus mechanism in use. Thus I am skeptical that alternatives to Proof of Work will significantly reduce centralization of mining in blockchains generally, and in Chia Network's blockchain specifically. As I was writing the first of these posts, TechCrunch reported:Chia has just raised a $3.395 million seed round led by AngelList’s Naval Ravikant and joined by Andreessen Horowitz, Greylock and more. The money will help the startup build out its Chia coin and blockchain powered by proofs of space and time instead of Bitcoin’s energy-sucking proofs of work, which it plans to launch in Q1 2019. Even in 2020 the naiveté persisted, as Chia pitched the idea that space farming on a Raspberry Pi was a way to make money. It still persists, as Chia's president reportedly claims that "recyclable hard drives are entering the marketplace". But when Chia Coin actually started trading in early May 2021 the reality was nothing like Cohen's 2018 vision:As everyone predicted, the immediate effect was to create a massive shortage of the SSDs needed to create plots, and the hard drives needed to store them. Even Gene Hoffman, Chia's CEO, admitted that Bitcoin rival Chia 'destroyed' hard disc supply chains, says its boss:Chia, a cryptocurrency intended to be a “green” alternative to bitcoin has instead caused a global shortage of hard discs. Gene Hoffman, the president of Chia Network, the company behind the currency, admits that “we’ve kind of destroyed the short-term supply chain”, but he denies it will become an environmental drain. The result of the spike in storage prices was a rise in the vendors stock:The share price of hard disc maker Western Digital has increased from $52 at the start of the year to $73, while competitor Seagate is up from $60 to $94 over the same period. To give you some idea of how rapidly Chia has consumed storage in the two months since launch, it is around 20% of the rate at which the entire industry produced hard disk in 2018. Chia Pools Mining pools arose. As I write the network is storing 30.06EB of otherwise useless data, of which one pool, ihpool.com is managing 10.78EB, or 39.3%. Unlike Bitcoin, the next two pools are much smaller, but large enough so that the top four pools have 42% of the space. The network is slightly more decentralized than Bitcoin has been since 2014, and for reasons discussed below is less vulnerable to an insider 51% attack. Chia "price" The "price" of Chia Coin collapsed, from $1934.51 at the start of trading to $165.41 Sunday before soaring to $185.78 as I write. Each circulating XCH corresponds to about 30TB. The investment in "space farming" hardware vastly outweighs, by nearly six times, the market cap of the cryptocurrency it is supporting. The "space farmers" are earning $1.69M/day, or about $20/TB/year. A 10TB internal drive is currently about $300 on Amazon, so it will be about a 18 months before it earns a profit. The drive is only warranted for 3 years. But note that the warranty is limited:Supports up to 180 TB/yr workload rate Workload Rate is defined as the amount of user data transferred to or from the hard drive. Using the drive for "space farming" would likely void the warranty and, just as PoW does to GPUs, burn out the drive long before its warranted life. 
If you have two years, the $300 investment theoretically earns a 25% return before power and other costs. But the hard drive isn't the only cost of space farming. In order to become a "space farmer" in the first place you need to create plots containing many gigabytes of otherwise useless cryptographically-generated data. You need lots of them; the probability of winning your share of the $2.74M/day is how big a fraction of the nearly 30EB you can generate and store. The 30EB is growing rapidly, so the quicker you can generate the plots, the better your chance in the near term. To do so in finite time you need in addition to the hard drive a large SSD at extra cost. Using it for plotting will void its warranty and burn it out in as little as six weeks. And you need a powerful server running flat-out to do the cryptography, which both rather casts doubt on how much less power than PoW Chia really uses, and increases the payback time significantly. In my first Chia post I predicted that "space farming" would be dominated by huge data centers such as Amazon's. Sure enough, Wolfie Zhao reported on May 7th that:Technology giant Amazon has rolled out a solution dedicated to Chia crypto mining on its AWS cloud computing platform.According to a campaign page on the Amazon AWS Chinese site, the platform touts that users can deploy a cloud-based storage system in as quickly as five minutes in order to mine XCH, the native cryptocurrency on the Chia network. Two weeks later David Gerard reported that:The page disappeared in short order — but an archive exists. Because Chia mining trashes the drives, something else I pointed out in my first Chia post, storage services are banning users who think that renting something is a license to destroy it. In any case, 10TB of Amazon's S3 Reduced Redundancy Storage costs $0.788/day, so it would be hard to make ends meet. Cheaper storage services, such as Wasabi at $0.20/day are at considerable risk from Chia. Although this isn't an immediate effect, as David Gerard writes, because creating Chia plots wears out SSDs, and Chia farming wears out hard disks:Chia produces vast quantities of e-waste—rare metals, assembled into expensive computing components, turned into toxic near-unrecyclable landfill within weeks. Miners are incentivized to join pools because they prefer a relatively predictable, frequent flow of small rewards to very infrequent large rewards. The way pools work in Bitcoin and related protocols is that the pool decides what transactions are in the block it hopes to mine, and gets all the pool participants to work on that block. Thus a pool, or a conspiracy among pools, that had 51% of the mining power would have effective control over the transactions that were finalized. Because they make the decision as to which transactions happen, Nicholas Weaver argues that mining pools are money transmitters and thus subject to the AML/KYC rules. But in Chia pools work differently:First and foremost, even when a winning farmer is using a pool, they themselves are the ones who make the transaction block - not the pool. The decentralization benefits of this policy are obvious. The potential future downside is that while Bitcoin miners in a pool can argue that AML/KYC is the responsibility of the pool, Chia farmers would be responsible for enforcing the AML/KYC rules and subject to bank-sized penalties for failing to do so.In Bitcoin the winning pool receives and distributes both the block reward and the (currently much smaller) transaction fees. 
Over time the Bitcoin block reward is due to go to zero and the system is intended to survive on fees alone. Alas, research has shown that a fee-only Bitcoin system is insecure.Chia does things differently in two ways. First:all the transaction fees generated by a block go to the farmer who found it and not to the pool.Trying to split the transaction fees with the pool could result in transaction fees being paid ‘under the table’ either by making them go directly to the farmer or making an anyone can spend output which the farmer would then pay to themselves. Circumventing the pool would take up space on the blockchain. It could also encourage the emergence of alternative pooling protocols where the pool makes the transaction block which is a form of centralization we wish to avoid. The basic argument is that in Bitcoin the 51% conspiracy is N pools where in Chia it is M farmers (M ≫ N). Chia are confident that this is safe:This ensures that even if a pool has 51% netspace, they would also need to control ALL of the farmer nodes (with the 51% netspace) to do any malicious activity. This will be very difficult unless ALL the farmers (with the 51% netspace) downloaded the same malicious Chia client programmed by a Bram like level genius. I'm a bit less confident because, like Ethereum, Chia has a Turing-complete programming environment. In On-Chain Vote Buying and the Rise of Dark DAOs Philip Daian and co-authors showed that "smart contracts" provide for untraceable on-chain collusion in which the parties are mutually pseudonymous. Although their conspriacies were much smaller, similar techniques might be the basis for larger attacks on blockchains with "smart contracts".Second:This method has the downside of reducing the smoothing benefits of pools if transaction fees come to dominate fixed block rewards. That’s never been a major issue in Bitcoin and our block reward schedule is set to only halve three times and continue at a fixed amount forever after. There will alway be block rewards to pay to the pool while transaction fees go to the individual farmers. So unlike the Austrian economics of Bitcoin, Chia plans to reward farming by inflating the currency indefinitely, never depending wholly on fees. In Bitcoin the pool takes the whole block reward, but the way block rewards work is different too:fixed block rewards are set to go 7/8 to the pool and 1/8 to the farmer. This seems to be a sweet spot where it doesn’t reduce smoothing all that much but also wipes out potential selfish mining attacks where someone joins a competing pool and takes their partials but doesn’t upload actual blocks when they find them. Those sort of attacks can become profitable when the fraction of the split is smaller than the size of the pool relative to the whole system. Last I checked ihpool.com had almost 40% of the total system.Rational economics are not in play here. "Space farming" makes sense only at scale or for the most dedicated believers in "number go up". Others are less than happy:So I tested this Chia thing overnight. Gave it 200GB plot and two CPU threads. After 10 hours it consumed 400GB temp space, didn’t sync yet, CPU usage is always 80%+. Estimated reward time is 5 months. This isn’t green, already being centralised on large waste producing servers. The problem for the "number go up" believers is that the "size go up" too, by about half-an-exabyte a day. 
As the network grows, the chance that your investment in hardware will earn a reward goes down because it represents a smaller proportion of the total. Unless "number go up" much faster than "size go up", your investment is depreciating rapidly not just because you are burning it out but because its cost-effectiveness is decaying. And as we see, "size go up" rapidly but "number go down" rapidly. And economies of scale mean that return on investment in hardware will go up significantly with the proportion of the total the farmer has. So the little guy gets the short end of the stick even if they are in a pool.Chia's technology is extremely clever, but the economics of the system that results in the real world don't pass the laugh test. Chia is using nearly a billion dollars of equipment being paid for by inflating the currency at a rate of currently 2/3 billion dollars a year to process transactions at a rate around five billion dollars a year, a task that could probably be done using a conventional database and a Raspberry Pi. The only reason for this profligacy is to be able to claim that it is "decentralized". It is more decentralized than PoW or PoS systems, but over time economies of scale and free entry will drive the reward for farming in fiat terms down and mean that small-scale farmers will be squeezed out.The Chia "price" chart suggests that it might have been a "list-and-dump" scheme, in which A16Z and the other VCs incentivized the miners to mine and the exchanges to list the new cryptocurrency so that the VCs could dump their HODL-ings on the muppets seduced by the hype and escape with a profit. Note that A16Z just raised a $2.2B fund dedicated to pouring money into similar schemes. This is enough to fund 650 Chia-sized ventures! (David Gerard aptly calls Andreesen Horowitz "the SoftBank of crypto") They wouldn't do that unless they were making big bucks from at least some of the ones they funded earlier. Chia's sensitivity about their PR led them to hurl bogus legal threats at the leading Chia community blog. Neither is a good look. SubspaceAs we see, the Chia network has one huge pool and a number of relatively miniscule pools. In Subspace" A Solution to the Farmer's Dilemma, Wagstaff describes the "farmer's dilemma" thus:Observe that in any PoC blockchain a farmer is, by-definition, incentivized to allocate as much of its scarce storage resources as possible towards consensus. Contrast this with the desire for all full nodes to reserve storage for maintaining both the current state and history of the blockchain. These competing requirements pose a challenge to farmers: do they adhere to the desired behavior, retaining the state and history, or do they seek to maximize their own rewards, instead dedicating all available space towards consensus? When faced with this farmer’s dilemma rational farmers will always choose the latter, effectively becoming light clients, while degrading both the security and decentralization of the network. This implies that any PoC blockchain would eventually consolidate into a single large farming pool, with even greater speed than has been previously observed with PoW and PoS chains. Subspace proposes to resolve this using a hybrid of PoS and PoTaS:We instead clearly distinguish between a permissionless farming mechanism for block production and permissioned staking mechanism for block finalization. 
Wagstaff describes it thus:To prevent farmers from discarding the history, we construct a novel PoC consensus protocol based on proofs-of-storage of the history of the blockchain itself, in which each farmer stores as many provably-unique replicas of the chain history as their disk space allows.To ensure the history remains available, farmers form a decentralized storage network, which allows the history to remain fully-recoverable, load-balanced, and efficiently-retrievable.To relieve farmers of the burden of maintaining the state and preforming [sic] redundant computation, we apply the classic technique in distributed systems of decoupling consensus and computation. Farmers are then solely responsible for the ordering of transactions, while a separate class of executor nodes maintain the state and compute the transitions for each new block.To ensure executors remain accountable for their actions, we employ a system of staked deposits, verifiable computation, and non-interactive fraud proofs.Separating consensus (PoTaS) and computation (PoS) has interesting effects:Like Chia, the only function of pools is to smooth out farmer's rewards. They do not compose the blocks. Pools will compete on their fees. Economics of scale mean that the larger the pool, the lower the fees it can charge. So, just like Chia, Subspace will end up with one, or only a few, large pools.Like Chia, if they can find a proof, farmers assemble transactions into a block which they can submit to executors for finalization. Subspace shares with Chia the property that a 51% attack requires M farmers not N pools (M ≫ N), assuming of course no supply chain attack or abuse of "smart contracts".Subspace uses a LOCKSS-like technique of electing a random subset of executors for each finalization. Because any participant can unambiguously detect fraudulent execution, and thus that the finalization of a block is fraudulent, the opportunity for bad behavior by executors is highly constrained. A conspiracy of executors has to hope that no honest executor is elected.Like Chia, the technology is extremely clever but there are interesting economic aspects. As regards farmers, Wagstaff writes:To ensure the history does not grow beyond total network storage capacity, we modify the transaction fee mechanism such that it dynamically adjusts in response to the replication factor. Recall that in Bitcoin, the base fee rate is a function of the size of the transaction in bytes, not the amount of BTC being transferred. We extend this equation by including a multiplier, derived from the replication factor. This establishes a mandatory minimum fee for each transaction, which reflects its perpetual storage cost. The multiplier is recalculated each epoch, from the estimated network storage and the current size of the history. The higher the replication factor, the cheaper the cost of storage per byte. As the replication factor approaches one, the cost of storage asymptotically approaches infinity. As the replication factor decreases, transaction fees will rise, making farming more profitable, and in-turn attracting more capacity to the network. This allows the cost of storage to reach an equilibrium price as a function of the supply of, and demand for, space. There are some issues here:The assumption that the market for fees can determine the "perpetual storage cost" is problematic. 
As I first showed back in 2011, the endowment needed for "perpetual storage" depends very strongly on two factors that are inherently unpredictable, the future rate of decrease of media cost in $/byte (Kryder rate), and the future interest rate. The invisible hand of the market for transaction fees cannot know these, it only knows the current cost of storage. Nor can Subspace management know them, to set the "mandatory minimum fee". Thus it is likely that fees will significantly under-estimate the "perpetual storage cost", leading to problems down the road.The assumption that those wishing to transact will be prepared to pay at least the "mandatory minimum fee" is suspect. Cryptocurrency fees are notoriously volatile because they are based on a blind auction; when no-one wants to transact a "mandatory minimum fee" would be a deterrent, when everyone wants to fees are unaffordable. Research has shown that if fees dominate block rewards systems become unstable.Wagstaff's paper doesn't seem to describe how block rewards work; I assume that they go to the individual farmer or are shared via a pool for smoother cash flow. I couldn't see from the paper whether, like Chia, Subspace intends to avoid depending upon fees.As regards executors:For each new block, a small constant number of executors are chosen through a stake-weighted election. Anyone may participate in execution by syncing the state and placing a small deposit. But the chance that they will be elected and gain the reward for finalizing a block and generating an Execution Receipt (ER) depends upon how much they stake. The mechanism for rewarding executors is:Farmers split transaction fee rewards evenly with all executors, based on the expected number of ERs for each block.7 For example, if 32 executors are elected, the farmer will take half of the all transaction fees, while each executor will take 1/64. A farmer is incentivized to include all ERs which finalize execution for its parent block because doing so will allow it to claim more of its share of the rewards for its own block. For example, if the farmer only includes 16 out of 32 expected ERs, it will instead receive 1/4 (not 1/2) of total rewards, while each of the 16 executors will still receive 1/64. Any remaining shares will then be escrowed within a treasury account under the control of the community of token holders, with the aim of incentivizing continued protocol development. Although the role of executor demands significant resources, both in hardware and in staked coins, these rewards seem inadequate. Every executor has to execute the state transitions in every block. But for each block only a small fraction of the executors receive only, in the example above, 1/64 of the fees. Note also footnote 7:7 We use this rate for explanatory purposes, while noting that in order to minimize the plutocratic nature of PoS, executor shares should be smaller in practice. So Wagstaff expects that an executor will receive only a small fraction of a small fraction of 1/64 of the transaction fees. Even supposing the stake distribution among executors was small and even, unlikely in practice, for the random election mechanism to be effective there need to be many times 32 executors. For example, if there are 256 executors, and executors share 1/8 of the fees, each can expect around 0.005% of the fees. Bitcoin currently runs with fees less than 10% of the block rewards. 
If Subspace had the same split in my example executors as a class would expect around 1.2% of the block rewards, with farmers as a class receiving 100% of the block rewards plus 87.5% of the fees.There is another problem — the notorious volatility of transaction fees set against the constant cost of running an executor. Much of the time there would be relatively low demand for transactions, so a block would contain relatively few transactions that each offered the mandatory minimum fee. Unless the fees, and especially the mandatory minimum fee, are large relative to the block reward it isn't clear why executors would participate. But fees that large would risk the instability of fee-only blockchains.There are two other roles in Subspace, verifiers and full nodes. As regards incentivizing verifiers:we rely on the fact that all executors may act as verifiers at negligible additional cost, as they are already required to maintain the valid state transitions in order to propose new ERs. If we further require them to reveal fraud in order to protect their own stake and claim their share of the rewards, in the event that they themselves are elected, then we can provide a more natural solution to the verifier’s dilemma. As regards incentivizing full nodes, Wagstaff isn't clear.In addition to executors, any full node may also monitor the network and generate fraud proofs, by virtue of the fact that no deposit is required to act as verifier. As I read the paper, full nodes have similar hardware requirements as executors but no income stream to support them unless they are executors too.Overall, Subspace is interesting. But the advantage from a farmer's point of view of Subspace over Chia is that their whole storage resource is devoted to farming. Everything else is not really significant, and all this would be dominated by a fairly small difference in "price". Add to that the fact that Chia has already occupied the market niche for new PoTaS systems, and has high visibility via Bram Cohen and A16Z, and the prospects for Subspace don't look good. If Subspace succeeds, economies of scale will have two effects:Large pools will dominate small pools because they can charge smaller fees.Large farmers will dominate small farmers because their rewards are linear in the resource they commit, but their costs are sub-linear, so their profit is super-linear. This will likely result in the most profitable, hassle-free way for smaller consumers to participate being investing in a pool rather than actually farming.ConclusionThe overall theme is that permissionless blockchains have to make participating in consensus expensive in some way to defend against Sybils. Thus if you are expending an expensive resource economies of scale are an unavoidable part of Sybil defense. If you want to be "decentralized" to avoid 51% attacks from insiders you have to have some really powerful mechanism pushing back against economies of scale. I see three possibilities, either the blockchain protocol designers:Don't understand why successful cryptocurrencies are centralized, so don't understand the need to push back on economies of scale.Do understand the need to push back on economies of scale but can't figure out how to do it. It is true that figuring this out is incredibly difficult, but their response should be to say "if the blockchain is going to end up centralized, why bother wasting resources trying to be permissionless?" 
not to implement something they claim is decentralized when they know it won't be.Don't care about decentralization, they just want to get rich quick, and are betting it will centralize around them.In most cases, my money is on #3. At least both Chia and Subspace have made efforts to defuse the worst aspects of centralization. Open Knowledge Foundation: Open Data Day 2021 – read the Report = = = = = = = We are really pleased to share with you our Report on Open Data Day 2021. = = = = = = = We wrote this report for the Open Data Day 2021 funding partners – Microsoft, UK Foreign, Commonwealth and Development Office, Mapbox, Global Facility for Disaster Reduction and Recovery, Latin American Open Data Initiative, Open Contracting Partnership and Datopian. But we also decided to publish it here so that everyone interested in Open Data Day can learn more about the Open Data Day mini-grant scheme – and the impact of joining Open Data Day 2022 as a funding partner. = = = = = = = Highlights from the report include: a list of the 36 countries that received mini-grants in 2021 a breakdown of the 56 mini-grants by World Bank region. It’s notable that most of the Open Data Day 2021 mini-grants were distributed to Sub-Saharan Africa and Latin America & the Caribbean. No mini-grants were sent to North America or the Middle East & North Africa. If you would like to help us reach these two regions in Open Data Day 2022 – please do email us at opendataday@okfn.org a chart showing 82% of mini-grants went to Lower, Lower Middle or Upper Middle income countries (by World Bank lending group). We think this is probably about the right kind of distribution. eleven case studies demonstrating the impact of the Open Data Day mini-grant programme. = = = = = = = To find out more, you can download the report here Please do email us at opendataday@okfn.org if you would like a high resolution copy. = = = = = = = If you would like to learn more about Open Data Day, please visit www.opendataday.org, join the Open Data Day forum or visit the Open Knowledge Foundation blog, where we regularly post articles about Open Data Day. David Rosenthal: Graphing China's Cryptocurrency Crackdown Below the fold an update to last Thursday's China's Cryptocurrency Crackdown with more recent graphs.McKenzie Sigalos reports that Bitcoin mining is now easier and more profitable as algorithm adjusts after China crackdown:China had long been the epicenter of bitcoin miners, with past estimates indicating that 65% to 75% of the world's bitcoin mining happened there, but a government-led crackdown has effectively banished the country's crypto miners."For the first time in the bitcoin network's history, we have a complete shutdown of mining in a targeted geographic region that affected more than 50% of the network," said Darin Feinstein, founder of Blockcap and Core Scientific.More than 50% of the hashrate – the collective computing power of miners worldwide – has dropped off the network since its market peak in May. SourceHere is the hashrate graph. It is currently 86.3TH/s, down from a peak of 180.7TH/s, so down 52.2% from the peak and trending strongly down. We may not have seen the end of the drop. This is good news for Bitcoin.The result is that the Bitcoin system slowed down:Typically, it takes about 10 minutes to complete a block, but Feinstein told CNBC the bitcoin network has slowed down to 14- to 19-minute block times. 
And thus, as shown in the difficulty graph, the Bitcoin algorithm adjusted the difficulty: This is precisely why bitcoin re-calibrates every 2016 blocks, or about every two weeks, resetting how tough it is for miners to mine. On Saturday, the bitcoin code automatically made it about 28% less difficult to mine – a historically unprecedented drop for the network – thereby restoring block times back to the optimal 10-minute window. SourceIt went from a peak of 25.046t to 19.933t, a drop of 20.4%. This is good news for Bitcoin, as Sigalos writes:Fewer competitors and less difficulty means that any miner with a machine plugged in is going to see a significant increase in profitability and more predictable revenue."All bitcoin miners share in the same economics and are mining on the same network, so miners both public and private will see the uplift in revenue," said Kevin Zhang, former Chief Mining Officer at Greenridge Generation, the first major U.S. power plant to begin mining behind-the-meter at a large scale.Assuming fixed power costs, Zhang estimates revenues of $29 per day for those using the latest-generation Bitmain miner, versus $22 per day prior to the change. Longer-term, although miner income can fluctuate with the price of the coin, Zhang also noted that mining revenues have dropped only 17% from the bitcoin price peak in April, whereas the coin's price has dropped about 50%. SourceHere is the miners' revenue graph. It went from a peak of $80.172M/day on April 15th to a trough of $13.065M/day on June 26th, a drop of 83.7%. It has since bounced back a little, so this is good news for Bitcoin, if not quite as good as Zhang thinks. Obviously, the trough was before the decrease in difficulty, which subsequently resulted in 6.25BTC rewards happening more frequently than before and thus increased miners' revenue somewhat.Have you noticed how important it is to check the numbers that the HODL-ers throw around?Matt Novak reported on June 21st that: Miners in China are now looking to sell their equipment overseas, and it appears many have already found buyers. CNBC’s Eunice Yoon tweeted early Monday that a Chinese logistics firm was shipping 6,600 lbs (3,000 kilograms) of crypto mining equipment to an unnamed buyer in Maryland for just $9.37 per kilogram. And Sigalos adds details:Of all the possible destinations for this equipment, the U.S. appears to be especially well-positioned to absorb this stray hashrate. CNBC is told that major U.S. mining operators are already signing deals to patriate some of these homeless Bitmain miners.U.S. bitcoin mining is booming, and has venture capital flowing to it, so they are poised to take advantage of the miner migration, Arvanaghi told CNBC."Many U.S. bitcoin miners that were funded when bitcoin's price started rising in November and December of 2020 means that they were already building out their power capacity when the China mining ban took hold," he said. "It's great timing." And, as always, the HODL-ers ignore economies of scale and hold out hope for the little guy:But Barbour believes that much smaller players in the residential U.S. also stand a chance at capturing these excess miners."I think this is a signal that in the future, bitcoin mining will be more distributed by necessity," said Barbour. "Less mega-mines like the 100+ megawatt ones we see in Texas and more small mines on small commercial and eventually residential spaces. It's much harder for a politician to shut down a mine in someone's garage." 
It is good news for Bitcoin that more of the mining power is in the US, where the US government could suppress it by, for example, declaring that Mining Is Money Transmission and thus that pools needed to adhere to the AML/KYC rules. Doing so would place the poor little guy in a garage in a dilemma — mine on their own and be unlikely to get a reward before their rig was obsolete, or join an illegal pool and risk their traffic being spotted. Update: The Malaysian government's crackdown is an example to the world. Andrew Hayward reports that Police Destroy 1,069 Bitcoin Miners With Big Ass Steamroller In Malaysia. Ted Lawless: Automatically extracting keyphrases from text I've posted an explainer/guide to how we are automatically extracting keyphrases for Constellate, a new text analytics service from JSTOR and Portico. We are defining keyphrases as up to three-word phrases that are key, or important, to the overall subject matter of the document. Keyphrase is often used interchangeably with keywords, but we are opting to use the former since it's more descriptive. We did a fair amount of reading to grasp prior art in this area (extracting keyphrases is a long-standing research topic in information retrieval and natural language processing) and ended up developing a custom solution based on term frequency in the Constellate corpus. If you are interested in this work generally, and not just the Constellate implementation, Burton DeWilde has published an excellent primer on automated keyphrase extraction. More information about Constellate can be found here. Disclaimer: this is a work-related post; I don't intend to speak for my employer, Ithaka. David Rosenthal: Venture Capital Isn't Working I was an early employee at three VC-funded startups from the 80s and 90s. All of them IPO-ed and two (Sun Microsystems and Nvidia) made it into the list of the top 100 US companies by market capitalization. So I'm in a good position to appreciate Jeffrey Funk's must-read The Crisis of Venture Capital: Fixing America’s Broken Start-Up System. Funk starts: Despite all the attention and investment that Silicon Valley’s recent start-ups have received, they have done little but lose money: Uber, Lyft, WeWork, Pinterest, and Snapchat have consistently failed to turn profits, with Uber’s cumulative losses exceeding $25 billion. Perhaps even more notorious are bankrupt and discredited start-ups such as Theranos, Luckin Coffee, and Wirecard, which were plagued with management failures, technical problems, or even outright fraud that auditors failed to notice. What’s going on? There is no immediately obvious reason why this generation of start-ups should be so financially disastrous. After all, Amazon incurred losses for many years, but eventually grew to become one of the most profitable companies in the world, even as Enron and WorldCom were mired in accounting scandals. So why can’t today’s start-ups also succeed? Are they exceptions, or part of a larger, more systemic problem? Below the fold, some reflections on Funk's insightful analysis of the "larger, more systemic problem". Funk introduces his argument thus: In this article, I first discuss the abundant evidence for low returns on VC investments in the contemporary market. Second, I summarize the performance of start-ups founded twenty to fifty years ago, in an era when most start-ups quickly became profitable, and the most successful ones rapidly achieved top-100 market capitalization.
Third, I contrast these earlier, more successful start-ups with Silicon Valley’s current set of “unicorns,” the most successful of today’s start-ups. Fourth, I discuss why today’s start-ups are doing worse than those of previous generations and explore the reasons why technological innovation has slowed in recent years. Fifth, I offer some brief proposals about what can be done to fix our broken start-up system. Systemic problems will require systemic solutions, and thus major changes are needed not just on the part of venture capitalists but also in our universities and business schools.
Is There A Problem?
Funk's argument that there is a problem can be summarized thus:
The returns on VC investments over the last two decades haven't matched the golden years of the preceding two decades.
In the golden years startups made profits.
Now they don't.
VC Returns Are Sub-Par
[Source] This graph from a 2020 Morgan Stanley report shows that during the 90s the returns from VC investments greatly exceeded the returns from public equity. But since then the median VC return has been below that of public equity. This doesn't reward investors for the much higher risk of VC investments. The weighted average VC return is slightly above that of public equity because, as Funk explains: a small percentage of investments does provide high returns, and these high returns for top-performing VC funds persist over subsequent quarters. Although this data does not demonstrate that select VCs consistently earn solid profits over decades, it does suggest that these VCs are achieving good returns. It was always true that VC quality varied greatly. I discussed the advantages of working with great VCs in Kai Li's FAST Keynote: Work with the best VC funds. The difference between the best and the merely good in VCs is at least as big as the difference between the best and the merely good programmers. At nVIDIA we had two of the very best, Sutter Hill and Sequoia. The result is that, like Kai but unlike many entrepreneurs, we think VCs are enormously helpful. One thing that was striking about working with Sutter Hill was how many entrepreneurs did a series of companies with them, showing that both sides had positive experiences.
Startups Used To Make Profits
Before the dot-com boom, there used to be a rule that in order to IPO a company, it had to be making profits. This was a good rule, since it provided at least some basis for setting the stock price at the IPO. Funk writes: There was a time when venture capital generated big returns for investors, employees, and customers alike, both because more start-ups were profitable at an earlier stage and because some start-ups achieved high market capitalization relatively quickly. Profits are an important indicator of economic and technological growth, because they signal that a company is providing more value to its customers than the costs it is incurring. A number of start-ups founded in the late twentieth century have had an enormous impact on the global economy, quickly reaching both profitability and top-100 market capitalization. Among these are the so-called FAANMG (Facebook, Amazon, Apple, Microsoft, Netflix, and Google), which represented more than 25 percent of the S&P’s total market capitalization and more than 80 percent of the 2020 increase in the S&P’s total value at one point—in other words, the most valuable and fastest-growing companies in America in recent years.
Funk's Table 2 shows the years to profitability and years to top-100 market capitalization for companies founded between 1975 and 2004. I'm a bit skeptical of the details because, for example, the table says it took Sun Microsystems 6 years to turn a profit. I'm pretty sure Sun was profitable at its 1986 IPO, 4 years from its founding. Note Funk's stress on achieving profitability quickly. An important Silicon Valley philosophy used to be:
Success is great!
Failure is OK.
Not doing either is a big problem.
The reason lies in the Silicon Valley mantra of "fail fast". Most startups fail, and the costs of those failures detract from the returns of the successes. Minimizing the cost of failure, and diverting the resources to trying something different, is important.
Unicorns, Not So Much
What are these unicorns? Wikipedia tells us: In business, a unicorn is a privately held startup company valued at over $1 billion. The term was coined in 2013 by venture capitalist Aileen Lee, choosing the mythical animal to represent the statistical rarity of such successful ventures. Back in 2013 unicorns were indeed rare, but as Wikipedia goes on to point out: According to CB Insights, there are over 450 unicorns as of October 2020. Unicorns are breeding like rabbits, but the picture Funk paints is depressing: In the contemporary start-up economy, “unicorns” are purportedly “disrupting” almost every industry from transportation to real estate, with new business software, mobile apps, consumer hardware, internet services, biotech, and AI products and services. But the actual performance of these unicorns both before and after the VC exit stage contrasts sharply with the financial successes of the previous generation of start-ups, and suggests that they are dramatically overvalued. Figure 3 shows the profitability distribution of seventy-three unicorns and ex-unicorns that were founded after 2013 and have released net income and revenue figures for 2019 and/or 2020. In 2019, only six of the seventy-three unicorns included in figure 3 were profitable, while for 2020, seven of seventy were. Hey, they're startups, right? They just need time to become profitable. Funk debunks that idea too: Furthermore, there seems to be little reason to believe that these unprofitable unicorn start-ups will ever be able to grow out of their losses, as can be seen in the ratio of losses to revenues in 2019 versus the founding year. Aside from a tiny number of statistical outliers ... there seems to be little relationship between the time since a start-up’s founding and its ratio of losses to revenues. In other words, age is not correlated with profits for this cohort. Funk goes on to note that startup profitability once public has declined dramatically, and appears inversely related to IPO valuation: When compared with profitability data from decades past, recent start-ups look even worse than already noted. About 10 percent of the unicorn start-ups included in figure 3 were profitable, much lower than the 80 percent of start-ups founded in the 1980s that were profitable, according to Jay Ritter’s analysis, and also below the overall percentage for start-ups today (20 percent). Thus, not only has profitability dramatically dropped over the last forty years among those start-ups that went public, but today’s most valuable start-ups—those valued at $1 billion or more before IPO—are in fact less profitable than start-ups that did not reach such lofty pre-IPO valuations.
Funk uses electric vehicles and biotech to illustrate startup over-valuation: For instance, driven by easy money and the rapid rise of Tesla’s stock, a group of electric vehicle and battery suppliers—Canoo, Fisker Automotive, Hyliion, Lordstown Motors, Nikola, and QuantumScape—were valued, combined, at more than $100 billion at their listing. Likewise, dozens of biotech firms have also achieved billions of dollars in market capitalizations at their listings. In total, 2020 set a new record for the number of companies going public with little to no revenue, easily eclipsing the height of the dot-com boom of telecom companies in 2000. The Alphaville team have been maintaining a spreadsheet of the EV bubble. They determined that there was no way these companies' valuations could be justified given the size of the potential market. Jamie Powell's April 12th Revisiting the EV bubble spreadsheet celebrates their assessment: At pixel time the losses from their respective peaks from all of the electric vehicle, battery and charging companies on our list total some $635bn of market capitalisation, or a fall of just under 38 per cent. Ouch.
What Is Causing The Problem
This all looks like too much money chasing too few viable startups, and too many me-too startups chasing too few total available market dollars. Funk starts his analysis of the causes of poor VC returns by pointing to the obvious one, one that applies to any successful investment strategy. Its returns will be eroded over time by the influx of too much money: There are many reasons for both the lower profitability of start-ups and the lower returns for VC funds since the mid to late 1990s. The most straightforward of these is simply diminishing returns: as the amount of VC investment in the start-up market has increased, a larger proportion of this funding has necessarily gone to weaker opportunities, and thus the average profitability of these investments has declined. But the effect of too much money is even more corrosive. I'm a big believer in Bill Joy's Law of Startups — "success is inversely proportional to the amount of money you have". Too much money allows hard decisions to be put off. Taking hard decisions promptly is key to "fail fast". Nvidia was an example of this. The company was founded in one of Silicon Valley's recurring downturns. We were the only hardware company funded in that quarter. We got to working silicon on a $2.5M A round. Think about it — each of our VCs invested $1.25M to start a company currently valued at $380,000M. Despite delivering ground-breaking performance, as I discussed in Hardware I/O Virtualization, that chip wasn't a success. But it did allow Jen-Hsun Huang to raise another $6.5M. He down-sized the company by 2/3 and got to working silicon of the highly successful second chip with, IIRC, six weeks' money left in the bank. Funk then discusses a second major reason for poor performance: A more plausible explanation for the relative lack of start-up successes in recent years is that new start-ups tend to be acquired by large incumbents such as the FAANMG companies before they have a chance to achieve top 100 market capitalization.
For instance, YouTube was founded in 2004 and Instagram in 2010; some claim they would be valued at more than $150 billion each (pre-lockdown estimates) if they were independent companies, but instead they were acquired by Google and Facebook, respectively. In this sense, they are typical of the recent trend: many start-ups founded since 2000 were subsequently acquired by FAANMG, including new social media companies such as GitHub, LinkedIn, and WhatsApp. Likewise, a number of money-losing start-ups have been acquired in recent years, most notably DeepMind and Nest, which were bought by Google. But he fails to note the cause of the rash of acquisitions, which is clearly the total Lack Of Anti-Trust Enforcement in the US. As with too much money, the effects of this lack are more pernicious than they at first appear. Again, Nvidia provides an example. Just like the founders and VCs of Sun, when we started Nvidia we knew that the route to an IPO and major return on investment involved years and several generations of product. So, despite the limited funding and with the full support of our VCs, we took several critical months right at the start to design an architecture for a family of successive chip generations based on Hardware I/O Virtualization. By ensuring that the drivers in application software interacted only with virtual I/O resources, the architecture decoupled the hardware and software release cycles. The strong linkage between them at Sun had been a consistent source of schedule slip. The architecture also structured the implementation of the chip as a set of modules communicating via an on-chip network. Each module was small enough that a three-person team could design, simulate and verify it. The restricted interface to the on-chip network meant that, if the modules verified correctly, it was highly likely that the assembled chip would verify correctly. Laying the foundations for a long-term product line in this way paid massive dividends. After the second chip, Nvidia was able to deliver a new chip generation every 6 months like clockwork. 6 months after we started Nvidia, we knew of over 30 other startups addressing the same market. Only one, ATI, survived the competition with Nvidia's 6-month product cycle. VCs now would be hard to persuade that the return on the initial time and money to build a company that could IPO years later would be worth it when compared to lashing together a prototype and using it to sell the company to one of the FAANMGs. In many cases, simply recruiting a team that could credibly promise to build the prototype would be enough for an "acqui-hire", where a FAANMG buys a startup not for the product but for the people. Building the foundation for a company that can IPO and make it into the top-100 market cap list is no longer worth the candle. But Funk argues that the major cause of lower returns is this: Overall, the most significant problem for today’s start-ups is that there have been few if any new technologies to exploit. The internet, which was a breakthrough technology thirty years ago, has matured. As a result, many of today’s start-up unicorns are comparatively low-tech, even with the advent of the smartphone—perhaps the biggest technological breakthrough of the twenty-first century—fourteen years ago. Ridesharing and food delivery use the same vehicles, drivers, and roads as previous taxi and delivery services; the only major change is the replacement of dispatchers with smartphones.
Online sales of juicers, furniture, mattresses, and exercise bikes may have been revolutionary twenty years ago, but they are sold in the same way that Amazon currently sells almost everything. New business software operates from the cloud rather than onsite computers, but pre-2000 start-ups such as Amazon, Google, and Oracle were already pursuing cloud computing before most of the unicorns were founded. Remember, Sun's slogan in the mid 80s was "The network is the computer"! [Image: Virtua Fighter on NV1] In essence, Funk argues that successful startups out-perform by being quicker than legacy companies to exploit the productivity gains made possible by a technological discontinuity. Nvidia was an example of this, too. The technological discontinuity was the transition of the PC from the ISA to the PCI bus. It wasn't possible to do 3D games over the ISA bus; it lacked the necessary bandwidth. The increased bandwidth of the first version of the PCI bus made it just barely possible, as Nvidia's first chip demonstrated by running Sega arcade games at full frame rate. The advantages startups have against incumbents include:
An experienced, high-quality team. Initial teams at startups are usually recruited from colleagues, so they are used to working together and know each other's strengths and weaknesses. Jen-Hsun Huang was well-known at Sun, having been the application engineer for LSI Logic on Sun's first SPARC implementation. The rest of the initial team at Nvidia had all worked together building graphics chips at Sun. As the company grows it can no longer recruit only colleagues, so usually experiences what at Sun was called the "bozo invasion".
Freedom from backwards compatibility constraints. Radical design change is usually needed to take advantage of a technological discontinuity. Reconciling this with backwards compatibility takes time and forces compromise. Nvidia was able to ignore the legacy of program I/O from the ISA bus and fully exploit the Direct Memory Access capability of the PCI bus from the start.
No cash cow to defend. The IBM-funded Andrew project at CMU was intended to deploy what became the IBM PC/RT, which used the ROMP, an IBM RISC CPU competing with Sun's SPARC. The ROMP was so fast that IBM's other product lines saw it as a threat, and insisted that it be priced not to under-cut their existing product's price/performance. So when it finally launched, its price/performance was much worse than Sun's SPARC-based products, and it failed.
Funk concludes this section: In short, today’s start-ups have targeted low-tech, highly regulated industries with a business strategy that is ultimately self-defeating: raising capital to subsidize rapid growth and securing a competitive position in the market by undercharging consumers. This strategy has locked start-ups into early designs and customer pools and prevented the experimentation that is vital to all start-ups, including today’s unicorns. Uber, Lyft, DoorDash, and GrubHub are just a few of the well-known start-ups that have pursued this strategy, one that is used by almost every start-up today, partly in response to the demands of VC investors. It is also highly likely that without the steady influx of capital that subsidizes below-market prices, demand for these start-ups’ services would plummet, and thus their chances of profitability would fall even further.
In retrospect, it would have been better if start-ups had taken more time to find good, high-tech business opportunities, had worked with regulators to define appropriate behavior, and had experimented with various technologies, designs, and markets, making a profit along the way. But, if the key to startup success is exploiting a technological discontinuity, and there haven't been any to exploit, as Funk argues earlier, taking more time to "find good, high-tech business opportunities" wouldn't have helped. They weren't there to be found.
How To Fix The Problem?
Funk quotes Charles Duhigg skewering the outdated view of VCs: For decades, venture capitalists have succeeded in defining themselves as judicious meritocrats who direct money to those who will use it best. But examples like WeWork make it harder to believe that V.C.s help balance greedy impulses with enlightened innovation. Rather, V.C.s seem to embody the cynical shape of modern capitalism, which too often rewards crafty middlemen and bombastic charlatans rather than hardworking employees and creative businesspeople. And: Venture capitalists have shown themselves to be far less capable of commercializing breakthrough technologies than they once were. Instead, as recently outlined in the New Yorker, they often seem to be superficial trend-chasers, all going after the same ideas and often the same entrepreneurs. One managing partner at SoftBank summarized the problem faced by VC firms in a marketplace full of copycat start-ups: “Once Uber is founded, within a year you suddenly have three hundred copycats. The only way to protect your company is to get big fast by investing hundreds of millions.” VCs like these cannot create the technological discontinuities that are the key to adequate returns on investment in startups: we need venture capitalists and start-ups to create new products and new businesses that have higher productivity than do existing firms; the increased revenue that follows will then enable these start-ups to pay higher wages. The large productivity advantages needed can only be achieved by developing breakthrough technologies, like the integrated circuits, lasers, magnetic storage, and fiber optics of previous eras. And different players—VCs, start-ups, incumbents, universities—will need to play different roles in each industry. Unfortunately, none of these players is currently doing the jobs required for our start-up economy to function properly.
Business Schools
Success in exploiting a technological discontinuity requires understanding of, and experience with, the technology, its advantages and its limitations. But Funk points out that business schools, not being engineering schools, need to devalue this requirement. Instead, they focus on "entrepreneurship": In recent decades, business schools have dramatically increased the number of entrepreneurship programs—from about sixteen in 1970 to more than two thousand in 2014—and have often marketed these programs with vacuous hype about “entrepreneurship” and “technology.” A recent Stanford research paper argues that such hype about entrepreneurship has encouraged students to become entrepreneurs for the wrong reasons and without proper preparation, with universities often presenting entrepreneurship as a fun and cool lifestyle that will enable them to meet new people and do interesting things, while ignoring the reality of hard and demanding work necessary for success.
One of my abiding memories of Nvidia is Tench Coxe, our partner at Sutter Hill, perched on a stool in the lab playing the "Road Rash" video game at about 2am one morning as we tried to figure out why our first silicon wasn't working. He was keeping an eye on his investment, and providing a much-needed calming influence. Focus on entrepreneurship means focus on the startup's business model not on its technology: A big mistake business schools make is their unwavering focus on business model over technology, thus deflecting any probing questions students and managers might have about what role technological breakthroughs play and why so few are being commercialized. For business schools, the heart of a business model is its ability to capture value, not the more important ability to create value. This prioritization of value capture is tied to an almost exclusive focus on revenue: whether revenues come from product sales, advertising, subscriptions, or referrals, and how to obtain these revenues from multiple customers on platforms. Value creation, however, is dependent on technological improvement, and the largest creation of value comes from breakthrough technologies such as the automobile, microprocessor, personal computer, and internet commerce. The key to "capturing value" is extracting value via monopoly rents. The way to get monopoly rents is to subsidize customer acquisition and buy up competitors, until the customers have no place to go. This doesn't create any value. In fact, once the monopolist has burnt through the investor's money, they find they need a return that can only be obtained by raising prices and holding the customer to ransom, destroying value for everyone. It is true that a startup that combines innovation in technology with innovation in business has an advantage. Once more, Nvidia provides an example. Before starting Nvidia, Jen-Hsun Huang had run a division of LSI Logic that traded access to LSI Logic's fab for equity in the chips it made. Based on this experience on the supplier side of the fabless semiconductor business, one of his goals for Nvidia was to re-structure the relationship between the fabless company and the fab to be more of a win-win. Nvidia ended up as one of the most successful fabless companies of all time. But note that the innovation didn't affect Nvidia's basic business model — contract with fabs to build GPUs, and sell them to PC and graphics board companies. A business innovation combined with technological innovation stands a chance of creating a big company; a business innovation with no technology counterpart is unlikely to.
Research
Funk assigns much blame for the lack of breakthrough technologies to Universities: University engineering and science programs are also failing us, because they are not creating the breakthrough technologies that America and its start-ups need. Although some breakthrough technologies are assembled from existing components and thus are more the responsibility of private companies—for instance, the iPhone—universities must take responsibility for science-based technologies that depend on basic research, technologies that were once more common than they are now. Note that Funk accepts as a fait accompli the demise of corporate research labs, which certainly used to do the basic research that led not just to Funk's examples of "semiconductors, lasers, LEDs, glass fiber, and fiber optics", but also, for example, to packet switching and operating systems such as Unix.
As I did three years ago in Falling Research Productivity, he points out that increased government and corporate funding of University research has resulted in decreased output of breakthrough technologies: Many scientists point to the nature of the contemporary university research system, which began to emerge over half a century ago, as the problem. They argue that the major breakthroughs of the early and mid-twentieth century, such as the discovery of the DNA double helix, are no longer possible in today’s bureaucratic, grant-writing, administration-burdened university. ... Scientific merit is measured by citation counts and not by ideas or by the products and services that come from those ideas. Thus, labs must push papers through their research factories to secure funding, and issues of scientific curiosity, downstream products and services, and beneficial contributions to society are lost. Funk's analysis of the problem is insightful, but I see his ideas for fixing University research as simplistic and impractical: A first step toward fixing our sclerotic university research system is to change the way we do basic and applied research in order to place more emphasis on projects that may be riskier but also have the potential for greater breakthroughs. We can change the way proposals are reviewed and evaluated. We can provide incentives to universities that will encourage them to found more companies or to do more work with companies. Funk clearly doesn't understand how much University research is already funded by companies, and how long attempts to change the reward system in Universities have been crashing into the rock composed of senior faculty who achieved their position through the existing system. He is more enthusiastic but equally misled about how basic research in corporate labs could be revived: One option is to recreate the system that existed prior to the 1970s, when most basic research was done by companies rather than universities. This was the system that gave us transistors, lasers, LEDs, magnetic storage, nuclear power, radar, jet engines, and polymers during the 1940s and 1950s. ... Unlike their predecessors at Bell Labs, IBM, GE, Motorola, DuPont, and Monsanto seventy years ago, top university scientists are more administrators than scientists now—one of the greatest misuses of talent the world has ever seen. Corporate labs have smaller administrative workloads because funding and promotion depend on informal discussions among scientists and not extensive paperwork. Not understanding the underlying causes of the demise of corporate research labs, Funk reaches for the time-worn nostrums of right-wing economists, "tax credits and matching grants": We can return basic research to corporate labs by providing much stronger incentives for companies—or cooperative alliances of companies—to do basic research. A scheme of substantial tax credits and matching grants, for instance, would incentivize corporations to do more research and would bypass the bureaucracy-laden federal grant process. This would push the management of detailed technological choices onto scientists and engineers, and promote the kind of informal discussions that used to drive decisions about technological research in the heyday of the early twentieth century. The challenge will be to ensure these matching funds and tax credits are in fact used for basic research and not for product development.
Requiring multiple companies to share research facilities might be one way to avoid this danger, but more research on this issue is needed. In last year's The Death Of Corporate Research Labs I discussed a really important paper from a year earlier by Arora et al, The changing structure of American innovation: Some cautionary remarks for economic growth, which Funk does not cite. I wrote: Arora et al point out that the rise and fall of the labs coincided with the rise and fall of anti-trust enforcement: Historically, many large labs were set up partly because antitrust pressures constrained large firms’ ability to grow through mergers and acquisitions. In the 1930s, if a leading firm wanted to grow, it needed to develop new markets. With growth through mergers and acquisitions constrained by anti-trust pressures, and with little on offer from universities and independent inventors, it often had no choice but to invest in internal R&D. The more relaxed antitrust environment in the 1980s, however, changed this status quo. Growth through acquisitions became a more viable alternative to internal research, and hence the need to invest in internal research was reduced. Lack of anti-trust enforcement, pervasive short-termism, driven by Wall Street's focus on quarterly results, and management's focus on manipulating the stock price to maximize the value of their options killed the labs: Large corporate labs, however, are unlikely to regain the importance they once enjoyed. Research in corporations is difficult to manage profitably. Research projects have long horizons and few intermediate milestones that are meaningful to non-experts. As a result, research inside companies can only survive if insulated from the short-term performance requirements of business divisions. However, insulating research from business also has perils. Managers, haunted by the spectre of Xerox PARC and DuPont’s “Purity Hall”, fear creating research organizations disconnected from the main business of the company. Walking this tightrope has been extremely difficult. Greater product market competition, shorter technology life cycles, and more demanding investors have added to this challenge. Companies have increasingly concluded that they can do better by sourcing knowledge from outside, rather than betting on making game-changing discoveries in-house. It is pretty clear that "tax credits and matching grants" aren't the fix for the fundamental anti-trust problem. Not to mention that the idea of "Requiring multiple companies to share research facilities" in and of itself raises serious anti-trust concerns. After such a good analysis, it is disappointing that Funk's recommendations are so feeble. We have to add inadequate VC returns and a lack of startups capable of building top-100 companies to the long list of problems that only a major overhaul of anti-trust enforcement can fix. Lina Khan's nomination to the FTC is a hopeful sign that the Biden administration understands the urgency of changing direction, but Biden's hesitation about nominating the DOJ's anti-trust chief is not. Update: Michael Cembalest's Food Fight: An update on private equity performance vs public equity markets has a lot of fascinating information about private equity in general and venture capital in particular.
His graphs comparing MOIC (Multiple Of Invested Capital) and IRR (Internal Rate of Return) across vintage years support his argument that: We have performance data for venture capital starting in the mid-1990s, but the period is so distorted by the late 1990’s boom and bust that we start our VC performance discussion in 2004. In my view, the massive gains earned by VC managers in the mid-1990s are not relevant to a discussion of VC investing today. As with buyout managers, VC manager MOIC and IRR also tracked each other until 2012, after which a combination of subscription lines and faster distributions led to rising IRRs despite falling MOICs. There’s a larger gap between average and median manager results than in buyout, indicating that there are a few VC managers with much higher returns and/or larger funds that pull up the average relative to the median. The gap is pretty big: VC managers have consistently outperformed public equity markets when looking at the “average” manager. But to reiterate, the gap between average and median results are substantial and indicate outsized returns posted by a small number of VC managers. For vintage years 2004 to 2008, the median VC manager actually underperformed the S&P 500 pretty substantially. Another of Cembalest's fascinating graphs addresses this question: One of the other “food fight” debates relates to pricing of venture-backed companies that go public. In other words, do venture investors reap the majority of the benefits, leaving public market equity investors “holding the bag”? Actually, the reverse has been true over the last decade when measured in terms of total dollars of value creation accruing to pre- and post-IPO investors: post-IPO investor gains have often been substantial. To show this: We analyzed all US tech, internet retailing and interactive media IPOs from 2010 to 2019. We computed the total value created since each company’s founding, from original paid-in capital by VCs to its latest market capitalization. We then examined how total value creation has accrued to pre- and post-IPO investors. Sometimes both investor types share the gains, and sometimes one type accrues the vast majority of the gains. Pre-IPO investors earn the majority of the pie when IPOs collapse or flat-line after being issued, and post-IPO investors reap the majority of the pie when IPOs appreciate substantially after being issued. There are three general regions in the chart. As you can see, the vast majority of the 165 IPOs analyzed resulted in a large share of the total value creation accruing to public market equity investors; nevertheless, there were some painful exceptions (see lower left region on the chart). David Rosenthal: A Modest Proposal About Ransomware On the evening of July 2nd the REvil ransomware gang exploited a 0-day vulnerability to launch a supply chain attack on customers of Kaseya's Virtual System Administrator (VSA) product. The timing was perfect, with most system administrators off for the July 4th long weekend. By the 6th Alex Marquardt reported that Kaseya says up to 1,500 businesses compromised in massive ransomware attack. REvil, which had previously extorted $11M from meat giant JBS, announced that for the low, low price of only $70M they would provide everyone with a decryptor. The US government's pathetic response is to tell the intelligence agencies to investigate and to beg Putin to crack down on the ransomware gangs. Good luck with that!
It isn't his problem, because the gangs write their software to avoid encrypting systems that have default languages from the former USSR. I've written before (here, here, here) about the importance of disrupting the cryptocurrency payment channel that enables ransomware, but it looks like the ransomware crisis has to get a great deal worse before effective action is taken. Below the fold I lay out a modest proposal that could motivate actions that would greatly reduce the risk. It turns out that the vulnerability that enabled the REvil attack didn't meet the strict definition of a 0-day. Gareth Corfield's White hats reported key Kaseya VSA flaw months ago. Ransomware outran the patch explains: Rewind to April, and the Dutch Institute for Vulnerability Disclosure (DIVD) had privately reported seven security bugs in VSA to Kaseya. Four were fixed and patches released in April and May. Three were due to be fixed in an upcoming release, version 9.5.7. Unfortunately, one of those unpatched bugs – CVE-2021-30116, a credential-leaking logic flaw discovered by DIVD's Wietse Boonstra – was exploited by the ransomware slingers before its fix could be emitted. DIVD praised Kaseya's response: Once Kaseya was aware of our reported vulnerabilities, we have been in constant contact and cooperation with them. When items in our report were unclear, they asked the right questions. Also, partial patches were shared with us to validate their effectiveness. During the entire process, Kaseya has shown that they were willing to put in the maximum effort and initiative into this case both to get this issue fixed and their customers patched. They showed a genuine commitment to do the right thing. Unfortunately, we were beaten by REvil in the final sprint, as they could exploit the vulnerabilities before customers could even patch. But if Kaseya's response to DIVD's disclosure was praiseworthy, it turns out it was the exception. In Kaseya was warned about security flaws years ahead of ransomware attack, J. Fingas reports that: The giant ransomware attack against Kaseya might have been entirely avoidable. Former staff talking to Bloomberg claim they warned executives of "critical" security flaws in Kaseya's products several times between 2017 and 2020, but that the company didn't truly address them. Multiple staff either quit or said they were fired over inaction. Employees reportedly complained that Kaseya was using old code, implemented poor encryption and even failed to routinely patch software. The company's Virtual System Administrator (VSA), the remote maintenance tool that fell prey to ransomware, was supposedly rife with enough problems that workers wanted the software replaced. One employee claimed he was fired two weeks after sending executives a 40-page briefing on security problems. Others simply left in frustration with a seeming focus on new features and releases instead of fixing basic issues. Kaseya also laid off some employees in 2018 in favor of outsourcing work to Belarus, which some staff considered a security risk given local leaders' partnerships with the Russian government. ... The company's software was reportedly used to launch ransomware at least twice between 2018 and 2019, and it didn't significantly rethink its security strategy.
To reiterate:
The July 2nd attack was apparently at least the third time Kaseya had infected customers with ransomware!
Kaseya outsourced development to Belarus, a country where ransomware gangs have immunity!
Kaseya fired security whistleblowers!
The first two incidents didn't seem to make either Kaseya or its customers re-think what they were doing. Clearly, the only reason Kaseya responded to DIVD's warning was the threat of public disclosure. Without effective action to change this attitude the ransomware crisis will definitely result in what Stephen Diehl calls The Oncoming Ransomware Storm: Imagine a hundred new Stuxnet-level exploits every day, for every piece of equipment in public works and health care. Where every day you check your phone for the level of ransomware in the wild just like you do the weather. Entire cities randomly have their metro systems, water, power grids and internet shut off and on like a sudden onset of bad cybersecurity “weather”. Or a time in business in which every company simply just allocates a portion of its earnings upfront every quarter and pre-pays off large ransomware groups in advance. It’s just a universal cost of doing business and one that is fully sanctioned by the government because we’ve all just given up trying to prevent it and it’s more efficient just to pay the protection racket. To make things worse, companies can insure against the risk of ransomware, essentially paying to avoid the hassle of maintaining security. Insurance companies can't price these policies properly, because they can't do enough underwriting to know, for example, whether the customer's backups actually work and whether they are offline enough so the ransomware doesn't encrypt them too. In Cyber insurance model is broken, consider banning ransomware payments, says think tank Gareth Corfield reports on the Royal United Services Institute's (RUSI) latest report, Cyber Insurance and the Cyber Security Challenge: Unfortunately, RUSI's researchers found that insurers tend to sell cyber policies with minimal due diligence – and when the claims start rolling in, insurance company managers start looking at ways to escape an unprofitable line of business. ... RUSI's position on buying off criminals is unequivocal, with [Jason] Nurse and co-authors Jamie MacColl and James Sullivan saying in their report that the UK's National Security Secretariat "should conduct an urgent policy review into the feasibility and suitability of banning ransom payments." The fundamental problem is that neither the software vendors nor the insurers nor their customers are taking security seriously enough because it isn't a big enough crisis yet. The solution? Take control of the crisis and make it big enough that security gets taken seriously. The US always claims to have the best cyber-warfare capability on the planet, so presumably they could do ransomware better and faster than gangs like REvil. The US should use this capability to mount ransomware attacks against US companies as fast as they can. Victims would see, instead of a screen demanding a ransom in Monero to decrypt their data, a screen saying:
US Government CyberSecurity Agency
Patch the following vulnerabilities immediately!
The CyberSecurity Agency (CSA) used some or all of the following vulnerabilities to compromise your systems and display this notice:
CVE-2021-XXXXX
CVE-2021-YYYYY
CVE-2021-ZZZZZ
Three days from now if these vulnerabilities are still present, the CSA will encrypt your data.
You will be able to obtain free decryption assistance from the CSA once you can prove that these vulnerabilities are no longer present.
If the victim ignored the notice, three days later they would see:
US Government CyberSecurity Agency
The CyberSecurity Agency (CSA) used some or all of the following vulnerabilities to compromise your systems and encrypt your data:
CVE-2021-XXXXX
CVE-2021-YYYYY
CVE-2021-ZZZZZ
Once you have patched these vulnerabilities, click here to decrypt your data.
Three days from now if these vulnerabilities are still present, the CSA will re-encrypt your data. For a fee you will be able to obtain decryption assistance from the CSA once you can prove that these vulnerabilities are no longer present.
The program would start out fairly gentle and ramp up, shortening the grace period to increase the impact. The program would motivate users to keep their systems up-to-date with patches for disclosed vulnerabilities, which would not merely help with ransomware, but also with botnets, data breaches and other forms of malware. It would also raise the annoyance factor customers face when their supplier fails to provide adequate security in their products. This in turn would provide reputational and sales pressure on suppliers to both secure their supply chain and, unlike Kaseya, prioritize security in their product development. Of course, the program above only handles disclosed vulnerabilities, not the 0-days REvil used. There is a flourishing trade in 0-days, of which the NSA is believed to be a major buyer. The supply in these markets is increasing, as Dan Goodin reports in iOS zero-day let SolarWinds hackers compromise fully updated iPhones: In the first half of this year, Google’s Project Zero vulnerability research group has recorded 33 zero-day exploits used in attacks—11 more than the total number from 2020. The growth has several causes, including better detection by defenders and better software defenses that require multiple exploits to break through. The other big driver is the increased supply of zero-days from private companies selling exploits. “0-day capabilities used to be only the tools of select nation-states who had the technical expertise to find 0-day vulnerabilities, develop them into exploits, and then strategically operationalize their use,” the Google researchers wrote. “In the mid-to-late 2010s, more private companies have joined the marketplace selling these 0-day capabilities. No longer do groups need to have the technical expertise; now they just need resources.” The iOS vulnerability was one of four in-the-wild zero-days Google detailed on Wednesday. ... Based on their analysis, the researchers assess that three of the exploits were developed by the same commercial surveillance company, which sold them to two different government-backed actors. As has been true since the Cold-War era and the "Crypto Wars" of the 1980s when cryptography was considered a munition, the US has prioritized attack over defense. The NSA routinely hoards 0-days, preferring to use them to attack foreigners rather than disclose them to protect US citizens (and others). This short-sighted policy has led to several disasters, including the Juniper supply-chain compromise and NotPetya.
Senators wrote to the head of the NSA, and the EFF sued the Director of National Intelligence, to obtain the NSA's policy around 0-days: Since these vulnerabilities potentially affect the security of users all over the world, the public has a strong interest in knowing how these agencies are weighing the risks and benefits of using zero days instead of disclosing them to vendors. It would be bad enough if the NSA and other nations' security services were the only buyers of 0-days. But the $11M REvil received from JBS buys a lot of them, and if each could net $70M they'd be a wonderful investment. Forcing ransomware gangs to use 0-days by getting systems up-to-date with patches is good, but the gangs will have 0-days to use. So although the program above should indirectly reduce the supply (and thus increase the price) of 0-days by motivating vendors to improve their development and supply chain practices, something needs to be done to reduce the impact of 0-days on ransomware. The Colonial Pipeline and JBS attacks, not to mention the multiple hospital chains that have been disrupted, show that it is just a matter of time before a ransomware attack has a major impact on US GDP (and incidentally on US citizens). In this light, the idea that NSA should stockpile 0-days for possible future use is counter-productive. At any time 0-days in the hoard might leak, or be independently discovered. In the past the fallout from this was limited, but no longer; they might be used for a major ransomware attack. Is the National Security Agency's mission to secure the United States, or to have fun playing Team America: World Police in cyberspace? Unless they are immediately required for a specific operation, the NSA should disclose 0-days it discovers or purchases to the software vendor, and once patched, add them to the kit it uses to run its "ransomware" program. To do less is to place the US economy at risk. PS: David Sanger reported Tuesday that Russia’s most aggressive ransomware group disappeared. It’s unclear who disabled them: Just days after President Biden demanded that President Vladimir V. Putin of Russia shut down ransomware groups attacking American targets, the most aggressive of the groups suddenly went off-line early Tuesday. ... A third theory is that REvil decided that the heat was too intense, and took the sites down itself to avoid becoming caught in the crossfire between the American and Russian presidents. That is what another Russian-based group, DarkSide, did after the ransomware attack on Colonial Pipeline. ... But many experts think that DarkSide’s going-out-of-business move was nothing but digital theater, and that all of the group’s key ransomware talent will reassemble under a different name. This is by far the most likely explanation for REvil's disappearance, leaving victims unable to pay. The same day, Bogdan Botezatu and Radu Tudorica reported that Trickbot Activity Increases; new VNC Module On the Radar: The Trickbot group, which has infected millions of computers worldwide, has recently played an active role in disseminating ransomware. We have been reporting on notable developments in Trickbot’s lifecycle, with highlights including the analysis in 2020 of one of its modules used to bruteforce RDP connections and an analysis of its new C2 infrastructure in the wake of the massive crackdown in October 2020. Despite the takedown attempt, Trickbot is more active than ever.
In May 2021, our systems started to pick up an updated version of the vncDll module that Trickbot uses against select high-profile targets. As regards the "massive crackdown", Ravie Lakshmanan notes: The botnet has since survived two takedown attempts by Microsoft and the U.S. Cyber Command. Update: [Source] Via Barry Ritholtz we find this evidence of Willie Sutton's law in action. When asked "Why do you rob banks?", Sutton replied "Because that's where the money is." [Source] And, thanks to Jack Cable, there's now ransomwhe.re, which tracks ransomware payments in real time. It suffers a bit from incomplete data. Because it depends upon tracking Bitcoin addresses, it will miss the increasing proportion of demands that insist on Monero. Hugh Rundle: Top 5 Big Library Ideas in History GLAM Blog Club has the first in a new approach to themes this month - Top five big.... I'm also taking a new approach, with some shitty drawings to spice things up. I hope you enjoy.
5 - Found in translation (Academy of Gondishapur, Persia) The Academy of Gondishapur (modern day Iran) was centred around a library of medical books from around the world known to Persia, and a school of talented translators to make it all readable in Persian.
4 - Oppression through "standards" (Library of Congress, United States of America) Most libraries in the Anglosphere use Library of Congress Subject Headings or variations based on them, for classifying knowledge. It's pretty ridiculous for a primary school library in Fiji to organise knowledge according to the worldview of American politicians, but it sure is effective soft power.
3 - Shelve it like you stole it (Library of Alexandria) The Ptolemies wanted their library to have a copy of every book in the known world. Woe betide anyone foolish enough to turn up with a boatload of books.
2 - Give communists somewhere to write (British Library, London) World champion grump and famous communist Karl Marx washed up in London after he was thrown out of everywhere else in Europe. The British Library famously became the place he wrote most of his world-changing book Capital.
1 - Read in the bath (Baths of Trajan, Rome) The enormous Baths of Trajan in ancient Rome were commissioned by Emperor Trajan and included several cold and hot baths, a swimming pool, sports stadium, and two libraries (Greek and Latin).
Lucidworks: Lucidworks Experts Weigh In on the Power of Connected Experiences Our team shared their thoughts on the five critical elements of creating a connected experience that delights customers, empowers employees, and enhances the experience for both. The post Lucidworks Experts Weigh In on the Power of Connected Experiences appeared first on Lucidworks. Digital Library Federation: Catching up with past NDSA Innovation Awards Winners: ePADD Nominations are now being accepted for the NDSA 2021 Excellence Awards. In 2017, ePADD won the NDSA Innovation Award in the Project category. At that time, this project was an undertaking to develop free and open-source computational analysis software that facilitates screening, browsing, and access for historically and culturally significant email collections. The software incorporates techniques from computer science and computational linguistics, including natural language processing, named entity recognition, and other statistical machine-learning associated processes. Glynn Edwards accepted the award on behalf of the project.
She is currently the Assistant Director in the Department of Special Collections & University Archives at Stanford University and kindly took a few minutes to help us catch up on where the project stands today. What have you been doing since receiving an NDSA Innovation Award in 2017? We completed two additional phases of software development for ePADD: Phase 2, funded by an Institute of Museum and Library Services (IMLS) National Leadership Grant (NLG), and Phase 3, funded by the Andrew W. Mellon Foundation. These rounds of development focused on adding new features and functionality to the software to support the appraisal, processing, discovery, and delivery of email collections of historic value. Our project team also changed before this third phase, with Sally DeBauche, our new digital archivist, taking over project management full-time for 18 months. Before Phase 3 launched in January 2020 with Harvard Library as our official partner, Jessica Smith, Ian Gifford, and Jochen Farwer from the University of Manchester contacted us about their own independent project to redevelop aspects of ePADD. They created a prototype version of ePADD that would display a full-text email archive in the Discovery Module, allowing users to view an email collection online. Meetings with the Harvard team, represented by Tricia Patterson & Stephen Abrams, progressed to the proposal of ceasing the redevelopment of their in-house email processing and preservation software, EAS, and instead collaborating with us to add specific preservation functionality to ePADD. At this stage, we brought the team from the University of Manchester into those discussions to help us shape the requirements for a new version of ePADD with greater support for preservation workflows. Concurrently with our Phase 3 grant, our three institutions began working on a joint grant proposal for Phase 4 of ePADD’s software development, funded by the University of Illinois’s Email Archives: Building Capacity and Community (EA:BCC) re-grant program, supported by the Andrew W. Mellon Foundation. We have been meeting together for the past year as we document requirements and identify roles and responsibilities for each of our units to carry out this work. For this phase of the project, we have contracted with an independent software development team, Sartography, to implement changes to the software, while retaining ePADD’s original development team to ensure consistency in our approach. Internally at Stanford, we continue to use ePADD as our production tool for appraising, processing, and delivering email archives. Our digital archives team, Sally DeBauche & Annie Schweikert, have presented on the software to our group of curators and have been in contact with them about appraising and processing new acquisitions. Annie & Sally have processed several new email collections, including the Ted Nelson email archive and the Don Knuth email archive. We have also launched a new multi-institutional online ePADD Discovery website at epadd.org, featuring the archive of literary critic and historical theorist Hayden White, from the UC Santa Cruz archives. To accompany the site, we have created documentation about contributing to the Discovery site. What did receiving the NDSA award mean to you? Beyond the recognition of our colleagues, it raised the profile of the ePADD software, which garnered more users and interest.
This greater following gave us the impetus for our third grant from the Mellon Foundation and allowed us to create a more stable program that can be used as a production tool for email archives.
What efforts/advances/ideas of the last few years have you been impressed with or admired in the field of data stewardship and/or digital preservation? There has been a lot of development in the field since we started with the ePADD project, but I have been very impressed with the EaaSI project (emulation), for which Stanford serves as a node host. This project will be a game changer for our stakeholders across the university and beyond, as well as for colleagues throughout the library who use this platform to provide access to legacy software and files that rely on unique and outdated software.
How has the ePADD project evolved since you won the Innovation Award? I included a lot of this in #1 above, but I would add that the raised profile and increased interest in and use of ePADD have brought dedicated partners. The Stanford-Harvard-Manchester partnership began during our third grant and has increased the exposure of ePADD+ (as we now refer to it), and the greater involvement from colleagues at each institution has allowed the larger team to focus on different aspects of running and managing the project. One exciting outcome is the commitment of more software testers and greater input from a wider community.
What do you currently see as some of the biggest challenges in email assessment and preservation? While I am still hoping for a more holistic way to search across all types of archival content, I think that sustainability is one of the major issues facing open-source software development projects. The cost of bug-fixes and updates with new versions of underlying programs might not always be inordinate, but securing dedicated funding is not simple and is often very time-consuming. Even more difficult is getting concrete buy-in for funds needed to pay developers to create significant enhancements. We are excited to see the progress from the It Takes a Village in Practice project, which aims to provide guidance to open-source software development projects on sustainability. We are engaged in beta testing for the tools that they are developing, and it will be very interesting to see how they can be of service to the broader community. The post Catching up with past NDSA Innovation Awards Winners: ePADD appeared first on DLF.

Lucidworks: The Good, the Bad and the Ugly: A History of Customer Service Customer service has a wild history rooted in the boom of innovation that came with the Industrial Revolution. The ebbs and flows of the industry are closely tied to the economic crests and troughs of American consumerism as a whole. Join us on the journey. The post The Good, the Bad and the Ugly: A History of Customer Service appeared first on Lucidworks.

Samvera: Calls for proposals – Samvera Connect 2021 Online The Program Committee is pleased to announce its Call for Proposals (CfP) of workshops, presentations/panels, lightning talks, and posters for Samvera Connect 2021 Online. The online conference workshops will be held October 14th-15th, with plenary presentations October 18th-22nd. Connect Online programming is intended to serve the needs of attendees from throughout our Community, from potential adopters to expert Samverans, in many roles including developers, managers, sysops, metadata librarians, and others who are interested in Samvera technologies and Community activities.
Workshops: submission form open through Sunday, August 15th, 2021. The Workshops form includes the option to request a workshop on a specific topic that you would like to attend. The Program Committee will use these suggestions to solicit workshops from the Community.
Presentations and Panels: submission form open through Tuesday, August 31st, 2021
Lightning Talks: submission form open through Thursday, September 30th, 2021
Virtual Posters: submission form open through Thursday, September 30th, 2021
You may find it helpful to refer to the workshop program, presentation/lightning talk program, and posters from last year's online conference. The post Calls for proposals – Samvera Connect 2021 Online appeared first on Samvera.

Casey Bisson: b2c2b is the new b2b The most recent StackOverflow developer survey shows 77% of developers prefer to use a free trial as a way to research a new service. Forrester Research reported that 93% of b2b buyers prefer self-service buying online. And a Harvard Business Review study found "that [b2b] customers are, on average, 57% of the way through the [purchase] process before they engage with supplier sales reps." Because of this, b2b sales require internal advocates, called mobilizers, who can build consensus around purchase decisions.

Lorcan Dempsey: Two metadata directions This short piece is based on a presentation I delivered by video to the Eurasian Academic Libraries Conference - 2021, organized by The Nazarbayev University Library and the Association of University Libraries in the Republic of Kazakhstan. Thanks to April Manabat of Nazarbayev University for the invitation and for encouragement as I prepared the presentation and blog entry. I was asked to talk about metadata and to mention OCLC developments. The conference topic was: Contemporary Trends in Information Organization in the Academic Library Environment.
The growing role and value of metadata
Libraries are very used to managing metadata for information resources - for books, images, journal articles and other resources. Metadata practice is rich and varied. We also work with geospatial data, archives, images, and many other specialist resources. Authority work has focused on people, places and things (subjects). Archivists are concerned about evidential integrity, context and provenance. And so on. In the network environment, general metadata requirements have continued to evolve in various ways: Information resource diversification. We want to discover, manage or otherwise interact with a progressively broader range of resources - research data, for example, or open educational resources. Resource type diversification. However, we are also increasingly interested in more resource types than informational ones alone. The network connects many entities or types of resource in new ways, and interaction between these entities requires advance knowledge, often provided by metadata, to be efficient. Workflows tie people, applications and devices together to get things done. To be effective, each of these resources needs to be described. Social applications like Strava tie together people, activities, places, and so on. Scholarly engines like Google Scholar, Semantic Scholar, Scopus or Dimensions tie together research outputs, researchers, funders, and institutions. The advance knowledge required for these workflows and environments to work well is provided by metadata, and so we are increasingly interested in metadata about a broad range of entities. Functional diversification.
We want to discover, manage, request or buy resources. We also want to provide context about resources, ascertain their validity or integrity over time, determine their provenance. We want to compare resources, collect data about usage, track and measure. We want to make connections between entities, understand relationships, and actually create new knowledge. We do not just want to find individual information resources or people; we want to make sense of networks of people, resources, institutions, and so on, and the relations between them. Source diversification. I have spoken about four sources of metadata in the past. Versions of these are becoming more important, but so is how they are used together to tackle the growing demands on metadata in digital environments. Professional. Our primary model of metadata has been a professional one, where librarians, abstract writers, archivists and so on are the primary source. Libraries have streamlined metadata creation and provision for acquired resources. Many libraries, archives and others devote professional attention and expertise to unique resources - special collections, archives, digitised and born-digital materials, institutional research outputs, faculty profiles, and so on. Community. I described the second as crowdsourced, and certainly the collection of contributions in this way has been of importance, in digital projects, community initiatives and in other places. However, one might extend this to a broader community source. The subject of the description or the communities from which the resources originate are an increasingly important source. This is especially the case as we pluralize description, as I discuss further below. An interesting example here is Local Contexts, which works with collecting institutions and Indigenous communities and "provides a new set of procedural workflows that emphasize vetting content, collaborative curation, ethical management and sustained outreach practices within institutions." Programmatically promoted. The programmatic promotion of metadata is becoming increasingly important. We will see more algorithmically generated metadata, as natural language processing, entity recognition, machine learning, image recognition, and other approaches become more common. This is especially the case as we move towards more entity-based approaches where the algorithmic identification of entities and relationships across various resources becomes important. At the same time, we are more aware of the need for responsible operations, since dominant perspectives can also influence the construction of algorithms and learning sets. Intentional. A fourth source is intentional data, or usage data: data about how resources are downloaded, cited, linked and so on. This may be used to rate and rank, or to refine other descriptions. Again, appropriate use of this data needs to be responsibly managed. Perspective diversification. A purported feature of much professional metadata activity has been neutral or objective description. However, we are aware that to be 'neutral' can actually mean to be aligned with dominant perspectives. We know that metadata and subject description have very often been partial, harmful or unknowing about the resources described, or have continued obsolescent or superseded perspectives, or have not described resources or people in ways that a relevant community expects or can easily find. This may be in relation to race, gender, nationality, sexual orientation, or other contexts.
This awareness leads directly into the second direction I discuss below, pluralization. It also highlights the increasing reliance on community perspectives. It is important to understand context, cultural protocols, community meanings and expectations, through more reciprocal approaches. And as noted above, use of programmatically promoted or intentional data needs to be responsibly approached, alert to ways in which preferences or bias can be present. So we want to make metadata work harder, but we also need more metadata and more types of metadata. Metadata helps us to work in network environments. A more formal definition of metadata might run something like: "schematized assertions about a resource of interest." However, as we think about navigating increasingly digital workflows and environments, I like to think about metadata in this general way: data which relieves a potential user (whether human or machine) of having to have full advance knowledge of the existence or characteristics of a resource of potential interest in the environment. Metadata allows applications and users to act more intelligently, and this becomes more important in our increasingly involved digital workflows and environments. Given this importance, and given the importance of such digital environments to our working, learning and social lives, it also becomes more important to think about how metadata is created, who controls it, how it is used, and how it is stewarded over time. Metadata is about both value and values. In this short piece, and in the presentation on which it is based, I limit my attention to two important directions. Certainly, this is only a part of the larger picture of evolving metadata creation, use and design in libraries and beyond.
Two metadata directions
I want to talk about two important directions here: entification and pluralization.
Entification: strings and things
Google popularized the notion of moving from 'strings' to 'things' when it introduced the Google knowledge graph. By this we mean that it is difficult to rely on string matching for effective search, management or measurement of resources. Strings are ambiguous. What we are actually interested in are the 'things' themselves, actual entities which may be referred to in different ways. Entification involves establishing a singular identity for 'things' so that they can be operationalized in applications, gathering information about those 'things,' and relating those 'things' to other entities of interest. Research information management and the scholarly ecosystem provide a good example of this. Consider the variety of entities of interest in a research information management system. These include researchers, research outputs, institutions, grants and other entities. Relationships between these include affiliation (researcher to institution), collaborator (researcher to researcher), authorship (researcher to research output), and so on. We want to know that the professor who got this grant is the same one who teaches this course or published that paper. These identities could be (and are) established within each system or service. This means that a research information management system does not only return strings that match a text search. It can facilitate the prospecting of a body of institutional research outputs, a set of scholars, a collection of labs and departments, and so on, and allow links to related works, scholars and institutions to be made.
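To make the entification idea a little more concrete, here is a minimal sketch of such an entity graph using the Python rdflib library. It is an illustration under stated assumptions, not anyone's production schema: the EX namespace, the property names (affiliatedWith, authorOf) and the institution identifier are invented placeholders, the ORCID iD is the fictional example iD used in ORCID's own documentation, and the DOI uses the 10.5555 example prefix.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/rim/")   # hypothetical local schema, illustration only
g = Graph()

# Entities: a researcher, an institution and a research output, each identified by a PID-style URI.
researcher = URIRef("https://orcid.org/0000-0002-1825-0097")   # ORCID's documented example iD
institution = URIRef("https://ror.org/EXAMPLE")                # placeholder, not a real ROR ID
paper = URIRef("https://doi.org/10.5555/12345678")             # DOI with the 10.5555 example prefix

g.add((researcher, RDF.type, EX.Researcher))
g.add((researcher, RDFS.label, Literal("Example Researcher")))
g.add((institution, RDF.type, EX.Institution))
g.add((institution, RDFS.label, Literal("Example University")))
g.add((paper, RDF.type, EX.ResearchOutput))
g.add((paper, RDFS.label, Literal("An Example Article")))

# Relationships: affiliation and authorship tie the 'things' together.
g.add((researcher, EX.affiliatedWith, institution))
g.add((researcher, EX.authorOf, paper))

# Because these are entities rather than strings, we can ask relational questions:
# which outputs were written by researchers affiliated with this institution?
query = """
PREFIX ex: <http://example.org/rim/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?output ?label WHERE {
  ?person ex:affiliatedWith <https://ror.org/EXAMPLE> ;
          ex:authorOf ?output .
  ?output rdfs:label ?label .
}
"""
for row in g.query(query):
    print(row.output, row.label)

The particular library matters less than the shape of the data: once the researcher, the institution and the paper are identified things rather than matching strings, affiliation and authorship become queryable relationships.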
Similarly, a scholarly engine like Scopus, Semantic Scholar, or Dimensions will bring together scholarly entities and offer access to a more or less rich network of results. Of course, in this case, the metadata may be under less close control. Typically, they will also be using metadata sourced in the four ways I described above, as 'professionally created' metadata works with metadata contributed by, say, individual researchers, as entities may be established and related programmatically, and as usage data helps organize resources. Wikidata is an important resource in this context, as a globally addressable identity base for entities of all types. Of course one wants to work with entities across systems and services. Researchers move between institutions, collaborate with others, cite research outputs, and so on. One may need to determine whether an identity in one context refers to the same entity as an identity in another context. So important entity backbone services have emerged which allow entities to be identified across systems and services by assigning globally unique identifiers. These include Orcid for researchers, DOI for research outputs, and the emerging ROR for institutions. These initiatives aim to create a singular identity for resources, gather some metadata about them, and make this available for other services to use. So a researcher may have an Orcid, for example, which is associated with a list of research outputs, an affiliation, and so on. This Orcid identity may then be used across services, supporting global identification and contextualization. Dimensions (from Digital Science), for example, programmatically generates a profile for my colleague Lynn Connaway. It aims to generate a description, but then also to recognize and link to other entities (e.g. topics, institutions, or collaborators). Again, the goal is to present a profile, and then to allow us to prospect a body of work and its connections. It pulls data from various places. We are accustomed to this more generally now with Knowledge Cards in Google (underpinned by Google's knowledge graph). Of course, this is not complete, and there has been some interesting discussion about improved use of identifiers from the scholarly entity backbone services. Meadows and Jones talk about the practical advantages to scholarly communication of a 'PID-optimized world.' (PID = persistent identifier.) In these systems and services, the entities I have been talking about will typically be nodes in an underlying knowledge graph or ontology:
Today, KGs are used extensively in anything from search engines and chatbots to product recommenders and autonomous systems. In data science, common use cases are around adding identifiers and descriptions to data of various modalities to enable sense-making, integration, and explainable analysis. [...] A knowledge graph organises and integrates data according to an ontology, which is called the schema of the knowledge graph, and applies a reasoner to derive new knowledge. Knowledge graphs can be created from scratch, e.g., by domain experts, learned from unstructured or semi-structured data sources, or assembled from existing knowledge graphs, typically aided by various semi-automatic or automated data validation and integration mechanisms. // The Alan Turing Institute
The knowledge graph may be internal to a particular service (Google for example) or may be used within a domain or globally.
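As a rough sketch of how such identifier backbones can be tied together in practice, the snippet below queries the public Wikidata SPARQL endpoint for items carrying a given Orcid iD and returns any VIAF identifier recorded for the same entity. It assumes the commonly documented Wikidata properties P496 (ORCID iD) and P214 (VIAF ID); treat the property numbers, and the use of ORCID's fictional example iD, as assumptions to verify rather than a tested integration.

import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def viaf_for_orcid(orcid):
    """Return Wikidata items (and their VIAF IDs, if recorded) for an ORCID iD."""
    # P496 = ORCID iD, P214 = VIAF ID (verify these property numbers before relying on them).
    query = """
    SELECT ?item ?itemLabel ?viaf WHERE {
      ?item wdt:P496 "%s" .
      OPTIONAL { ?item wdt:P214 ?viaf }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """ % orcid
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "metadata-directions-sketch/0.1 (example)"},
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {
            "item": b["item"]["value"],
            "label": b.get("itemLabel", {}).get("value"),
            "viaf": b.get("viaf", {}).get("value"),
        }
        for b in resp.json()["results"]["bindings"]
    ]

# 0000-0002-1825-0097 is the fictional example iD used in ORCID's documentation.
for match in viaf_for_orcid("0000-0002-1825-0097"):
    print(match)

In production one would batch requests, respect the endpoint's usage policy and cache results; the point is simply that a single globally assigned identifier can anchor lookups across otherwise separate systems, which is exactly the role the entity backbone services play.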
Again, Wikidata is important because it publishes its underlying knowledge graph and can be used to provide context, matches, and so on for other resources. The Library of Congress, other national libraries, OCLC, and others now manage entity backbones for the library community, sometimes rooted in traditional authority files. There is a growing awareness of the usefulness of entity-based approaches and of the importance of identifiers in this context. In this way, it is expected that applications will be able to work across these services. For example, an Orcid identity may be matched with identifiers from other services, such as VIAF to provide additional context or detail, or Wikidata to provide demographic or other data not typically found in bibliographic services. We can thus expect to see a decentralized infrastructure which can be tied together to achieve particular goals.
Pluralizing description
“Nobody should be compelled to use a slur to search a catalogue” - @mentionthewar #DCDC21 - David Prosser (@RLUK_David), June 29, 2021
Systems of description are inevitably both explicitly and implicitly constructed within particular perspectives. Metadata and subject description have long been criticized for embodying dominant perspectives, and for actively shunning or overlooking the experiences, memories or expectations of parts of the communities they serve. They may also contain superseded, obsolescent or harmful descriptions. Libraries have spoken about "knowledge organization", but such a phrase has to reckon with two challenges. First, it is acknowledged that there are different knowledges. The TK Labels support the inclusion of local protocols for access and use to cultural heritage that is digitally circulating outside community contexts. The TK Labels identify and clarify community-specific rules and responsibilities regarding access and future use of traditional knowledge. This includes sacred and/or ceremonial material, material that has gender restrictions, seasonal conditions of use and/or materials specifically designed for outreach purposes. // Local Contexts, [Traditional Knowledge] labels
Second, knowledge may be contested, where it has been constructed within particular power relations and dominant perspectives. Others described “the look of horror” on the face of someone who has been told to search using the term “Indians of North America.” Students - Indigenous as well as settler - who work with the collections point out to staff the many incorrect or outdated terms they encounter. // Towards respectful and inclusive description
My Research Library Partnership colleagues carried out a survey on Equity, Diversity and Inclusion in 2017. While it is interesting to note the mention of archival description as among the most changed features, there was clearly a sense that metadata description and terminologies required attention, and there was an intention to address these next. Such work may be retrospective, including remediation of existing description by linking to or substituting more appropriate descriptions. And there is certainly now a strong prospective focus, working on pluralizing description, on decentering dominant perspectives to respectfully and appropriately describe resources. This work has been pronounced in Australia, New Zealand and Canada, countries which recognize the need to address harmful practices in relation to Indigenous populations.
In early 2020, my RLP colleagues interviewed 41 library staff from 21 institutions in Australia, New Zealand, Canada and the US to talk about respectful and inclusive description:
Of those interviewed, no one felt they were doing an adequate job of outreach to communities. Several people weren't even sure how to go about connecting with stakeholder communities. Some brought up the possibility of working with a campus or community-based Indigenous center. These organizations can be a locus for making connections and holding conversations. A few working in a university setting have found strong allies and partners to advocate for increased resources in faculty members of the Indigenous Studies department (or similar unit). Those with the most developed outreach efforts saw those activities as being anchored in exchanges that originated at the reference desk, such as when tribal members came into the library to learn something about their own history, language or culture using materials that are stewarded by the library. Engaging with these communities to understand needs offers the opportunity to transform interactions with them from one-time transactions to ongoing, meaningful relationships. Learning how Indigenous community members use and relate to these materials can decenter default approaches to description and inspire more culturally appropriate ways. Some institutions have developed fellowships to foster increased use of materials by community members. // Towards respectful and inclusive description
The murder of George Floyd in the US caused a general personal, institutional and community reckoning with racism. This has extended to addressing bias in library collections and descriptions:
Materials in our collections, which comprise a part of the cultural and historical record, may depict offensive and objectionable perspectives, imagery, and norms. While we have control over description of our collections, we cannot alter the content. We are committed to reassessing and modifying description when possible so that it more accurately and transparently reflects the content of materials that are harmful and triggering in any way and for any reason. As librarians and archivists at NYU Libraries, we are actively confronting and remediating how we describe our collection materials. We know that language can and does perpetuate harm, and the work we are undertaking centers on anti-oppressive practices. We are also making reparative changes, all to ensure that the descriptive language we use upholds and enacts our values of inclusion, diversity, equity, belonging, and accessibility. // NYU Libraries, Archival Collections Management, Statement on Harmful Language.
Many individual libraries, archives and other organizations are now taking steps to address these issues in their catalogs. The National Library of Scotland has produced an interesting Inclusive Terminology guide and glossary, which includes a list of areas of attention. Of course, the two directions I have mentioned can be connected, as entification and linking strategies may offer ways of pluralizing description in the future:
Other magic included technical solutions, such as linked data solutions, specifically, mapping inappropriate terms to more appropriate ones, or connecting the numerous alternative terms with the single concept they represent. For example, preferred names and terms may vary by community, generation, and context - what is considered incorrect or inappropriate may be a matter of perspective.
Systems can also play a role: the discovery layers should have a disclaimer stating that users may find terms that are not currently considered appropriate. Terms that are known to be offensive terms could be blurred out (with an option to reveal them). // Towards respectful and inclusive description
OCLC Initiatives
Now I will turn to discuss one important OCLC initiative under each of these two directions.
Entification at OCLC: towards a shared entity management infrastructure
For linked data to move into common use, libraries need reliable and persistent identifiers and metadata for the critical entities they rely on. This project [SEMI] begins to build that infrastructure and advances the whole field. // Lorcan Dempsey
OCLC has long worked with entification in the context of several linked data initiatives. VIAF (the Virtual International Authority File) has been foundational here. This brings together name authority files from national libraries around the world, and establishes a singular identity for persons across them. It adds bibliographic and other data for context. And it matches to some other identifiers. VIAF has become an important source of identity data for persons in the library community. Project Passage is also an important landmark. In this project we worked with Wikibase to experiment with linked data and entification at scale. My colleague Andrew Pace provides an overview of the lessons learned:
The building blocks of Wikibase can be used to create structured data with a precision that exceeds current library standards.
The Wikibase platform enables user-driven ontology design but raises concerns about how to manage and maintain ontologies.
The Wikibase platform, supplemented with OCLC's enhancements and stand-alone utilities, enables librarians to see the results of their effort in a discovery interface without leaving the metadata-creation workflow.
Robust tools are required for local data management. To populate knowledge graphs with library metadata, tools that facilitate the import and enhancement of data created elsewhere are recommended.
The pilot underscored the need for interoperability between data sources, both for ingest and export.
The traditional distinction between authority and bibliographic data disappears in a Wikibase description.
These initiatives paved the way for SEMI (Shared Entity Management Infrastructure). Supported by the Andrew W. Mellon Foundation, SEMI is building the infrastructure which will support OCLC's production entity management services. The goal is to have infrastructure which allows libraries to create, manage and use entity data at scale. The initial focus is on providing infrastructure for work and person entities. It is advancing with input from a broad base of partners and a variety of data inputs, and will be released for general use in 2022.
Pluralization at OCLC: towards reimagining descriptive workflows
OCLC recognizes its important role in the library metadata environment and has been reviewing its own vocabulary and practices.
For example, it has deprecated the term 'master record' in favor of 'WorldCat record'. It was also recognized that there were multiple community and institutional initiatives which were proceeding independently and that there would be value in a convening to discuss shared directions. Accordingly, again supported by the Andrew W. Mellon Foundation, OCLC, in consultation with Shift Collective and an advisory group of community leaders, is developing a program to consider these issues at scale. The following activities are being undertaken over several months:
Convene a conversation of community stakeholders about how to address the systemic issues of bias and racial equity within our current collection description infrastructure.
Share with member libraries the need to build more inclusive and equitable library collections and to provide description approaches that promote effective representation and discovery of previously neglected or mis-characterized peoples, events, and experiences.
Develop a community agenda that will be of great value in clarifying issues for those who do knowledge work in libraries, archives, and museums, identifying priority areas for attention from these institutions, and providing valuable guidance for national agencies and suppliers.
It is hoped that the community agenda will help mobilize activity across communities of interest, and will also provide useful input into OCLC development directions.
Find out more: resources to check out
The OCLC Research Library Partners Metadata Managers Focus Group is an important venue for discussion of metadata directions and community needs. This report synthesizes six years (2015-2020) of discussion and traces how metadata services are evolving: Transitioning to the Next Generation of Metadata (OCLC).
This post brings together a series of international discussions about the report and its ramifications for services, staffs and organization: Next-generation metadata and the semantic continuum.
For updates about OCLC's SEMI initiative, see: WorldCat - Shared entity management infrastructure | OCLC.
For more about reimagining descriptive workflows, see: Reimagine Descriptive Workflows. OCLC has been awarded a grant from The Andrew W. Mellon Foundation to convene a diverse group of experts, practitioners, and community members to determine ways to improve descriptive practices, tools, infrastructure and workflows in libraries and archives.
The Project Passage summary and report is here: Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. The OCLC Research linked data Wikibase prototype (“Project Passage”) provided a sandbox in which librarians from 16 US institutions could experiment with creating linked data to describe resources, without requiring knowledge of the technical machinery of linked data.
Acknowledgements: Thanks to my colleagues John Chapman, Rachel Frick, Erica Melko, Andrew Pace and Merrilee Proffitt for providing material and/or advice as I prepared the presentation and this entry. Again, thanks to April Manabat of Nazarbayev University for the invitation and for encouragement along the way.
For more information about the Conference, check out these pages: Nazarbayev University LibGuides: Eurasian Academic Libraries Conference - 2021: Home (Nazarbayev University LibGuides at Nazarbayev University).
Picture: I took the feature picture at Sydney Airport, Australia (through a window). The pandemic is affecting how we think about work travel and the design of events, although in as yet unclear ways. One pandemic effect, certainly, has been the ability to think about both audiences and speakers differently. It is unlikely that I would have attended this conference had it been face to face; however, I readily agreed to be an online participant.

LITA: Blog Archive The LITA Blog will now function as an archive of posts by the Library and Information Technology Association (LITA), formerly a division of the American Library Association (ALA). LITA is now a part of Core, but was previously the leading organization reaching out across types of libraries to provide education and services for a broad membership of systems librarians, library technologists, library administrators, library schools, vendors, and others interested in leading-edge technology and applications for librarians and information providers. Are you interested in becoming involved with Core or the Core News blog? Core News has more information!