Or has it, and it’s just not highly publicized?
Five years ago, I was under the impression that most "machine learning" jobs were mostly just data cleaning, linear regression, working with regular data stores, and debugging stuff. Or, that was at least the meme that I heard from a lot of people. That didn’t surprise me at the time. It was easy to imagine that all the fancy research results were fragile, or hard to apply to products, or would at the very least take a long time to adapt.
But at this point it’s been quite a few years since machine learning systems first immensely impressed me. The first such system was probably AlphaGo, all the way back in 2016! AlphaGo then spun off into so many better, faster, cheaper systems that I didn’t even keep track of them. And since then I’ve lost track of the number of unrelated systems that immensely impressed me. And their capabilities are so general that I feel sure that they must be convertible into enormous economic value.
I still believe that it takes a long time to boot up a company around novel research results, but I’m not actually well calibrated on how long that takes, and it’s been long enough that it’s starting to feel awkward, like my models are missing something.
Here are examples of AI products that I wouldn’t have been surprised to see by now, but which I don’t think exist. (I can imagine that many of these examples technically exist, but not at the level that I mean.)
- Spotify playlists that are actually just procedurally generated music of various genres
- A tool that helps researchers/legislators/et cetera by summarizing papers, books, laws on demand
- Tools that help people (like writers) brainstorm, flesh out ideas by generating further details, asking questions, etc
- A version of photoshop but with tons of AI tools
- Widely available self-driving cars
- Physics simulators that are way faster
- Paradigmatically different and better web search
So what’s the deal? Here’s a list of possible explanations. I’d love to hear if anyone has evidence for any of them, or if you know of reasons not on the list.
- The research results are actually not all that applicable to products; more research is needed to refine them
- They’re way too expensive to run to be profitable
- Yeah, no, it just takes a really long time to convert innovation into profitable, popular product
- Something something regulation?
- The AI companies are deliberately holding back for whatever reason
- The models are already integrated into the economy and you just don’t know it.
I deny the premise. It’s publicized, you’re just not paying attention to the water in which you swim. Companies like Google and even Apple talk a great deal about how they increasingly employ DL at every layer of the stack. Just for smartphones: pull your smartphone out of your pocket. This is how DL generates economic value: DL affects the chip design, is an increasing fraction of the chips on the SoC, is most of what your camera does, detects your face to unlock it, powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock), powers the features like transcripts of calls or machine translation of pages or spam detection that you take for granted, powers the ads which monetize you in the search engine results which they also power, the anti-hacking and anti-abuse measures which keep you safe (and also censor hatespeech etc on streams or social media), the voice synthesis you hear when you talk to it, the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic, the wake words, the predictive text when you prefer to type rather than talk and the email suggestions (the whole email, or just the spelling/grammar suggestions), the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about, the cooling systems of the data centers running all of this (not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...
This all is, of course, in addition to the standard adoption curves & colonization wave dynamics, and merely how far it’s gotten so far.
Comment
I think the conclusion here is probably right, but a lot of the examples seem to exaggerate the role of DL. Like, if I thought all of the obvious-hype-bullshit put forward by big companies about DL were completely true, then it would look like this answer. Starting from the top:
Comment
Stellar breakdown of hype vs. reality. Just wanted to share some news from today that Google has fired an ML scientist for challenging their paper on DL for chip placement. From Engadget (ungated):
Comment
"One story is good until another is told". The chip design work has apparently been replicated, and Metz’s* writeup there has several red flags: in describing Gebru’s departure, he omits any mention of her ultimatum and list of demands, so he’s not above leaving out extremely important context in these departures in trying to build up a narrative of ‘Google fires researchers for criticizing research’; he explicitly notes that Chatterjee was fired ‘for cause’ which is rather eyebrow-raising when usually senior people ‘resign to spend time with their families’ (said nonfirings typically involving things like keeping their stock options while senior people are only ‘fired for cause’ when they’ve really screwed up—like, say, harassment of an attractive young woman) but he doesn’t give what that ‘cause’ was (does he really not know after presumably talking to people?) or wonder why both Chatterjee and Google are withholding it; and he uninterestedly throws in a very brief and selective quote from a presumably much longer statement by a woman involved which should be raising your other eyebrow:
"Sat Chatterjee has waged a campaign of misinformation against me and Azalia for over two years now," Ms. Anna Goldie said in a written statement.
(I note that this is put at the end, which in the NYT house style, is where they bury the inconvenient facts that they can’t in good journalist conscience leave out entirely, and that makes me suspect there is more to this part than is given.)
So, we’ll see. EDIT: Timnit Gebru, perhaps surprisingly, denies any parallel and seems to say Chatterjee deserved to be fired, saying:
Wired has a followup article with more detailed timeline and discussion. It edges much closer to the misogyny narrative than the evil-corporate-censorship narrative.
Comment
Fair enough! Great context, thanks.
Comment
In my experience, not enough people on here publicly realise their errors and thank the corrector. Nice to see it happen here.
I don’t think Alex is saying deep learning is valueless, he’s saying the new value generated doesn’t seem commensurate with the scale of the research achievements. Everyone is using algorithmic recommendations, but they don’t feel better than Netflix or Amazon could do 10 years ago. Speech to text is better than it was, but not groundbreakingly so. Predictive text may add value to my life one day, but currently it’s an annoyance.
Maybe the more hidden applications have undergone bigger shifts. I’d love to hear more about deep learning for chip or data center design. But right now the consumer uses feel like modest improvements compounding over time, and I’m constantly frustrated by how unconfigurable tools are becoming.
Comment
I don’t know what you’re talking about. Speech to text actually works now! It was completely unusable just 12 years ago.
Comment
Agreed. I distinctly remember it becoming worth using in 2015, and was using that as my reference point. Since then it’s probably improved, but it’s been gradual enough I haven’t noticed as it happens. Everything Alex cites came after 2015, so I wasn’t counting that as "had major discontinuities in line with the research discontinuities".
However, I think foreign language translation has experienced such a discontinuity, and it’s of comparable magnitude to the wishlist.
Comment
Was circa 2015 speech-to-text using deep learning? If not, how did it work?
Comment
Prior to DL, text-to-speech used hidden Markov models. Those were replaced with LSTMs relatively early in the DL revolution (random 2014 paper). In 2015 there were likely still many HMM-based models around, but apparently at least Google already used DL-based text-to-speech.
I would point out that the tech sector is the single most lucrative sector to have invested in in the past decade, despite endless predictions that the tech bubble is finally going to pop, and this techlash or that recession will definitely do it real soon now.
What would the world look like if there were extensive quality improvements in integrated bundles of services behind APIs and SaaS and smartphones driven by, among other things, DL? I submit that it would look like ours looks.
Consumer-obvious stuff is just a small chunk of the economy.
How would you know that? You aren’t Amazon. And when corporations do report lift, the implied revenue gains are pretty big. Even back in 2014 or so, Google could make a business case for dropping $130m on an order of Nvidia GPUs (ch8, Genius Makers), much more for DeepMind, and that was back when DL was mostly ‘just’ image stuff & NMT looking inevitable, well before it began eating the rest of the stack and modalities.
Comment
On tech sector out-performance, I think the more appropriate lookback period started around 2016 when AlphaGo became famous. On predictions, there were also countless predictions that tech would take over the world. An abundance of predictions for boom or bust is a constant feature of capital markets, and should be given no weight. On causal attribution, note that there have been many other advances in the tech sector, such as cloud computing, mobile computing, industry digitization, Moore’s law, etc. It’s unclear how much of the value added is driven by DL.
Comment
I disagree. Major investments in DL by big tech like FB, Baidu, and Google started well before 2016. I cited that purchase by Google partially to ward off exactly this sort of goalpost moving. And stock markets are forward-looking, so I see no reason to restrict it to AlphaGo (?) actually winning.
Who cares about predictions? Talk is cheap. I’m talking about returns. Stock markets are forward-looking, so if that were really the consensus, they wouldn’t’ve outperformed.
And yet, in worlds where DL delivers huge economic value in consumer-opaque ways all throughout the stack, they look like our world looks.
Comment
I didn’t say ‘final goods or services’. Obviously yes, in the end, everything in the economy exists for the sake of human consumers, there being no one else who it could be for yet (as we don’t care about animals or whatever). I said ‘consumer-obvious’ to refer to what is obvious to consumers, like OP’s complaint.
This is not quite as simple as ‘final’ vs ‘intermediate’ goods. Many of the examples I gave often are final goods, like machine translation. (You, the consumer, punch in a foreign text, get its translation, and go on your merry way.) It’s just that they are upgrades to final goods, which the consumer doesn’t see. If you were paying attention, the rollout of Google Translate from n-grams statistical models to neural machine translation was such a quality jump that people noticed it had happened before Google happened to officially announce it. But if you weren’t paying attention at that particular time in November 2015 or whenever it was, well, Google Translate doesn’t, like, show you little animations of brains chugging away inside TPUs; so you, consumer, stand around like OP going "but why DL???" even as you use Google Translate on a regular basis.
Consumers either never realize these quality improvements happen (perhaps you started using GT after 2015), or they just forget about the pain points they used to endure (cf. my Ordinary Life Improvements essay which is all about that), or they take for granted that ‘line on graph go up’ where everything gets 2% better per year and they never think about the stacked sigmoids and radical underlying changes it must take to keep that steady improvement going.
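A minimal sketch of the "stacked sigmoids" point, under loudly stated assumptions: the function names, the wave spacing, the wave width, and the number of waves are all made up for illustration and are not taken from any real product data. Each hypothetical technology wave contributes one S-curve of improvement; when the waves overlap, the year-over-year gain in the total looks almost constant, which is roughly how a series of radical changes could read as a boring straight line on a graph.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def stacked_quality(t: float, spacing: float = 5.0, width: float = 2.0,
                    n_waves: int = 20) -> float:
    """Sum of n_waves logistic curves, one per hypothetical technology wave.

    Wave k is centred at year spacing * k and saturates over a few
    multiples of `width` years. All values are illustrative assumptions.
    """
    return sum(sigmoid((t - spacing * k) / width) for k in range(1, n_waves + 1))


# Year-over-year gains well away from the first and last wave.
yearly_gains = [stacked_quality(t + 1) - stacked_quality(t) for t in range(40, 60)]

for year, gain in zip(range(40, 60), yearly_gains):
    print(f"year {year}: gain {gain:.4f}")
```

With these assumed parameters the printed gains all sit close to one wave per five years, even though every individual curve is flat, then steep, then flat again. Widening the spacing relative to the width would make the underlying jumps visible again.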
Yes, I can agree with this. OP is wrong about DL not translating into huge amounts of economic value in excess of the amount invested & yielding profits, because it does, all through the stack, and part of his mistake is in not knowing how many existing things now rely on or plug in DL in some way; but the other part of the mistake is the valid question of "why don’t I see completely brand-new, highly-economically-valuable, things which are blatantly DL, which would satisfy me at a gut level about DL being a revolution?"
Comment
So, why don’t we? I don’t think it’s necessarily any one thing, but a mix of factors that mean it would always be slow to produce these sorts of brand new categories, and others which delay by relatively small time periods and mean that the cool applications we should’ve seen this year got delayed to 2025, say. I would appeal to a mix of:
the future is already here, just unevenly distributed: unfamiliarity with all the things that already do exist (does OP know about DALL-E 2 or 15.ai? OK, fine, does he know about Purplesmart.ai where you could chat with Twilight Sparkle, using face, voice, & text synthesis? Where did you do that before?)
automation-as-colonization-wave dynamics like Shirky’s observations about blogs taking a long time to show up after they were feasible. How long did it take to get brandnew killer apps for ‘electricity’?
Hanson uses the metaphor of a ‘rising tide’; DL can be racing up the spectrum from random to superhuman, but it may not have any noticeable effects until it hits a certain point. Above a certain error rate, things like machine translation or OCR or TTS just aren’t worth bothering with, no matter how impressive they are otherwise or how much progress they represent or how fast they are improving. AlphaGo Fan Hui vs AlphaGo Lee Sedol, GPT-2 vs GPT-3, DALL-E 1 vs DALL-E 2...
Most places are still trying to integrate and invent uses for spreadsheets. Check back in 50 years for a final list of applications of today’s SOTA.
the limitations of tool AI designs: "tool AIs want to be agent AIs" because tools lose a lot of performance and need to go through human bottlenecks, and are inherently molded to existing niches, like hooking an automobile engine up to a buggy. It’ll pull the buggy, sure, but you aren’t going to discover all the other things it could be doing, and it’ll just be a horse which doesn’t poop as much.
exogenous events like
GPU shortages (we would be seeing way more cool applications of just existing models if hobbyists didn’t have to sell a kidney to get a decent Nvidia GPU), which probably lets Nvidia keep prices up (killing tons of DL uses on the margin) and hold back compute progress in favor of dripfeeding
strategic missteps (Intel’s everything, AMD’s decision to ignore Nvidia building up a software ecosystem monopoly & rendering themselves irrelevant to DL, various research orgs ignoring scaling hypothesis work until relatively recently, losing lots of time for R&D cycles)
basic commercial dynamics (hiding stuff behind an API is a good business model, but otherwise massively holds back progress),
Marginal cost: We can also note that general tech commercial dynamics like commoditize-your-complement lead to weird, perverse effects because of the valley of death between extremely high-priced services and free services. Like, Google Translate couldn’t roll out NMT using RNNs until they got TPUs. Why? Because a translation has to be almost free before Google can offer it effectively at global scale; and yet, it’s also not worth Google’s time to really try to offer paid APIs because people just don’t want to use them (‘free is different’), it captures little of the value, and Google profits most by creating an integrated ecosystem of services and it’s just not worth bothering doing. And because Google has created ‘a desert of profitability’ around it, it’s hard for any pure-NMT play to work. So you have the very weird ‘overhang’ of NMT in the labs for a long time with ~$0 economic value despite being much better, until suddenly it’s rolled out, but charging $0 each.
Risk aversion/censorship: putting stuff behind an API enables risk aversion and censorship to avoid any PR problems. How ridiculous that you can’t generate faces with DALL-E 2! Or anime!
Have a cool use for LaMDA, Chinchilla/Flamingo, Gopher, or PaLM? Too bad! And big corps can afford the opportunity cost because after all they make so much money already. They’re not going to go bankrupt or anything… So we regularly see researchers leaving GB, OA, or DM (most recently, Adept AI Labs, with incidentally a really horrifying mission from the perspective of AI safety) and scuttlebutt has it, like Jang reports, that this is often because it’s just such a pain in the ass to get big corps to approve any public use of the most awesome models, that it’s easier to leave for a startup to recreate it from scratch and then deploy it. Or consider AI Dungeon: it used to be one of the best examples of something you just couldn’t do with earlier approaches, but has gone through so many wild changes in quality, apparently due to the backend and OA issues, that I’m too embarrassed to mention it much these days because I have no idea if it’s lobotomized this month or not.
(I have also read repeatedly that exciting new Google projects like Duplex or a Google credit card have been killed by management afraid of any kind of backlash or criticism; in the case of the credit card, apparently DEI advocates brought up the risk of it ‘exacerbating economic inequality’ or something. Plus, remember that whole thing where for like half a year Googlers weren’t allowed to mention the name "LaMDA" even as they were posting half a dozen papers on Arxiv all about it?)
bottlenecks in compute (even ignoring the GPU shortage part) where our reach exceeds our grant-making grasp (we know that much bigger models would do so many cool things, but the big science money continues to flow to things like ITER or LHC)
and in developers/researchers capable of applying DL to all the domains it could be applied to.
(People the other day were getting excited over a new GNN weather-forecaster which apparently beats the s-t out of standard weather forecasting models. Does it? I dunno, I know very little about weather forecasting models and what it might be doing wrong or being exaggerated. Could I believe that one dude did so as a hobby? Absolutely—just how many DL experts do you think there are in weather-forecasting?)
general underdevelopment of approaches making them inefficient in many ways, so you can see the possibility long before the experience curve has cranked away enough times to democratize it (things like Chinchilla show how far even the basics are from being optimized, and are why DL has a steep experience curve)
Applications are a flywheel, and our DL flywheel has an incredible amount of friction in it right now in terms of getting out to a wider world and into the hands of more people empowered to find new uses, rather than passively consuming souped-up services.
To continue the analogy, it’s like if there was a black cab monopoly on buggies which was rich off fares & deliveries and worried about criticism in the London Times for running over old ladies, and automobile engines were still being hand-made one at a time by skilled mechanicks and all the iron & oil was being diverted to manufacture dreadnoughts, so they were slowly replacing horses one at a time with the new iron horses, but only eccentric aristocrats could afford to buy any to try to use elsewhere, which keeps demand low for engines, keeping them expensive and scarce, keeping mechanicks scarce… etc.
The worst part is, for most of these, time lost is gone forever. It’s just a slowdown. Like the Thai floods simply permanently set back hard drive progress and made them expensive for a long time, there was never any ‘catchup growth’ or ‘overhang’ from it. You might hope that stuff like the GPU shortages would lead to so much capital investment and R&D that we’d enjoy a GPU boom in 2023, given historical semiconductor boom-and-bust dynamics, but I’ve yet to see anything hopeful in that vein.
Comment
This is a brilliant comment for understanding the current deployment of DL. Deserves its own post.
(I moved this to answers, since while it isn’t technically an answer, I think it still functions better as an answer than as a comment)
Comment
[I generally approve of mods moving comments to answers.]
Datapoint in favor, Patrick Collison of Stripe says ML has made them $1 billion: https://mobile.twitter.com/patrickc/status/1188890271854915586?lang=en-GB
Comment
Well, merchant revenue, not Stripe profit, so not quite as impressive as it sounds, but it’s a good example of the sort of nitty-gritty DL applications you will never ever hear about unless you are deep into that exact niche and probably an employee; so a good Bayesian will remember that where there is smoke, there is fire and adjust for the fact that you’ll never hear of 99% of uses.
Comment
How are you distinguishing "new DL was instrumental in this process" from "they finally got enough data that existing data janitor techniques worked" or "DL was marginally involved and overall used up more time than it saved, but CEOs are incentivized to give it excess credit"?
It’s totally possible my world is constantly being made more magical in imperceptible ways by deep learning. It’s also possible that magic is improving at a pretty constant rate, disconnected from the flashy research successes, and PR is lying to me about its role.
Does anybody know what "optimize the bitfields of card network requests" actually means?
The above answer, partially as bulleted lists.
- affects the chip design, is an increasing fraction of the chips on the SoC,
- is most of what your camera does,
- detects your face to unlock it,
- powers the recommendations of every single thing on it whether Youtube or app store or streaming service (including the news articles and notifications shown to you as you unlock),
- powers the features like
  - transcripts of calls or
  - machine translation of pages or
  - spam detection that you take for granted,
- powers the ads which monetize you in the search engine
- [search engine] results which they also power,
- the anti-hacking and anti-abuse measures which keep you safe
  - (and also censor hatespeech etc on streams or social media),
- the voice synthesis you hear when you talk to it,
- the voice transcription when you talk to it or have your Zoom/Google videoconference sessions during the pandemic,
- the wake words,
- the predictive text when you prefer to type rather than talk
- and the email suggestions (the whole email, or just the spelling/grammar suggestions),
- the GNN traffic forecasts changing your Google Maps route to the meeting you emailed about,
- the cooling systems of the data centers running all of this
  - (not to mention optimizing the placement of the programs within the data centers both spatially in solving the placement problem and temporally in forecasting)...
Recently I learned that Pixel phones actually contain TPUs. This is a good indicator of how much deep learning is being used (particularly it is used by the camera I think)
My money is mostly on "It just takes a really long time to convert innovation into profitable, popular product".
A related puzzle piece IMO: Several years ago, all my friends used f.lux to reduce the amount that computer screens screwed up their circadian rhythm. It had to be manually installed. I was confused/annoyed why Apple didn’t do this automatically. A couple years later, Apple did start doing it automatically (and more recently started shifting everything to dark mode at night).
Meanwhile: A couple years ago, we released shortform on LessWrong. There’s a fairly obvious feature missing, which is showing a user’s shortform on their User Profile. That feature is still missing a couple years later. It would take maybe a day to build, and a week to get reviewed and merged into production. There are other obvious missing features we haven’t gotten around to.
The reason we haven’t gotten around to it is something like "well, there’s a lot of competing engineering work to do instead, and there’s a bunch of small priorities that make it hard to just set aside a day for doing it". I think Habryka believes this just isn’t the most important thing missing from LW and that keeping the eye on bigger bottlenecks/opportunities is more important. I think Jimrandomh thinks it’s important to make this sort of small feature improvement, but also there’s a bunch of other small feature improvements that need doing (as well as big feature improvements that take up a lot of cognitive attention). There’s also a bit of organizational dysfunction, and/or "the cost of information flow and decisionmaking flow is legitimately ‘real’".
Something about all this is immensely dissatisfying to me, but it seems like a brute fact about how hard things are. LW is a small team. Apple is a much larger organization that probably pays much higher decisionmaking overhead cost.
I think the bridge from "GPT is really impressive" to "GPT successfully summarizes research reports for you" is a much harder problem than adding f.lux to Mac OS or adding shortform to a User Profile. Also, the teams capable of doing it are mostly working on doing the next cool research thing. Also, InstructGPT totally does exist, but each major productization is a lot of gnarly engineering work (and again the people with the depth of understanding to do it are largely busy)
Comment
Note that this is also where some of my "somewhat longer AGI timelines" beliefs come from (i.e., 7 years seems more like the minimum to me, whereas I know a couple people listing that as more like a median). It seems to me that most of the pieces of AGI exist already, but that actually getting from here to AGI will require 2-3 steps, and each step probably turns out to require some annoying engineering steps.
I wonder if there’s also some basic business domain expertise that generalizes here but hasn’t been developed yet. "How to use software to replace humans with spreadsheets" is a piece of domain expertise the SaaS business community has developed to the point where it gets pretty reliably executed. I don’t know that we have widespread knowledge of how to reliably turn models into services/products.
Riffing on the idea that "productionizing a cool research result into a tool/product/feature that a substantial number of users find better than their next best alternative is actually a lot of work": it’s a lot less work in larger organizations with existing users numbering in the millions (or billions). But, as noted, larger orgs have their own overhead. I think this predicts that *most* of the useful products built around deep learning which come out of larger orgs will have certain characteristics, like "is a feature that integrates/enhances an existing product with lots of users" rather than "is a totally new product that was spun up incubator-style within the organization". It plays to the strengths of those orgs: having both datasets and users, playing better with the existing org structure and processes, more incentive-aligned with the people who "make things happen", etc. A couple examples of what I’m thinking of:
substantial improvements in speech recognition—productionized as voice assistant technology, it’s now good enough that it’s sometimes easier to use one than to do something by hand, like setting a timer/alarm/reminder/etc while your hands are occupied with something else,
substantial improvements in image recognition—productionized as image search. I can search for "documents" in Google Photos, and it’ll pull up everything that looks like a document. I can more narrowly search "passport" and it’ll pull up pictures I took of my passport. I can search for "license plate" and it’ll pull up a picture I took of my license plate. I just tried searching for "animal" and it pulled up:
An animated gif of a dog with large glasses on it
Statues of men on horseback, as well as some sculptures of eagles
A bunch of fish in tanks
For structural reasons I’d expect "totally novel, standalone products" to come out of startups rather than larger organizations, but because they’re startups they lack many of the "hard things are easy" buttons that some larger orgs have.
> My money is mostly on "It just takes a really long time to convert innovation into profitable, popular product"
I’d have gone with: it can take a long time for a society to adapt to a new technology.
Here’s another possible explanation: The models aren’t actually as impressive as they’re made out to be. For example, take DALL-E 2. Yes, it can create amazingly realistic depictions of noun phrases automatically. But can it give you a stylistically coherent drawing based on a paragraph of text? Probably not. Can it draw the same character in three separate scenarios? No, it cannot. DALL-E 2 basically lifts the floor of quality for what you can get for free. But anyone who actually wants or needs the things you can get from a human artist cannot yet get them from an AI. See also this review of a startup that tries to do data extraction from papers: https://twitter.com/s_r_constantin/status/1518215876201250816
Comment
Meta: I disagree with Alex’s decision to delete Gwern’s comment on this answer. People can reasonably disagree about the optimal balance between ‘more dickish’ (leaves more room for candor, bluntness, and playfulness in discussions) and ‘less dickish’ (encourages calm and a focus on content) in an intellectual community. And on LW, relatively high-karma users like Alex are allowed to moderate discussion of their posts, so Alex is free to promote the balance he thinks is best here. But regardless of where you fall on that spectrum, I think LW should have a soft norm that substance trumps style, content is king, argument will be taken seriously on its own terms even if it’s not optimally packaged and uses the wrong shibboleths or whatever. Deleting substantive, relevant content entirely should mostly not be one of the ‘game moves’ people use in advancing their side of the Dickishness Debate—it’s not worth it on its own terms, it’s not worth it as a punishment for the other side (even if the other side is in fact wrong and you’re right), and it erodes an important thing about LW. Gwern’s comment had tons of content beyond that one sentence that was phrased a bit rudely; and it spawned a bunch of discussion that’s now hard to follow, on a topic that actually matters. Deleting the whole comment, without copy-pasting all or most of it anywhere, seems bad to me.
Comment
I appreciate this comment! I’m interested in responding to you, Rob, because I already know you to be an entirely reasonable person, and also because I think this is somewhat of a continuation of a difference between you and me in real life. I might bail at any time though, because the fact that posters can have their own custom moderation policy means that I don’t feel particularly obligated to justify myself. (For context for the rest of this comment, the line I had a problem with was, "‘noun phrases’ is an odd typo for ‘sentences’. They’re not even close to each other on the keyboard.")
It might be worth making sure that the author of a deleted comment can still read it so they can repost it on their shortform or a similar place.
Comment
Authors of deleted comments receive the text of the comment in a PM
(These are all quantitative factors. If Gwern’s overall comment had sucked more, or his sentence had been way more egregious, I’d have objected a lot less to Alex’s call. But it does matter where we put rough quantitative thresholds.)
Commenting to note that I agree (though I would put the matter in much stronger terms).
It seems like the applications of DL that have generated useful products so far have been in the areas in which a useful result is easy or economical to verify, safe to test, close to the research itself, and in areas where small failures are inconsequential. Gwern’s list of applications indicates that this lies mostly in the realm of software engineering infrastructure, particularly for consumer products. Unfortunately, it seems that the technologies that would most impress us are not bottlenecked by the fast-and-facile intelligence of a GPT-3. One area that I would have hoped GPT-3 could contribute to would be learning: an automated personal tutor could revolutionize education in a way that MOOCs cannot. Imagine a chatbot with GPT-3’s conversational abilities that could also draw diagrams like DALL-E. Unfortunately, GPT-3 just isn’t reliable enough for that. Actually, it’s still deeply problematic, because its explanations and answers to technical questions seem plausible to a novice, but are incorrect and lack deep understanding. So it’s currently smart enough to mislead, but not smart enough to educate.
Comment
Seconded. AI is good at approximate answers, and bad at failing gracefully. This makes it very hard to apply to some problems, or requires specialized knowledge/implementation that there isn’t enough expertise or time for.
For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is inherently that it’s unreliable: we don’t know how the inner algorithm works, so it just breaks at random points, especially because most of the tasks it handles are really hard (hence why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point where it can actually be called reliable.
The things AI is already used for are things where reliability doesn’t matter as much. Advertisement algorithms just need to be as good as possible to make the company as much revenue as possible. People currently use machine translation just to get the message across and not for formal purposes, making AI algorithms sufficient (if they were better maybe we could use them for more formal purposes!). The list goes on.
I honestly think AI won’t become super practical until we reach AGI, at which point (if we ever get there) its usage will explode due to massive applicability and solid reliability (if it doesn’t take over the world, that is).
Comment
For all the hypothetical products I listed, I think this level of unreliability is totally fine! Even self-driving cars only need to beat the reliability of human drivers, which I don’t think is that far from achievable.
Mostly #6 - there is a LOT of deep learning (and other advanced modeling that’s not specifically DNN) out there, but it’s generally for commercial decisions, not as much in consumer products. And rarely is it very visible what mechanisms are being used—that sort of detail is lawsuit-bait.
I think the main thing is that the ML researchers with enough knowledge are in short supply. They are:
doing foundational AI research
being paid megabucks to do the data center cooling AI and the smartphone camera AI
freaking out about AGI
The money and/or lifestyle isn’t in procedural Spotify playlists.
DeepMind have delivered AlphaFold, thereby solving a really important outstanding scientific problem. They have used it to generate 3D models of almost every human protein (and then some), which have been released to the community. This is, actually, a huge deal. It will save many, many millions in research costs and speed up the generation of new therapeutics.
Comment
The US GDP is 21 trillion. Saving millions of research dollars is a rounding error and not significant economic value. There’s no sign of Eroom’s law stopping and being reversed by discoveries like AlphaFold.
Comment
OK, the question asked for demonstration of economic value now, and I grant you AlphaFold, which is solely a research enabler, has not demonstrated that to date. Whether AlphaFold will have a significant role in breaking Eroom’s law is a good question but cannot be answered for at least 10 years. I would still argue that the future economic benefits of what has already been done with AlphaFold, and made open access, are likely to be substantial. Consider Alzheimer’s. The current global economic burden is reckoned to be $300B p.a., rising in future to $1T. If, say, an Alzheimer’s drug that halved the economic cost were discovered 5 years earlier on account of AlphaFold, the benefit would run to at least $0.75T in total. This kind of possibility is not unreasonable (for Alzheimer’s, substitute your favourite druggable, high-economic-value medical condition).
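For anyone who wants to check the back-of-the-envelope figure in the comment above, here is a minimal sketch of the arithmetic. The inputs are the ones quoted in the comment; the function name and the assumption that the annual burden stays flat at today’s level are mine (which is why the result is a lower bound if the burden really does rise toward $1T).

```python
# Rough check of the "at least $0.75T" estimate.
# Inputs are the figures quoted in the comment; the flat-burden
# assumption and all names here are illustrative, not from a source.
ANNUAL_BURDEN_USD = 300e9   # quoted current global burden per year
COST_REDUCTION = 0.5        # hypothetical drug halves the economic cost
YEARS_EARLIER = 5           # hypothetical head start from faster discovery


def benefit_of_earlier_discovery(annual_burden, reduction, years):
    """Total cost avoided if the burden stays flat over the head-start years."""
    return annual_burden * reduction * years


benefit = benefit_of_earlier_discovery(
    ANNUAL_BURDEN_USD, COST_REDUCTION, YEARS_EARLIER
)
print(f"${benefit / 1e12:.2f}T")  # prints "$0.75T"
```

Swapping in a rising burden only pushes the number up, so the estimate is conservative on that axis; it says nothing about how likely the 5-year head start is.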
Comment
It’s unclear to me why we should expect protein-structure prediction to be the bottleneck for finding an Alzheimer cure.
Comment
Not a bottleneck so much as a numbers game. Difficult diseases require many shots on goal to maximise the chance of a successful program. That means trying to go after as many biological targets as there are rationales for, and a variety of different approaches (or chemical series) for each target. Success may even require knocking out two targets in a drug combination approach. You don’t absolutely need protein structures of a target to have a successful drug-design program, but using them as a template for molecular design (Structure-Based Drug Design) is a successful and well-established approach and can give rise to alternative chemical series to non-structure-based methods. X-ray crystal-derived protein structures are the usual way in, but if you are unable to generate X-ray structures, which is still true for many targets, AlphaFold structures can in principle provide the starting point for a program. They can also help generate experimental structures in cases where the X-ray data is difficult to interpret.
Comment
Most of the money spent in developing drugs is not about finding targets but about running clinical studies to validate targets. The time when structure-based drug design became possible did not coincide with drug development getting cheaper.
Comment
I agree with you on both counts. So, I concede, saving millions in research costs may be small beer. But I don’t see that invalidates the argument in my previous comment, which is about getting good drugs discovered as fast as is feasible. Achieving this will still have significant economic and humanitarian benefit even if they are no cheaper to develop. There are worthwhile drugs we have today that we wouldn’t have without Structure-Based Design. The solving of the protein folding problem will also help us to design artificial enzymes and molecular machines. That won’t be small potatoes either IMO.
AI tech seen in the wild: I’ve been writing C# in MS Visual Studio for my current job, and now have full-line AI-driven code completion out of the box that I’m finding useful in practice. Much better than anything I’ve seen for smartphones or e.g. Gmail text composition. In one instance it correctly inferred an entire ~120 character line including the entire anonymous function I was passing into the method call. It won’t do the tricky parts at all, but regardless does wonders for cutting through drudgery and general fatigue. Sure feels like living in the future! VS has long had non-AI next-token completion that’s already very good (.NET/C# being strongly typed is a huge boon for these kinds of inferences). I imagine that extra context is why this is performing so much better than general text completion.
Comment
What code completion service are you using? Codex/Copilot?
Comment
Looks like it’s just whatever ships with VS 2022 (https://devblogs.microsoft.com/visualstudio/type-less-code-more-with-intellicode-completions/). No idea if it’s actually first party, white-label/rebranded, or somewhere in between. I’d guess it’s GPT-3 running on Azure, as Microsoft has licensed the full version to resell on Azure. See also
Let me suggest an alternate answer: there is a lot of resistance to AI coming from the media and the general public. A lot of this is unconscious, so you rarely hear people say "I hate AI" or "I want AI to stop." (You do hear this sometimes, if you listen closely.) This has the consequence that our standards for deploying AI in a consumer-facing way are absurdly high, leading to ML mostly being deployed behind the scenes. That’s why we see a lot of industrial and scientific use of deep learning, as well as some consumer-facing cases in risk-free contexts. (It’s hard to make the case that e.g. speech-to-text is going to kill anyone.)
If safety wasn’t (so much of) an issue, we could have deployed self-driving cars as early as the 1990s. As a thought experiment, imagine that 2016-level self-driving technology was available to the culture and society of 1900. 1900 was a pivotal year for the automobile, and at that time, our horse-based transportation system was causing a lot of problems for big cities like New York. If you live in a big city today, you might find yourself wondering how it came to be that we live with big, fast, noisy, polluting machines clogging up our cities and denying the streets to pedestrians. Well, horses were a lot worse, and people in 1900 saw the automobile as their savior. (Read the book Internal Combustion if you want the whole story. Great book.)
The society of 1900, or 1955 for that matter, would have embraced 2016-level self driving with a passion. Good transportation saves lives, so they would not have quibbled about it being slightly less safe than a sober driver or weird edge cases like the car getting stuck sometimes. But the society of 20XX has an extremely high standard for safety (some would say unreasonably high) and there are a lot of people who are afraid of AI, even if they won’t say as much explicitly. It’s a little like nuclear power, where the new vaguely scary technology is resisted by society.
I agree with what Gwern said about things being behind-the-scenes, but it’s also worth noting that there are many impactful consumer technologies that use DL. In fact, some of the things that you don’t think exist actually do exist!
Google Translate: https://www.washingtonpost.com/news/innovations/wp/2016/10/03/google-translate-is-getting-really-really-accurate/
Google Search: https://blog.google/products/search/search-language-understanding-bert/
Photoshop: https://blog.adobe.com/en/publish/2020/10/20/photoshop-the-worlds-most-advanced-ai-application-for-creatives
Examples of other DL-powered consumer applications:
Grammarly: https://www.grammarly.com/blog/how-grammarly-uses-ai/
Apple FaceID: https://support.apple.com/en-us/HT208108
JP Morgan Chase: https://www.jpmorgan.com/technology/applied-ai-and-ml
Comment
Google search gets less usable every year, even for Scholar, which has a much less adversarial search space. It’s better for very common searches like popular tv shows, but approaching worthlessness for long tail stuff. Maybe this is just "search is hard", but improving the common case at the cost of the long tail is exactly what I’d expect AI search to do.
Comment
I wonder how we’d go about designing a reward signal for the long-tailed stuff.
Comment
One thing I’d really like to see is reward for diversity of results. Bringing me the same listicle with slight rewrites 10 times provides no value while pushing out better results. A friend of mine doing an ML PhD claims it’s possible to train a search engine to identify the shitty pages that might as well have been written by GPT-3, even if that’s not literally true. I’m skeptical this can be done in a way that keeps up with the adversarial adaptation, but it would be cool if it did.
Comment
Just ran into the listicle problem myself; it effectively slew searching Google for anything where I don’t already know most of what I need. It feels weird that in the name of ad revenue the algorithm promotes junk whose sole purpose is also to generate ad revenue. Process seems to be cannibalizing itself somehow. It would be cool to filter GPT-3-ish things. It seems like we could get most of the diversity without anything very sophisticated; something like negatively weighting hits according to how many other results have similar/very similar content. If all the pages containing some variation of "Top #INT #VERB #NOUN" could get kicked to the bottom of the rankings forever, I’d be a happy camper.
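To make the "negatively weight hits by how many other results are near-duplicates" idea above concrete, here is a rough sketch. Everything in it is an assumption for illustration: the word-shingle Jaccard similarity, the `threshold` and `penalty` values, and the function names are mine, and this is not how any real search engine is known to rank.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank_for_diversity(results, threshold=0.5, penalty=0.5):
    """Down-weight results that have many near-duplicates.

    results: list of (doc_id, relevance_score, text).
    Each score is multiplied by penalty ** n, where n is the number of
    OTHER results whose shingle Jaccard similarity is >= threshold.
    Pairwise comparison is O(n^2), fine for one page of results.
    """
    sigs = [shingles(text) for _, _, text in results]
    reranked = []
    for i, (doc_id, score, _) in enumerate(results):
        near_dupes = sum(
            1 for j, other in enumerate(sigs)
            if j != i and jaccard(sigs[i], other) >= threshold
        )
        reranked.append((doc_id, score * penalty ** near_dupes))
    return sorted(reranked, key=lambda r: r[1], reverse=True)


# Hypothetical results: three lightly rewritten listicles and one original page.
results = [
    ("listicle-a", 0.90, "top 10 ways to fix a leaky faucet at home fast"),
    ("listicle-b", 0.88, "top 10 ways to fix a leaky faucet at home quickly"),
    ("listicle-c", 0.87, "top 10 ways to fix a leaky faucet at home today"),
    ("plumber-forum", 0.60,
     "replacing the cartridge in a single handle faucet stopped my drip"),
]
ranked = rerank_for_diversity(results)
print([doc_id for doc_id, _ in ranked])
```

With these toy inputs the forum post moves to the top because each listicle is penalized once per sibling. Note the sketch punishes every copy equally, including whichever one was the original; a real system would probably want to keep the best representative of each cluster, and none of this addresses the adversarial-adaptation worry raised upthread.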
If adversarial adaptation means that shitty pages need to appear as good pages with solid argumentation, it seems like a win to me.
Elon Musk said a few weeks ago that Tesla’s main strategy right now is to slash the cost of personal transportation by 4x by perfecting full-self-driving AI, and attempting to achieve that this year. (Relatedly, they’re not allocating resources to making an even cheaper version of the Model 3 because it wouldn’t be 4x cheaper anyway.)
Making good on Musk’s claim would probably add another $trillion to Tesla’s market cap in short order.
Comment
Even if Tesla’s self-driving technology freezes at its current level, it’s clearly added value to the cars. Maybe not $10,000 per car or whatever they are charging for it, but probably at least $1,000. Multiply that by the million or so cars they sell per year, and that’s a billion dollars of economic value due to recent deep learning advances. Of course, a billion is not a trillion. Plausibly by "significant" the OP meant something more like a trillion.
I work at a large, not quite FAANG company, so I’ll offer my perspective. It’s getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers, if you take them at face value, are a bit hyped. The exceptions, to some extent, are the large language models. However, not everyone has access to these. The open source versions of them are good but not earth-shattering. I think they might be if the goal is to generate fluent-sounding chatbots, but this is not the goal of most work I am aware of. Companies, at least mine, are hesitant on this because they are worried the bot will say something dumb, racist, or just made-up.
Most internet applications are more to do with recommendation, ranking, and classification. In these settings large language models are helping, though they often need to be domain adapted. In those cases they are often only helping +1-2% over well-trained classical models, e.g. logistic regression. Still a lot revenue-wise, though. They are also big and slow and not suited for every application yet, at least not until the infrastructure (training and serving) catches up. A lot of applications are therefore comfortable iterating on smaller end-to-end trained models, though they are gradually adopting features from large models. They will get there, in time.
Progress is also slower in big companies, since (a) you can’t simply plug in somebody’s huggingface model or code and be done with it, and (b) there are so many meetings to be had to discuss ‘alignment’ (not that kind) before anything actually gets done.
For some of your examples:
* Procedurally generated music. From what I’ve listened to, the end-to-end generated music is impressive but not impressive enough that I would listen to it for fun. The pieces seem to have little large-scale coherence. However, this seems like a place where someone could step in and introduce some inductive bias (for example, verse-bridge-chorus repeating song structure) and actually get something good. Maybe they should stick to instrumental and have a singer-songwriter riff on it. I just don’t think any big-name record companies are funding this at the moment; probably they have little institutional AI expertise and think it’s a risk, especially to bring on teams of highly paid engineers.
* Tools for writers to brainstorm. I think GPT-3 has this as an intended use case? At the moment there are few competitors able to make such a large model, so we will see how their pilot users like it.
* Photoshop with AI tools. That sounds like it should be a thing. Wonder why Adobe hasn’t picked that up (if they haven’t? if it’s still in development?). Could be an institutional thing.
* Widely available self-driving cars. IMO real-world agents are still missing some breakthroughs. That’s one of the last hurdles I think will be broken on the way to AGI. It’ll happen, but I would not be surprised if it is slower than expected.
* Physics simulators. Not sure really. I suspect this might be a case of overhyped research papers. Who knows? I actually used to work on this in grad school, using old-fashioned finite difference / multistep / RK methods, usually relying on Taylor series coefficients canceling out nicely, or doing Gaussian quadrature. On the one hand, I can imagine it being hard to beat such precisely defined models. On the other hand, at the end of the day those methods are sort of assuming nice properties of functions in a generic way, so I can easily imagine a tuned DL stencil doing better for specific domains, e.g. fluids or something. Still, it’s hard to imagine it being a slam dunk rather than an iterative improvement.
* Paradigmatically different and better web search. I think we are actually getting there. When I say "hey Google", I actually get very real answers to my questions 90% of the time. It’s crazy to me. Kids love it. Though I may be in the minority. I always see Reddit threads with people saying that Google search has gotten worse. I think there are a lot of people who are very used to keyword-based searches and are not used to the model trying to anticipate them. This will slow adoption, since metrics won’t be universally lifted across all users. Also, there’s something to be said for the goodness of old-fashioned lookup tables.
My take on your reasons: they are mostly spot on.
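As a small illustration of the physics-simulator point in the comment above (classical integrators get their accuracy from Taylor series terms canceling), here is a minimal sketch comparing forward Euler with the classical fourth-order Runge-Kutta step. The test equation y' = -y, the step count, and all function names are my own choices for illustration, not anything from the comment.

```python
import math


def euler_step(f, t, y, h):
    """Forward Euler: matches the Taylor expansion through the h term."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """Classical RK4: the four slopes combine so that Taylor terms
    through h**4 match the exact solution."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t_end, n_steps):
    """Advance y' = f(t, y) from t = 0 to t_end with a fixed step size."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Toy system: exponential decay, exact solution y(t) = exp(-t)."""
    return -y


exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 1.0, 10) - exact)
rk4_err = abs(integrate(rk4_step, decay, 1.0, 1.0, 10) - exact)
print(f"Euler error: {euler_err:.1e}, RK4 error: {rk4_err:.1e}")
```

On this toy problem, with the same ten steps, RK4 lands several orders of magnitude closer to the exact answer than Euler. That is the bar a learned stencil has to clear on smooth problems, which fits the guess that DL wins here would be domain-specific and incremental rather than a slam dunk.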
Great list of RL use cases: https://mighty-melody-f4b.notion.site/RL-for-real-world-problems-0114c270e5d94894b3c4f227e24401db
Insofar as the answer isn’t what gwern already pointed out: bigger, more visible, and more ambitious software projects take longer to realize, have failures you’re more likely to hear about, and may not be viable until more of the operational kinks get worked out on more manageable projects. As much novel stuff as DL has enabled, we’re still not quite mature enough that a generalist is wise to pull DL tools into a project that doesn’t clearly require them.
First, the powerful. Then, the rich. Finally, you. The illusion this community provides of an academic scientific establishment begrudgingly beholden to a capitalist economy serving a consumerist society is fake. Faker than the medieval assumption of a clergy serving an aristocracy serving the peasantry. The economy is fake. It’s not hard to predict because it’s complex, it’s hard to predict because it literally doesn’t exist. Reason is but the first step along the staircase of your ability to sense truth. Keep climbing!