<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Sunyan’s Musings]]></title><description><![CDATA[Sunyan’s Musings]]></description><link>https://sunyanlee.com</link><image><url>https://sunyanlee.com/img/substack.png</url><title>Sunyan’s Musings</title><link>https://sunyanlee.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 10:31:54 GMT</lastBuildDate><atom:link href="https://sunyanlee.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sunyan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[sunyan@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[sunyan@substack.com]]></itunes:email><itunes:name><![CDATA[Sunyan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sunyan]]></itunes:author><googleplay:owner><![CDATA[sunyan@substack.com]]></googleplay:owner><googleplay:email><![CDATA[sunyan@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sunyan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Uneven Distribution of AI's Impact]]></title><description><![CDATA[Lately, I&#8217;ve been marinating on William Gibson&#8217;s oft quoted &#8220;the future is already here - it&#8217;s just not evenly distributed.&#8221; Anthropic&#8217;s Economic Index report from last month exemplifies this idea:]]></description><link>https://sunyanlee.com/p/the-uneven-distribution-of-ais-impact</link><guid isPermaLink="false">https://sunyanlee.com/p/the-uneven-distribution-of-ais-impact</guid><dc:creator><![CDATA[Sunyan]]></dc:creator><pubDate>Fri, 21 Mar 2025 00:57:55 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!Hctn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Lately, I&#8217;ve been marinating on William Gibson&#8217;s oft-quoted &#8220;the future is already here - it&#8217;s just not evenly distributed.&#8221; <a href="https://www.anthropic.com/news/the-anthropic-economic-index">Anthropic&#8217;s Economic Index report</a> from last month exemplifies this idea:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hctn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hctn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 424w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 848w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 1272w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Hctn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png" width="1456" height="1095" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1095,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:588974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://sunyanlee.com/i/158308376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hctn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 424w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 848w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 1272w, https://substackcdn.com/image/fetch/$s_!Hctn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F429a9a94-1e07-45a6-bfaf-6d763bc192d3_2048x1540.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div></div></div></a><figcaption class="image-caption">AI Usage by Job Type vs. Job Type Workforce Representation (Anthropic)</figcaption></figure></div><p>The figure shows that computer and mathematical disciplines disproportionately over-index in AI usage (by a factor of &gt;10x relative to representation among US workers). 
Focusing specifically on white collar jobs, my overarching question is: <strong>If usage is any indication of impact, why does AI appear to have such an uneven impact by industry?</strong></p><p></p><p>I&#8217;ll first address two common explanations before offering my two cents:</p><ul><li><p>The models are not &#8220;smart&#8221; enough for broader tasks</p><ul><li><p>I am doubtful. Without dismissing the limitations of benchmarks, the generalist reasoning performance (e.g. MMLU-Pro) and competitive math performance (e.g. AIME 2024) of recent models are already in line with, or superior to, experts&#8217;. 
It seems unlikely that these reasoning capabilities are domain specific and non-generalizable:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hcOY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hcOY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 424w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 848w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 1272w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hcOY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png" width="664" height="215.25274725274724" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:1456,&quot;resizeWidth&quot;:664,&quot;bytes&quot;:186256,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sunyanlee.com/i/158308376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hcOY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 424w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 848w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 1272w, https://substackcdn.com/image/fetch/$s_!hcOY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F295556e9-2b28-4542-b96e-644f28f413c2_1980x642.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">MMLU-Pro &amp; AIME 2024 Accuracy by Model (Artificial Analysis)</figcaption></figure></div></li></ul></li><li><p>Coders are closest to the space, and therefore earliest to catch on to 
adoption</p><ul><li><p>This is true. As another confounding factor, Anthropic&#8217;s user base skews toward developers. The AI household name is OpenAI&#8217;s ChatGPT, while Anthropic&#8217;s Claude is better known for its coding capabilities.</p></li><li><p>However, this explanation is inadequate. AI is already widely disseminated (e.g. ChatGPT alone has 400M WAU, while all major AI labs offer free tiers with generous model capabilities). Yet, we still haven&#8217;t seen a &#8220;GitHub Copilot moment&#8221; (Copilot was released in 2021) for most industries, let alone the benefits of more recent advances from the likes of Cursor, Windsurf, and Claude Code.</p></li></ul></li></ul><p></p><p>My hypothesis: Underwhelming impact outside of math and coding is primarily attributable to 1) the <strong>interface mismatch</strong> between AI and common white collar workflows, 2) the <strong>ambiguity of a workflow&#8217;s correctness</strong>, and 3) <strong>real world side effects</strong>:</p><p></p><p><strong>Interface mismatch</strong></p><ul><li><p>The native interface for AI models is the consumption and production of (primarily) text tokens; the models are trained mostly on readable text from the internet and post-trained in the same medium.</p></li><li><p>These text tokens align well with code, the medium of software engineering (indeed, models are directly trained on code).</p></li><li><p>By contrast, most white collar workflows occur in the Graphical User Interface (GUI), but <strong>using GUIs is not natural to AI models because there is no organically accruing training data for usage of visual interfaces</strong>. Even moderately complicated workflows span multiple non-standardized applications, further complicating interactions. Moreover, even if GUIs produce textual artifacts (e.g. Excel&#8217;s .xlsx files) like code, these files are idiosyncratic to individual applications and their logic is often opaque (e.g. 
the convoluted XML encoding for .xlsx).</p><ul><li><p>This interface mismatch likely explains why Copilot for Microsoft 365 and Gemini for Workspace have lacked traction. In an ideal world, we would have a direct mapping between AI&#8217;s textual tokens and the GUI experience, but the two mediums are fundamentally incompatible. The prevailing approach has therefore been to graft AI capabilities, trained on curated internal code specific to an application, into the user interface, leading to a subpar experience.</p></li><li><p>One mitigation that may catch on is <strong>&#8220;computer-use models&#8221; trained on collected GUI usage trajectories</strong>, such as OpenAI&#8217;s CUA or Claude 3.7 Sonnet. However, I remain skeptical of result quality from this approach given the relative inefficiency of AI navigating a visual interface: The <strong>GUI experience is an affordance</strong> designed to make working with computers easier for humans. Forcing an AI model to abandon a more efficient medium (e.g. textual commands) to work as an overlay over this affordance is regressive.</p></li><li><p>A more promising first step in this direction is Model Context Protocol (MCP), a <strong>standardized protocol for LLMs to work with external tools</strong>, including traditional GUI-based software like Blender and Figma. The results have been impressive. 
However, behind the scenes the MCP servers are still reliant on a code-based interface necessarily provided by software vendors, such as Blender&#8217;s Python API and Figma&#8217;s Plugin API, respectively:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HJS8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HJS8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 424w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 848w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 1272w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HJS8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png" width="1456" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:511970,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://sunyanlee.com/i/158308376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HJS8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 424w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 848w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 1272w, https://substackcdn.com/image/fetch/$s_!HJS8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d51e2c7-3c12-4898-a3b3-4f9b0b731d06_1606x858.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Cursor editor using MCP to create a Figma mockup (Sonny Lazuardi)</figcaption></figure></div></li></ul></li></ul><p></p><p><strong>Ambiguity of Correctness</strong></p><ul><li><p>The key driver of AI models&#8217; ability to produce &#8220;intelligent&#8221; answers is our ability to define correctness during training. 
Reinforcement learning, which underpins state-of-the-art reasoning models, underscores this idea: Given a question and a model response, the training procedure rewards responses that are designated &#8220;correct&#8221; and penalizes responses that are &#8220;wrong&#8221; (in fact, it can be shown that even the next token prediction task used for pre-training LLMs is itself a special case of reinforcement learning).</p></li><li><p>Clear definitions of correctness align well with coding, wherein we define correctness as code&#8217;s ability 1) to compile and 2) to pass unit tests, giving us two avenues to optimize model performance:</p><ul><li><p>Generate large volumes of quality synthetic data by a process known as <strong>rejection sampling</strong>: For any given problem, generate many candidate responses, discarding incorrect responses, as verified by our unit tests. By training on this synthetic data, a model learns to copy only &#8220;good&#8221; coding behavior.</p></li><li><p>Without giving the answer to a problem directly to the model, allow the model to develop the behavior needed to pass our unit tests by trial-and-error through thousands of iterations per problem - <strong>reinforcement learning</strong>.</p></li></ul></li><li><p>On the other hand, correctness is often subjective for white collar tasks. For example, let&#8217;s say that we wish to automate cold emails to potential customers: What is a &#8220;good&#8221; email? Unlike in the coding domain, there is often no clear, objective definition that we can verify using a rule-based metric.</p></li></ul><p></p><p><strong>Real World Side Effects</strong></p><ul><li><p>Although correctness is difficult to define for each task, we can sometimes link correctness to a higher-level proxy metric, but many workflows then run into the problem of side effects.</p></li><li><p>Continuing with our cold email example, perhaps we can define correctness through prospects&#8217; response rate (e.g. 
a higher response rate is more correct)? There are two challenges associated with this approach:</p><ul><li><p>Unlike testing for correctness in coding, which can be done in a <strong>sandboxed environment</strong> for millions of trials without consequence, each trial here requires an actual email to be sent to a prospective customer (imagine the negative impact of spamming prospects). In other words, we may encounter unintended real-world consequences for each trial in our hypothetical email workflow, limiting the number of trials that can realistically be run if we use an AI model to optimize a task.</p></li><li><p>Of all the things written in an email, how do we know what increased or decreased conversion? In reinforcement learning, we call this the <strong>credit assignment problem</strong>, and it&#8217;s exacerbated as we limit the number of trials we can run.</p></li></ul></li></ul><p></p><p>I believe that all three problems - 1) <strong>the interface mismatch</strong>, 2) <strong>ambiguity of correctness</strong>, and 3) <strong>real world side effects</strong> - need to be adequately addressed for AI to maximize its impact broadly.</p><p></p>]]></content:encoded></item><item><title><![CDATA[The Economics of Large Language Models ]]></title><description><![CDATA[The Cost of ChatGPT-like Search, Training GPT-3, and a General Framework for Mapping The LLM Cost Trajectory]]></description><link>https://sunyanlee.com/p/the-economics-of-large-language-models</link><guid isPermaLink="false">https://sunyanlee.com/p/the-economics-of-large-language-models</guid><dc:creator><![CDATA[Sunyan]]></dc:creator><pubDate>Sat, 21 Jan 2023 13:58:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8639271d-ea04-4a6b-bc60-728d593ae14d_381x245.png" length="0" type="image/png"/><content:encoded><![CDATA[<h3>TLDR</h3><ul><li><p><strong>LLM-powered search is already economically feasible:</strong> As a rough estimate, the cost of performant LLM-powered search is on the order of ~15% of estimated advertising revenue/query today, on top of the existing search cost structure</p></li><li><p><strong>But economically feasible does not mean economically sensible:</strong> The unit economics of LLM-powered search are profitable, but adding this functionality to an existing search engine with $100B+ of search revenue may mean $10B+ of additional costs</p></li><li><p><strong>Other emerging LLM-powered businesses are highly profitable:</strong> Jasper.ai, which generates copywriting with LLMs, likely has SaaS-type (75%+) gross margins</p></li><li><p><strong>Training LLMs (even from scratch) is not cost prohibitive for larger corporations:</strong> Training GPT-3 would only cost ~$1.4M in the public cloud 
today, and even state-of-the-art models like PaLM would cost only ~$11.2M</p></li><li><p><strong>LLM costs will likely drop significantly:</strong> Training and inference costs for a model with comparable performance to GPT-3 have fallen ~80% since GPT-3&#8217;s release 2.5 years ago</p></li><li><p><strong>Data is the emerging bottleneck for LLM performance:</strong> Increasing model parameter count may yield marginal gains compared to increasing the size of a high-quality training data set</p></li></ul><div><hr></div><h3>Table of Contents</h3><ul><li><p><a href="https://sunyan.substack.com/i/93592286/motivation">Motivation</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/a-refresher-on-how-llms-work">A refresher on how LLMs&nbsp;work</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/how-much-would-llm-powered-search-cost">How much would LLM-powered search&nbsp;cost?</a></p><ul><li><p><a href="https://sunyan.substack.com/i/93592286/first-order-approximation-foundational-model-apis">First-order approximation</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/a-deeper-look-cloud-compute-costs">A deeper&nbsp;look</a></p></li></ul></li><li><p><a href="https://sunyan.substack.com/i/93592286/what-about-training-cost">What about training cost?</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/a-general-framework-for-mapping-the-cost-trajectory">A general framework for mapping the cost trajectory</a></p><ul><li><p><a href="https://sunyan.substack.com/i/93592286/parameter-count-efficiencies-the-myth-of-x-bigger-models-every-year">Parameter count efficiencies</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/costflop-efficiencies">Cost/FLOP efficiencies</a></p></li><li><p><a href="https://sunyan.substack.com/i/93592286/hardware-utilization-improvements">Hardware utilization improvements</a></p></li></ul></li><li><p><a 
href="https://sunyan.substack.com/i/93592286/parting-thoughts-llms-are-ready-for-prime-time">Parting thoughts</a></p><div><hr></div></li></ul><h3>Motivation</h3><p>The spectacular performance of large language models (LLMs) has led to widespread speculation on both the emergence of new business models and the disruption of existing ones. Search is one interesting opportunity, given that Google alone grossed $100B+ of revenue from search-related advertising in 2021.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>&nbsp;The viral release of ChatGPT&#8202;&#8212;&#8202;an LLM-powered chatbot producing high-quality answers to search-like queries&#8202;&#8212;&#8202;has prompted many questions on its potential impact on the search landscape, one being the economic feasibility of incorporating LLMs today:</p><ul><li><p>One alleged Google employee suggested on HackerNews that we would need a 10x cost reduction before LLM-powered search could be viably deployed<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p></li><li><p>Meanwhile, Microsoft is expected to launch a version of Bing equipped with LLMs by March,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> and search startups like You.com have already embedded the technology into their products<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p></li><li><p>Most recently, the New York Times reported that Google will be unveiling a version of its search engine with chatbot-like functionality this year<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p></li></ul><p>A broader question is: How economically feasible is it to 
incorporate LLMs into current and new products? In this article, we tease out the cost structure of LLMs today and provide a sense of how it will trend going forward.</p><p></p><h3>A refresher on how LLMs&nbsp;work</h3><p>Although later sections get more technical, we won&#8217;t assume any machine learning familiarity. To level set on what makes LLMs special, we provide a brief refresher.</p><p>Language models predict the likelihood of an output token, given some context:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jjks!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jjks!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 424w, https://substackcdn.com/image/fetch/$s_!jjks!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 848w, https://substackcdn.com/image/fetch/$s_!jjks!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 1272w, https://substackcdn.com/image/fetch/$s_!jjks!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jjks!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png" width="421" height="222" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:222,&quot;width&quot;:421,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9111,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jjks!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 424w, https://substackcdn.com/image/fetch/$s_!jjks!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 848w, https://substackcdn.com/image/fetch/$s_!jjks!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 1272w, https://substackcdn.com/image/fetch/$s_!jjks!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5686f5-e72e-4b40-899b-d83333fc5201_421x222.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Illustration of an Autoregressive Language Model Input Context and Output</figcaption></figure></div><p><em>(In practice, tokens are generally subwords: i.e. &#8220;happy&#8221; might be broken into two tokens such as &#8220;hap,&#8221; &#8220;-py&#8221;)</em></p><p>To generate text, language models repeatedly sample new tokens based on the output token probabilities. For example, in a service like ChatGPT, the model begins with an initial prompt that includes the user&#8217;s query as context and generates tokens to construct the response. As each new token is generated, it is appended to the context window to inform the next iteration.</p><p>Language models have existed for decades. What has propelled the performance of the LLMs we know today is their efficient implementation as deep neural networks (DNNs) with billions of parameters.&nbsp;The parameters are matrix weights that are used for both training and making predictions, with the number of floating point operations (FLOPs) generally scaling with the parameter count. These operations are computed on processors optimized for matrix operations, such as GPUs (graphics processing units), TPUs (tensor processing units), and other specialized chips. As LLMs grow exponentially larger, these operations demand significantly greater computational resources, which are the underlying driver of LLM costs.</p><p></p><h3>How much would LLM-powered search&nbsp;cost?</h3><p>In this section, we estimate how much it costs to run an LLM-powered search engine. How such a search engine should be implemented remains an area of active research. However, we consider two approaches that bracket the cost spectrum of providing such a service:</p><ul><li><p><em>ChatGPT Equivalent</em>: An LLM trained on a vast dataset, storing knowledge in its model parameters during training. During inferencing (i.e. 
using the model to generate output), the LLM does not have access to external knowledge.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p><ul><li><p>Two key drawbacks are: </p><ol><li><p>This approach is prone to &#8220;hallucinating&#8221; facts</p></li><li><p>The model&#8217;s knowledge is stale, containing only information available up to the last training date</p></li></ol></li></ul></li><li><p><em>2-Stage Search Summarizer</em>: An architecturally similar LLM that can access traditional search engines like Google or Bing at inference time. In the first stage of this approach, we run the query through a search engine to retrieve the top <em>K</em> results. In the second stage, we run each result through the LLM to generate <em>K</em> responses. The model then returns the top-scoring response to the user.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p><ul><li><p>This approach improves on the first by:</p><ol><li><p>Being able to cite its sources from the retrieved search results</p></li><li><p>Having access to up-to-date information </p></li></ol></li></ul><p>However, for an LLM of comparable parameter count, this approach incurs greater computational cost. Its cost is also additive to the existing costs of a search engine, given that it piggybacks on existing search results.</p></li></ul><p></p><h4>First-order approximation: foundational model&nbsp;APIs</h4><p>The most direct method of estimating cost is through the list prices of existing foundational model APIs on the market, recognizing that these list prices embed the providers&#8217; profit margin on top of underlying cost. 
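</p><p>Before pricing it out, the <em>2-Stage Search Summarizer</em> flow we will be costing can be sketched as follows. This is a minimal illustration with hypothetical stub functions (<code>search_engine</code>, <code>llm_generate</code>) standing in for a real search API and LLM; only the control flow mirrors the description above:</p>

```python
# Sketch of the 2-Stage Search Summarizer (stubs, not a real API).

def search_engine(query, k):
    # Stage 1: retrieve the top-K result snippets (stubbed here).
    return [f"result {i} for {query!r}" for i in range(k)]

def llm_generate(prompt):
    # Hypothetical LLM call returning (response, score); a real system
    # would score responses with the model's own likelihoods or a ranker.
    return f"summary of: {prompt}", len(prompt)

def two_stage_summarize(query, k=10):
    responses = []
    for snippet in search_engine(query, k):
        # Stage 2: one LLM response per retrieved result, conditioned on
        # the query plus the relevant section of that result.
        responses.append(llm_generate(f"{query}\n\n{snippet}"))
    # Return the top-scoring of the K responses to the user.
    return max(responses, key=lambda r: r[1])[0]
```

<p>Note that the LLM is invoked <em>K</em> times per query, which is what drives this variant&#8217;s higher cost in the estimates that follow.</p><p>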
One representative service is OpenAI, which offers text generation as a service based on LLMs.</p><p>OpenAI&#8217;s <em>Davinci</em> API, powered by the 175B parameter version of GPT-3, has the same parameter count as the GPT-3.5 model that powers ChatGPT.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> Inferencing from this model today costs ~$0.02/750 words ($0.02/1000 tokens, where 1000 tokens correspond to ~750 words); the total number of words used to calculate pricing comprises both the input and output.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KB8A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KB8A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 424w, https://substackcdn.com/image/fetch/$s_!KB8A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 848w, https://substackcdn.com/image/fetch/$s_!KB8A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KB8A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KB8A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png" width="230" height="101" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/cd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:101,&quot;width&quot;:230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2024,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KB8A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 424w, https://substackcdn.com/image/fetch/$s_!KB8A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 848w, https://substackcdn.com/image/fetch/$s_!KB8A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KB8A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd6983f8-fcf4-435c-b4a9-d20df687e174_230x101.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Foundational Model API Pricing by Model Capability (<a href="https://openai.com/api/pricing/">OpenAI</a>)</figcaption></figure></div><p>We make a few simplifying assumptions to arrive at estimates for what we would pay OpenAI for our search service:</p><ul><li><p>In the <em>ChatGPT equivalent </em>implementation, we assume that the service generates a 400-word response against a 50-word prompt, on average. To produce higher-quality results, we also assume the model samples 5 responses per query, picking the best response. Thus:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BNib!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BNib!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 424w, https://substackcdn.com/image/fetch/$s_!BNib!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 848w, 
https://substackcdn.com/image/fetch/$s_!BNib!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 1272w, https://substackcdn.com/image/fetch/$s_!BNib!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BNib!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png" width="898" height="108.94558429973239" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:136,&quot;width&quot;:1121,&quot;resizeWidth&quot;:898,&quot;bytes&quot;:24628,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BNib!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 424w, 
https://substackcdn.com/image/fetch/$s_!BNib!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 848w, https://substackcdn.com/image/fetch/$s_!BNib!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 1272w, https://substackcdn.com/image/fetch/$s_!BNib!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F4453afe0-6b5e-4079-b6d8-c628a7caf232_1121x136.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p>In the <em>2-Stage</em> <em>Search Summarizer</em> implementation, the response generation process is similar. However:</p><ul><li><p>The prompt is significantly longer since it contains both the query and the relevant section from the search result</p></li><li><p>A separate LLM response is generated for each of <em>K</em> search results</p></li></ul><p>Assuming <em>K </em>= 10 and each relevant section from the search result is 1000 words on average:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NxCA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NxCA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 
424w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 848w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 1272w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NxCA!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png" width="1020" height="102.60355029585799" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:136,&quot;width&quot;:1352,&quot;resizeWidth&quot;:1020,&quot;bytes&quot;:26467,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!NxCA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 424w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 848w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 1272w, https://substackcdn.com/image/fetch/$s_!NxCA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F723e7f22-8949-48ba-8b57-1c3f715b38b5_1352x136.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul><p>Assuming a cache hit rate of 30% from optimizations (low-end of Google&#8217;s historical cache hit rate for Search<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a>) and OpenAI gross margins of 75% (in-line with typical SaaS) on cloud compute cost, our first-order estimate implies:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JArd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!JArd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 424w, https://substackcdn.com/image/fetch/$s_!JArd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 848w, https://substackcdn.com/image/fetch/$s_!JArd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 1272w, https://substackcdn.com/image/fetch/$s_!JArd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JArd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png" width="651" height="281" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/add25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:651,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17771,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!JArd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 424w, https://substackcdn.com/image/fetch/$s_!JArd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 848w, https://substackcdn.com/image/fetch/$s_!JArd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 1272w, https://substackcdn.com/image/fetch/$s_!JArd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fadd25671-c53b-4cfd-aa47-48a3c9ab04ba_651x281.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>By order of magnitude, the estimated cloud compute cost of the <em>ChatGPT Equivalent </em>service at $0.010/query lines up with public commentary:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qCpB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qCpB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 424w, https://substackcdn.com/image/fetch/$s_!qCpB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 848w, https://substackcdn.com/image/fetch/$s_!qCpB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qCpB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qCpB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png" width="448" height="279.4807947019868" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:471,&quot;width&quot;:755,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qCpB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 424w, https://substackcdn.com/image/fetch/$s_!qCpB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 848w, https://substackcdn.com/image/fetch/$s_!qCpB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qCpB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F246b1dcc-9631-46f2-814e-fa6bdfd56acd_755x471.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">OpenAI CEO Sam Altman on ChatGPT Cost per Chat (<a href="https://twitter.com/sama/status/1599671496636780546?lang=en">Twitter</a>) </figcaption></figure></div><p>In practice, however, the developer of an LLM-powered search engine is more likely to deploy the <em>2-Stage</em> <em>Search Summarizer</em> variant given the 
aforementioned drawbacks (i.e. hallucinating facts, information staleness) of <em>ChatGPT Equivalent</em>.&nbsp;</p><p>In 2012, Google&#8217;s Head of Search indicated that the search engine processed ~100B searches/month.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a> From 2012 to 2020, per the World Bank, global internet penetration increased from 34% to 60%.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a> Assuming that search volume grew proportionately, we estimate 2.1T searches/year against ~$100B of search-related revenue<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a>, arriving at an average revenue of $0.048/query.</p><p>In other words, under the <em>2-Stage</em> <em>Search Summarizer</em> approach, our estimated cost of $0.066/query is ~1.4x the revenue per query. To refine our estimate further:</p><ul><li><p>We anticipate ~4x lower cost through optimizations like 1) quantization (using lower-precision data types), 2) knowledge distillation (training a smaller model that learns from the larger one), and 3) training smaller but equally performant &#8220;compute-optimal&#8221; models (discussed in greater detail later)</p></li><li><p>Running the infrastructure in-house vs. 
relying on a cloud provider offers another ~2x lower cost, assuming ~50% gross margins on cloud computing</p></li></ul><blockquote><p><strong>Net of these reductions, the cost of incorporating performant LLMs in Search is on the order of ~15% of query revenue today (in addition to existing infrastructure costs).</strong> </p></blockquote><p></p><h4>A deeper&nbsp;look: cloud compute costs</h4><p>State-of-the-art LLMs today generally share a comparable model architecture (most often, <em>decoder-only Transformer models</em>), with the computational cost (in FLOPs) per token during inference equal to <em>~2N,</em> where <em>N</em> is the model parameter count.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-14" href="#footnote-14" target="_self">14</a>&nbsp;</p><p>The Nvidia A100 is currently the most cost-effective GPU option on AWS, and the effective hourly rate of an AWS P4 instance with 8 A100s is $19.22/hour if reserved upfront for 1 year.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-15" href="#footnote-15" target="_self">15</a> Each A100 delivers a peak of 312 TFLOPS (teraFLOPs/second) of FP16/FP32 mixed-precision throughput, the key metric for LLM training and inferencing.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-16" href="#footnote-16" target="_self">16</a> FP16/FP32 mixed precision refers to performing operations in 16-bit format (FP16) while storing information in 32-bit format (FP32). 
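</p><p>The idea can be illustrated in a few lines of NumPy. This is a conceptual toy, not how GPU tensor cores actually implement it: the matrix multiply runs on FP16 operands, while a master copy of the weights stays in FP32 so that small updates are not rounded away:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
master_weights = rng.standard_normal((4, 4)).astype(np.float32)  # stored in FP32
x = rng.standard_normal((1, 4)).astype(np.float32)

# Operations run in 16-bit: cast operands down before the matmul.
y = x.astype(np.float16) @ master_weights.astype(np.float16)
assert y.dtype == np.float16

# Information is kept in 32-bit: a tiny quantity survives addition in FP32,
# whereas adding the same quantity to an FP16 value is rounded away.
tiny = np.float32(1e-6)
fp32_val = np.float32(1.0) + tiny               # != 1.0 in FP32
fp16_val = np.float16(1.0) + np.float16(tiny)   # == 1.0 in FP16 (rounded away)
assert fp32_val != np.float32(1.0)
assert fp16_val == np.float16(1.0)
```

<p>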
Mixed precision allows for higher FLOPS throughput due to the lower overhead of FP16, while maintaining the numerical stability needed for accurate results.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-17" href="#footnote-17" target="_self">17</a></p><p>We assume 21.3% model FLOPS utilization, in-line with GPT-3&#8217;s during training (more recent models have achieved higher efficiency, but utilization remains challenging for low latency inference).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-18" href="#footnote-18" target="_self">18</a> Thus, for a 175B parameter model like GPT-3:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1eDR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1eDR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 424w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 848w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 1272w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1eDR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png" width="929" height="180" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:180,&quot;width&quot;:929,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1eDR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 424w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 848w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 1272w, https://substackcdn.com/image/fetch/$s_!1eDR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65e8fa8b-3e43-4001-9790-85a1e3be263b_929x180.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>We also apply the same calculations based on GCP TPU v4 pricing, with similar 
results:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-19" href="#footnote-19" target="_self">19</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CMdK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CMdK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 424w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 848w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 1272w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CMdK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png" width="602" height="221" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:221,&quot;width&quot;:602,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CMdK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 424w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 848w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 1272w, https://substackcdn.com/image/fetch/$s_!CMdK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f6a8db-c75d-4828-92c4-18a1c17ea211_602x221.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Estimated GPT-3 Inference Cost per 1000 Tokens by Cloud Provider (<a href="https://aws.amazon.com/ec2/instance-types/p4/">AWS</a>, <a href="https://cloud.google.com/products/calculator">GCP</a>)</figcaption></figure></div><p>Our estimated cost of $0.0035/1000 tokens is ~20% of OpenAI&#8217;s API pricing of $0.02/1000 tokens, implying ~80% gross 
margins, assuming the machines are never idle. This estimate is roughly in line with our earlier assumption of 75% gross margins, thus offering a sanity check on our <em>ChatGPT Equivalent</em> and <em>2-Stage</em> <em>Search Summarizer</em> search cost estimates.</p><p></p><h3><strong>What about training cost?</strong></h3><p>Another hot topic is what it would cost to train GPT-3 (175B parameters) or more recent LLMs such as Gopher (280B parameters) and PaLM (540B parameters). Our framework for estimating compute cost based on the number of parameters and tokens also applies here, with slight modifications:</p><ul><li><p>Training compute per token is generally ~6<em>N</em> FLOPs (vs. <em>~2N</em> for inference), where <em>N</em> is the LLM parameter count<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-20" href="#footnote-20" target="_self">20</a></p></li><li><p>We assume model FLOPS utilization of 46.2% during training (vs. 21.3% for inference previously), as was achieved by the 540B parameter PaLM model on TPU v4 chips&nbsp;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-21" href="#footnote-21" target="_self">21</a></p></li></ul><blockquote><p><strong>GPT-3 has 175B parameters and was trained on 300B tokens.</strong> <strong>Assuming we use GCP TPU v4 chips as Google did with the PaLM model, we estimate the cost of training today at only ~$1.4M.</strong></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5RNK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5RNK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 424w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 848w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 1272w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5RNK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png" width="1078" height="176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:176,&quot;width&quot;:1078,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30195,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5RNK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 424w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 848w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 1272w, https://substackcdn.com/image/fetch/$s_!5RNK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79f8ff10-d5b2-427e-a3ca-c5da8003bff2_1078x176.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We can also apply this framework to get a sense of what it would cost to train some of the even larger LLMs:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0ivE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0ivE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 424w, 
https://substackcdn.com/image/fetch/$s_!0ivE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 848w, https://substackcdn.com/image/fetch/$s_!0ivE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 1272w, https://substackcdn.com/image/fetch/$s_!0ivE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0ivE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png" width="958" height="221" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/8468480d-6048-4a17-8c2b-734326386335_958x221.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:221,&quot;width&quot;:958,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21817,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0ivE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 424w, 
https://substackcdn.com/image/fetch/$s_!0ivE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 848w, https://substackcdn.com/image/fetch/$s_!0ivE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 1272w, https://substackcdn.com/image/fetch/$s_!0ivE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F8468480d-6048-4a17-8c2b-734326386335_958x221.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Estimated Training Cost of LLMs on GCP TPU v4 Chips</figcaption></figure></div><p></p><h3>A general framework for mapping the cost trajectory</h3><p>We summarize our framework for deriving LLM inference or training cost as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!5_LU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5_LU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 424w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 848w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 1272w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5_LU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png" width="892" height="255" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/e258b5f8-0d01-427f-ae11-3033c65de097_892x255.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:255,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5_LU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 424w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 848w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 1272w, https://substackcdn.com/image/fetch/$s_!5_LU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe258b5f8-0d01-427f-ae11-3033c65de097_892x255.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Inference &amp; Training Cost of Densely Activated Decoder-Only Transformer LLMs</figcaption></figure></div><p><em>(where &#8220;N&#8221; is the model parameter count and &#8220;processor&#8221; refers to either a TPU, GPU, or another tensor processing accelerator)</em></p><p>It follows that assuming LLM architectures remain similar, the cost of inference and training will change based on the variables above. 
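To make the arithmetic concrete, this framework can be sketched in a few lines of Python. The A100 figures ($19.22/hour for an 8-GPU instance, 312 TFLOPS peak, 21.3% utilization) come from the estimates above; the TPU v4 peak throughput (275 TFLOPS in bf16) and the ~$2.03/chip-hour committed rate are outside assumptions used purely for illustration:

```python
def llm_compute_cost_usd(n_params, n_tokens, flops_per_token_factor,
                         peak_flops, utilization, usd_per_processor_hour):
    """Cost = required FLOPs / effective FLOPs per processor-hour * hourly rate."""
    total_flops = flops_per_token_factor * n_params * n_tokens
    flops_per_processor_hour = peak_flops * utilization * 3600
    return total_flops / flops_per_processor_hour * usd_per_processor_hour

# Inference: GPT-3 (175B params), ~2N FLOPs/token, A100 at 312 TFLOPS peak,
# 21.3% utilization, $19.22/hr per 8-GPU P4 instance (figures from the text).
inference = llm_compute_cost_usd(175e9, 1000, 2, 312e12, 0.213, 19.22 / 8)
print(f"${inference:.4f} per 1000 tokens")  # ~$0.0035, matching the estimate above

# Training: ~6N FLOPs/token over 300B tokens at 46.2% utilization on TPU v4
# (275 TFLOPS peak bf16); the ~$2.03/chip-hour rate is an assumed committed price.
training = llm_compute_cost_usd(175e9, 300e9, 6, 275e12, 0.462, 2.03)
print(f"${training / 1e6:.1f}M to train")  # on the order of ~$1.4M
```

Since cost is linear in every input, an improvement in any one variable reduces the total proportionally, and improvements across variables multiply.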
We&#8217;ll consider each variable in detail, but the key takeaway is the following:</p><blockquote><p><strong>Training or inferencing with a model that is as capable as GPT-3 has gotten &gt;80% cheaper since its release in 2020.</strong></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fh-5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fh-5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 424w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 848w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 1272w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fh-5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png" width="777" height="165" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:165,&quot;width&quot;:777,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fh-5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 424w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 848w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 1272w, https://substackcdn.com/image/fetch/$s_!fh-5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75857cb1-d988-4b4f-abbc-80bc0fd601cd_777x165.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Summary of Inference and Training Cost Reductions vs. 
GPT-3 in 2020 for a Model with Performance Parity</figcaption></figure></div><p></p><h4>Parameter count efficiencies: the myth of 10x bigger models every year</h4><p>One of the common speculations about the next generation of LLMs is the potential for trillion-parameter (densely activated) models, given the exponential parameter growth in the last 5 years: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bm9H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bm9H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 424w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 848w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Bm9H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png" width="1456" height="623" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:623,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125807,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bm9H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 424w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 848w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 1272w, https://substackcdn.com/image/fetch/$s_!Bm9H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7001940-57d6-4614-8323-a5a735e03b9e_2779x1189.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Growth of Model Parameter Count in LLMs</figcaption></figure></div><p>LLM parameter counts have grown roughly 10x each year, but many models have not significantly varied the size of their training data sets:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ze2m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ze2m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 424w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 848w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 1272w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ze2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png" width="561" height="101" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/586f061a-c738-4cf8-8033-3dc23f609891_561x101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:101,&quot;width&quot;:561,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6560,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ze2m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 424w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 848w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 1272w, https://substackcdn.com/image/fetch/$s_!ze2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586f061a-c738-4cf8-8033-3dc23f609891_561x101.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Number of Model Parameters vs. Training Tokens in Select LLMs (<a href="https://arxiv.org/abs/2203.15556">Training Compute-Optimal Large Language Models</a>)</figcaption></figure></div><p>However, more recent literature suggests that the focus on scaling parameter count has not been the best way to maximize performance, given fixed computational resources and hardware utilization (i.e. 
to train a &#8220;compute-optimal&#8221; model):</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T7rJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T7rJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 424w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 848w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 1272w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T7rJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png" width="892" height="234" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:234,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13876,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T7rJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 424w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 848w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 1272w, https://substackcdn.com/image/fetch/$s_!T7rJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F28d2c165-7863-4fdd-8f68-ae7dd3ba0b91_892x234.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Fitting a parametric function to their experimental results, Google DeepMind researchers found to minimize the model loss <em>L</em> (i.e. 
maximize performance) that the number of parameters <em>N</em> and the training token count <em>D</em> should be increased at roughly the same rate:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4iTC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4iTC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 424w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 848w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 1272w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4iTC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png" width="274" height="41" 
data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/a34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:41,&quot;width&quot;:274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4iTC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 424w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 848w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 1272w, https://substackcdn.com/image/fetch/$s_!4iTC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fa34e41cc-e4a0-4e3c-9cd3-ab755eeb402b_274x41.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Parametric Function for Model Loss (<a href="https://arxiv.org/abs/2203.15556">Training Compute-Optimal Large Language Models</a>)</figcaption></figure></div><p>The authors also trained a model named Chinchilla (70B parameters) with the 
same computational resources as Gopher (280B parameters) but on 1.4T tokens instead of 300B tokens. Chinchilla outperformed significantly larger models trained with the same FLOPs budget, demonstrating that most LLMs had been over-parameterized and starved for training data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!04Dc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!04Dc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 424w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 848w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 1272w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!04Dc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png" width="598" height="480" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/dae87c84-101b-407c-812c-26c6dfdff520_598x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:598,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!04Dc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 424w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 848w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 1272w, https://substackcdn.com/image/fetch/$s_!04Dc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae87c84-101b-407c-812c-26c6dfdff520_598x480.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"></div></div></a><figcaption class="image-caption">Predicted Model Loss by Training Data Size vs.
Model Parameters (<a href="https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications">Less Wrong: Chinchilla&#8217;s Wild Implications</a>)</figcaption></figure></div><blockquote><p><strong>With 60% fewer parameters (and thus inference compute requirement) than GPT-3, Chinchilla still easily outperforms the 175B model.</strong></p></blockquote><p>In fact, if we trained a 1T parameter model with the same 300B token dataset as GPT-3, we would still expect such a model to underperform Chinchilla:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fHSE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fHSE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 424w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 848w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 1272w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fHSE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png" width="783" height="96" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:96,&quot;width&quot;:783,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fHSE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 424w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 848w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 1272w, https://substackcdn.com/image/fetch/$s_!fHSE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8e98a6b-ca1b-4ca8-bd3e-212209b43d5f_783x96.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a></figure></div><p>The relative magnitudes of the respective loss terms for the 1T parameter model (0.03 model parameter loss vs. 0.25 training token loss) also suggest that the marginal benefit from increasing the model size is lower than from increasing data volume.</p><blockquote><p><strong>Going forward, much more performance can be gained by diverting incremental computational resources to train on larger datasets of comparable quality than to scale up model parameter count.</strong></p></blockquote><p></p><h4>Cost/FLOP efficiencies</h4><p>For LLM training, the most important hardware performance metric is realizable mixed-precision FP16/FP32 FLOPS. Hardware improvements have been aimed at minimizing cost while maximizing 1) peak FLOPS throughput and 2) model FLOPS utilization. Although both areas are intertwined in hardware development, to keep our analysis simple we will focus on throughput here and discuss utilization in the next section.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kg3b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kg3b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 424w, https://substackcdn.com/image/fetch/$s_!Kg3b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!Kg3b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 1272w, https://substackcdn.com/image/fetch/$s_!Kg3b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kg3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png" width="892" height="354" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20756,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Kg3b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 424w, https://substackcdn.com/image/fetch/$s_!Kg3b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 848w, 
https://substackcdn.com/image/fetch/$s_!Kg3b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 1272w, https://substackcdn.com/image/fetch/$s_!Kg3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F66974433-54ce-4bd7-a899-cdcb4e979d1d_892x354.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"></div></div></a></figure></div><p>So far, we have approximated Cost/FLOP by looking at cloud instance pricing.
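</p><p>For concreteness, that approximation works out as follows. This is a back-of-envelope sketch: the instance price, GPU count, and utilization below are illustrative assumptions rather than quoted rates, and 125 TFLOPS is the V100&#8217;s nominal peak mixed-precision Tensor Core throughput:</p>

```python
# Back-of-envelope Cost/FLOP from cloud instance pricing.
# hourly_price_usd, gpus_per_instance, and utilization are illustrative assumptions.
hourly_price_usd = 24.0        # hypothetical 8x V100 instance, $/hour
gpus_per_instance = 8
peak_flops_per_gpu = 125e12    # nominal V100 peak mixed-precision FP16/FP32 FLOPS
utilization = 0.30             # assumed model FLOPS utilization

# FLOPs actually delivered per hour of rented instance time
effective_flops = gpus_per_instance * peak_flops_per_gpu * utilization
cost_per_flop = hourly_price_usd / (effective_flops * 3600)
print(f"{cost_per_flop:.2e} $/FLOP")
```

<p>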
To drill down further, we assess the cost of running these machines ourselves, with the primary components being 1) hardware purchase and 2) energy expense. To illustrate, we again go back to GPT-3, which was trained for 14.8 days by OpenAI on 10,000 V100 GPUs in Microsoft Azure<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-22" href="#footnote-22" target="_self">22</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oJRF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oJRF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 424w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 848w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 1272w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!oJRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png" width="607" height="201" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/fcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:607,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oJRF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 424w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 848w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 1272w, https://substackcdn.com/image/fetch/$s_!oJRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcde9bea-b00b-4054-be92-c7b14fea8419_607x201.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Cost of Training GPT-3 With Nvidia&#8217;s V100 GPU in 2020 (<a href="https://arxiv.org/abs/2104.10350">Carbon Emissions and Large Neural Network Training</a>)</figcaption></figure></div><p>On hardware cost, Huang&#8217;s Law (per Nvidia CEO Jensen Huang in 2018) held that GPUs had become 25 times faster than they were five years earlier.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-23" href="#footnote-23" target="_self">23</a> In the context of LLM training, much of this performance boost was driven by the advent of Tensor Cores (in the case of AMD, matrix cores), which have enabled significantly more performant and efficient mixed-precision operations by processing matrices instead of vectors as the computation primitive. Nvidia first introduced Tensor Cores in 2017 with the V100 data center GPUs. Although later generational gains are less dramatic than the jump from the initial introduction of Tensor Cores, each successive generation has continued to improve throughput/$.
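</p><p>As a quick sanity check on how a per-generation gain annualizes (assuming, purely as a sketch, a roughly two-year cadence between data center GPU generations):</p>

```python
# Annualize a per-generation throughput/$ improvement.
# The 2-year generation cadence is an assumption, not a published spec.
gen_improvement = 0.50   # +50% throughput/$ per generation
years_per_gen = 2.0

# Compound annual rate implied by the generational step
annual_rate = (1 + gen_improvement) ** (1 / years_per_gen) - 1
print(f"{annual_rate:.1%} per year")  # ~22.5% per year
```

<p>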
Today, we are still seeing 50% generation-over-generation throughput/$ improvement (or ~22% per year) for the data center GPUs used to train LLMs:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!chbT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!chbT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 424w, https://substackcdn.com/image/fetch/$s_!chbT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 848w, https://substackcdn.com/image/fetch/$s_!chbT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 1272w, https://substackcdn.com/image/fetch/$s_!chbT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!chbT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png" width="471" height="101" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:101,&quot;width&quot;:471,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4350,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!chbT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 424w, https://substackcdn.com/image/fetch/$s_!chbT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 848w, https://substackcdn.com/image/fetch/$s_!chbT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 1272w, https://substackcdn.com/image/fetch/$s_!chbT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d26cf7-798a-4ea8-9662-beaf8a406635_471x101.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Data Center GPUs FP16/FP32 Throughput/$ (Nvidia)</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!PFak!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PFak!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 424w, https://substackcdn.com/image/fetch/$s_!PFak!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 848w, https://substackcdn.com/image/fetch/$s_!PFak!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!PFak!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PFak!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png" width="1456" height="623" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:623,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75909,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PFak!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 424w, https://substackcdn.com/image/fetch/$s_!PFak!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 848w, https://substackcdn.com/image/fetch/$s_!PFak!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!PFak!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e30c7e-de85-4d96-a611-3d248f79dd51_2778x1188.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Desktop &amp; Data Center GPUs, Throughput/$ by Precision (Nvidia, <a href="https://arxiv.org/abs/2109.05472">Compute and Energy Consumption Trends in Deep Learning Inference</a>)</figcaption></figure></div><p>Energy efficiency is improving even faster.
Today, we are seeing 80% generation-over-generation throughput/watt improvement (or 34% per year) for the data center GPUs used to train LLMs:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jjmq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jjmq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 424w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 848w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 1272w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jjmq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png" width="471" height="101" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:101,&quot;width&quot;:471,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7671,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jjmq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 424w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 848w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 1272w, https://substackcdn.com/image/fetch/$s_!Jjmq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa933d14c-c85e-4ad2-86a0-30f461a36083_471x101.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Data Center GPUs FP16/FP32 Throughput/watt (Nvidia)</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!sf5u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sf5u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 424w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 848w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sf5u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png" width="1456" height="622" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76184,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sf5u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 424w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 848w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 1272w, https://substackcdn.com/image/fetch/$s_!sf5u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bc49bf-392b-48ce-bc34-22feaa57e215_2779x1188.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Desktop &amp; Data Center GPUs Throughput/watt by Precision (Nvidia, <a href="https://arxiv.org/abs/2109.05472">Compute and Energy Consumption Trends in Deep Learning Inference</a>)</figcaption></figure></div><blockquote><p><strong>Based on the improvements from the V100 (with which GPT-3 was trained) to the upcoming H100 alone, we would expect the in-house training cost to be 58% lower ($312k instead of $744k)</strong>.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5sfE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!5sfE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 424w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 848w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 1272w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5sfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png" width="607" height="161" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:161,&quot;width&quot;:607,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12555,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5sfE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 424w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 848w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 1272w, https://substackcdn.com/image/fetch/$s_!5sfE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F14774696-c9fa-4409-8d07-ef619b2541f7_607x161.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Cost of Training GPT-3 With Nvidia&#8217;s H100 GPU Today</figcaption></figure></div><p>Going forward, we anticipate continued design innovations to drive discontinuous improvements to both hardware cost and energy efficiency. 
For example, in going from the V100 to the A100, Nvidia added sparsity features that further improve throughput by 2x for certain deep learning architectures.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-24" href="#footnote-24" target="_self">24</a> In the H100, the company is adding native support for FP8 data types, which can lead to further throughput improvements when combined with existing techniques like quantization for inference.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-25" href="#footnote-25" target="_self">25</a></p><p>Additionally, we have seen the emergence of TPUs and other specialized chips that fundamentally redesign the chip architecture for deep learning use cases. Google&#8217;s TPU is built on a systolic array architecture that significantly reduces register usage, improving throughput.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-26" href="#footnote-26" target="_self">26</a> As we will see in the next section, many of the recent hardware improvements have been aimed at improving hardware utilization as we scale training and inference to large-parameter models.</p><p></p><h4>Hardware utilization improvements</h4><p>One of the major challenges in LLM training has been the need to scale these models beyond a single chip to multiple systems and to the cluster level, due to the significant memory requirements. For context, in a typical LLM training setup, the memory required to hold the optimizer states, gradients, and parameters is 20<em>N</em> bytes, where <em>N</em> is the number of model parameters.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-27" href="#footnote-27" target="_self">27</a></p><p>Thus, BERT-Large, one of the early LLMs from 2018 with 340M parameters, required only 6.8GB of memory, easily fitting into a single desktop-class GPU.
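</p><p>The 20<em>N</em>-bytes rule of thumb above can be turned into a quick back-of-envelope check. A minimal sketch (assuming the ~20 bytes/parameter figure for mixed-precision Adam-style training; the 80GB per-GPU figure is the H100&#8217;s HBM capacity):</p>

```python
import math

# ~20 bytes per parameter: fp16 weights + fp16 gradients,
# plus fp32 master weights and Adam optimizer moments.
def training_memory_gb(num_params: float) -> float:
    return 20 * num_params / 1e9  # bytes -> GB

def min_gpus(num_params: float, gpu_mem_gb: float = 80) -> int:
    """Minimum GPU count just to hold the training state (ignores activations)."""
    return math.ceil(training_memory_gb(num_params) / gpu_mem_gb)

print(training_memory_gb(340e6))  # BERT-Large: 6.8 GB
print(training_memory_gb(175e9))  # GPT-3: 3500.0 GB, i.e. ~3.5 TB
print(min_gpus(175e9, 80))        # 44 H100-class GPUs at 80 GB HBM each
```

<p>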
On the other hand, for a 175B parameter model like GPT-3 the memory requirement translates to 3.5TB. Meanwhile, Nvidia&#8217;s latest data center GPU, the H100, contains only 80GB of high bandwidth memory (HBM), suggesting that at least 44 H100s are required to fit the memory requirements of GPT-3.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-28" href="#footnote-28" target="_self">28</a> Furthermore, GPT-3 required 14.8 days to train even on 10,000 V100 GPUs. Thus, it&#8217;s essential that FLOPS utilization remains high even as we increase the number of chips for training.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KcLk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KcLk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 424w, https://substackcdn.com/image/fetch/$s_!KcLk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 848w, https://substackcdn.com/image/fetch/$s_!KcLk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KcLk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KcLk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png" width="892" height="354" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/d68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:892,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KcLk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 424w, https://substackcdn.com/image/fetch/$s_!KcLk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 848w, https://substackcdn.com/image/fetch/$s_!KcLk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KcLk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fd68cf2b5-935e-4fd0-a98b-beb68b351069_892x354.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The first dimension of hardware utilization is on the single-chip level.
When training the GPT-2 model on a single A100 GPU, hardware utilization reached 35.7%.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-29" href="#footnote-29" target="_self">29</a> One of the hardware utilization bottlenecks turns out to be on-chip memory bandwidth and capacity: computations in processor cores require repeated access to HBM, and insufficient bandwidth inhibits throughput. Similarly, limited local memory capacity can force more frequent reads from the higher-latency HBM, likewise limiting throughput.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-30" href="#footnote-30" target="_self">30</a></p><p>The second dimension of utilization relates to chip-to-chip scaling. LLM training for models like GPT-3 requires partitioning the model and data across many GPUs. Just as bandwidth for on-chip memory can be a bottleneck, the bandwidth for chip-to-chip interconnects can also be a limiting factor. Nvidia&#8217;s NVLink enabled 300GB/s of bandwidth per GPU with the release of the V100. This figure increased 2x with the A100.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-31" href="#footnote-31" target="_self">31</a> </p><p>The last dimension of utilization is system-to-system scaling. A single machine holds up to 16 GPUs, so scaling to a larger number of GPUs requires that the interconnects across systems do not bottleneck performance. To this end, Nvidia&#8217;s InfiniBand HCAs have increased max bandwidth by 2x in the last 3 years.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-32" href="#footnote-32" target="_self">32</a></p><p>Across the second and third dimensions, the software partitioning strategy is a crucial consideration for effective utilization.
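</p><p>Model FLOPS utilization can be sanity-checked from headline training numbers. A rough sketch (the ~6<em>N</em> FLOPs-per-token approximation for transformer training, GPT-3&#8217;s ~300B training tokens, and the V100&#8217;s ~125 TFLOPS FP16 peak are assumptions not stated above):</p>

```python
# Model FLOPS utilization (MFU) = useful training FLOPs / peak FLOPs available.
def mfu(params, tokens, num_gpus, peak_flops_per_gpu, days):
    achieved = 6 * params * tokens                          # ~6 FLOPs per parameter per token
    available = num_gpus * peak_flops_per_gpu * days * 86400
    return achieved / available

# GPT-3: 175B params, ~300B tokens, 10,000 V100s, ~14.8 days
print(round(mfu(175e9, 300e9, 10_000, 125e12, 14.8), 3))  # 0.197, near the reported 21.3%
```

<p>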
Through a combination of model and data parallelism techniques, LLM training at the cluster level for Nvidia chips reached 30.2% model FLOPS utilization with MT-NLG in Jan 2022,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-33" href="#footnote-33" target="_self">33</a> compared to 21.3% in 2020 with GPT-3. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_hZa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_hZa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 424w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 848w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 1272w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_hZa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png" width="566" height="101" data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2329c974-8089-4adc-81f7-f969828774f1_566x101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:101,&quot;width&quot;:566,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10784,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_hZa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 424w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 848w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 1272w, https://substackcdn.com/image/fetch/$s_!_hZa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2329c974-8089-4adc-81f7-f969828774f1_566x101.png 1456w" sizes="100vw" 
loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Model FLOPS Utilization of Select LLMs (<a href="https://arxiv.org/abs/2204.02311">PaLM: Scaling Language Modeling with Pathways</a>)</figcaption></figure></div><p>Specialized hardware like TPUs has achieved even greater efficiency.</p><blockquote><p><strong>Google&#8217;s 540B parameter PaLM model achieved 46.2% model FLOPS utilization on the TPU v4 chips, 2.2x GPT-3&#8217;s training utilization.</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-34" href="#footnote-34" target="_self">34</a></p></blockquote><p>This utilization improvement was fueled both by more efficiently parallelized training (with Google's Pathways ML system) and by the fundamentally different architecture of the TPU itself. The chip's systolic array architecture and the significant local memory density per core reduce the frequency of high-latency global memory reads. </p><p>In a similar vein, we have seen companies like Cerebras, Graphcore, and SambaNova allocate significantly larger amounts of shared memory capacity in-processor. Going forward, we expect other emerging innovations - such as scaling chips to wafer scale for latency reduction/increased bandwidth, or optimizing data access patterns through programmable units - to further push the hardware utilization envelope.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-35" href="#footnote-35" target="_self">35</a></p><p>Other algorithmic improvements have also been important: Nvidia researchers in a May 2022 paper reached 56.0% model FLOPS utilization for subsequent training of MT-NLG, by selectively recomputing activations rather than relying on traditional gradient checkpointing. 
The experiments were conducted using 280 GPUs (instead of 2,240 in the original case) and without data parallelism, but nonetheless demonstrated a significant performance improvement over the original run with 30.2% model FLOPS utilization.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-36" href="#footnote-36" target="_self">36</a> </p><p></p><h3>Parting thoughts: LLMs are ready for prime time</h3><p>The NYTimes recently reported that Google had declared ChatGPT a &#8220;code red&#8221; for its search business.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-37" href="#footnote-37" target="_self">37</a> Through an economic lens, our rough estimate that incorporating performant LLMs into search would cost ~15% of query revenue suggests the tech can already be feasibly deployed. However, Google's dominant market position also disincentivizes it from being a first mover: at $100B+ of search revenue, widespread deployment of the technology would dent profitability by $10B+. On the other hand, it's unsurprising that Microsoft is planning to incorporate LLMs into Bing.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-38" href="#footnote-38" target="_self">38</a> Even though the cost structure is higher than traditional search, LLM-powered search is not loss-making, and the company has a significantly lower search engine market share today. As a result, if Microsoft succeeds in taking share from Google, the end result would likely still be greater profit dollars, even as serving existing queries becomes more expensive.</p><p>Interestingly, for other products, LLMs can already be profitably deployed with SaaS-type margins.
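</p><p>The unit economics behind that claim can be sketched in a few lines (illustrative only; the ~$1.09 price and $0.02 API cost per 1,000 tokens are the Jasper and Davinci figures cited in this post, and the sampling multiplier is an assumption):</p>

```python
# Gross margin when reselling LLM generations: price and cost are per 1,000 tokens,
# k = number of responses sampled per delivered output.
def gross_margin(price_per_1k, api_cost_per_1k, k=1):
    return 1 - (k * api_cost_per_1k) / price_per_1k

print(round(gross_margin(1.09, 0.02), 3))       # 0.982 with a single sample
print(round(gross_margin(1.09, 0.02, k=5), 3))  # 0.908 even sampling 5 candidates
```

<p>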
For example, Jasper.ai, which was recently valued at $1.5B and uses LLMs to generate copywriting, charges ~$82/100K words (at ~1.33 tokens per word, the equivalent of ~$0.62/1000 tokens).<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-39" href="#footnote-39" target="_self">39</a> Using OpenAI's <em>Davinci</em> API pricing of $0.02/1000 tokens, gross margins are likely well above 75% even if we sample multiple responses.</p><p>It&#8217;s also surprising that GPT-3 can be trained with only ~$1.4M today in the public cloud, and that the cost of even state-of-the-art models (like PaLM at ~$11.2M) is not prohibitive for larger companies. With training costs dropping &gt;80% over the last 2.5 years for a GPT-3 quality model, training performant LLMs will likely become even more affordable. In other words, training LLMs is not cheap, but it&#8217;s also not a game of significant economies of scale, entailing massive upfront capital spending that gets amortized over years. Rather, the &#8220;Chinchilla&#8221; paper suggests that, going forward, one of the emerging scarce resources for training LLMs is not capital, but the volume of high-quality data, as scaling model parameter count delivers diminishing returns.</p><p><em>(2/9/23 Edit: Thank you to Nvidia&#8217;s Emanuel Scoullos and Ioana Boier for suggesting the inclusion of &#8220;<a href="https://arxiv.org/abs/2205.05198">Reducing Activation Recomputation in Large Transformer Models</a>&#8221; in our discussion on model FLOPS utilization)</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://sunyanlee.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your 
email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://www.sec.gov/Archives/edgar/data/1652044/000165204422000019/goog-20211231.htm">Alphabet 2021 10K</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://news.ycombinator.com/item?id=33820750">Comparing Google and ChatGPT</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://www.theinformation.com/articles/microsoft-and-openai-working-on-chatgpt-powered-bing-in-challenge-to-google">Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google</a></p></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://blog.you.com/introducing-youchat-the-ai-search-assistant-that-lives-in-your-search-engine-eff7badcd655">Introducing YouChat - The AI Search Assistant that Lives in Your Search Engine</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p><a href="https://www.nytimes.com/2023/01/20/technology/google-chatgpt-artificial-intelligence.html">Google Calls In Help From Larry Page and Sergey Brin for A.I. Fight</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p><a href="https://openai.com/blog/chatgpt/">ChatGPT: Optimizing Language Models for Dialogue</a></p><p>In practice, ChatGPT also uses RLHF on top of the base 175B parameter language model, but for the sake of simplicity we won&#8217;t consider the reinforcement learning cost.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2203.11147">Teaching language models to support answers with verified quotes</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://openai.com/blog/chatgpt/">ChatGPT: Optimizing Language Models for Dialogue</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a 
id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p><a href="https://openai.com/api/pricing/">OpenAI Pricing</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p><a href="https://static.googleusercontent.com/media/research.google.com/en//people/jeff/Stanford-DL-Nov-2010.pdf">Building Software Systems at Google and Lessons Learned</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p><a href="https://searchengineland.com/google-search-press-129925">What&#8217;s New With Google Search</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p><a href="https://ourworldindata.org/internet">Our World in Data: Internet</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p><a href="https://www.sec.gov/Archives/edgar/data/1652044/000165204421000010/goog-20201231.htm">Alphabet 2020 10K</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-14" href="#footnote-anchor-14" class="footnote-number" contenteditable="false" target="_self">14</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2001.08361">Scaling Laws for Neural Language Models</a></p><p>For encoder-decoder models, inference FLOPs is ~<em>N</em> (instead of <em>2N</em> as per decoder-only 
models)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-15" href="#footnote-anchor-15" class="footnote-number" contenteditable="false" target="_self">15</a><div class="footnote-content"><p><a href="https://aws.amazon.com/ec2/instance-types/p4/">AWS EC2 P4 Instances</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-16" href="#footnote-anchor-16" class="footnote-number" contenteditable="false" target="_self">16</a><div class="footnote-content"><p><a href="https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf">NVIDIA A100 Tensor Core GPU Architecture</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-17" href="#footnote-anchor-17" class="footnote-number" contenteditable="false" target="_self">17</a><div class="footnote-content"><p><a href="https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html">Mixed precision training</a></p><p>Everything described for FP16/FP32 also applies to BF16/FP32 mixed-precision operations, which are supported with similar throughput on the A100 and other processors</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-18" href="#footnote-anchor-18" class="footnote-number" contenteditable="false" target="_self">18</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2204.02311">PaLM: Scaling Language Modeling with Pathways</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-19" href="#footnote-anchor-19" class="footnote-number" contenteditable="false" target="_self">19</a><div class="footnote-content"><p><a href="https://cloud.google.com/tpu/pricing">Cloud TPU pricing</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-20" href="#footnote-anchor-20" class="footnote-number" contenteditable="false" 
target="_self">20</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2001.08361">Scaling Laws for Neural Language Models</a></p><p>For encoder-decoder models, training FLOPs is ~<em>3N</em> (instead of <em>6N</em> as per decoder-only models)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-21" href="#footnote-anchor-21" class="footnote-number" contenteditable="false" target="_self">21</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2204.02311">PaLM: Scaling Language Modeling with Pathways</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-22" href="#footnote-anchor-22" class="footnote-number" contenteditable="false" target="_self">22</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2104.10350">Carbon Emissions and Large Neural Network Training</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-23" href="#footnote-anchor-23" class="footnote-number" contenteditable="false" target="_self">23</a><div class="footnote-content"><p><a href="https://www.youtube.com/watch?v=95nphvtVf34">GTC 2018 Keynote with NVIDIA CEO Jensen Huang</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-24" href="#footnote-anchor-24" class="footnote-number" contenteditable="false" target="_self">24</a><div class="footnote-content"><p><a href="https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf">NVIDIA A100 Tensor Core GPU Architecture</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-25" href="#footnote-anchor-25" class="footnote-number" contenteditable="false" target="_self">25</a><div class="footnote-content"><p><a href="https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/">NVIDIA Hopper Architecture In-Depth</a></p></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-26" href="#footnote-anchor-26" class="footnote-number" contenteditable="false" target="_self">26</a><div class="footnote-content"><p><a href="https://cloud.google.com/blog/products/ai-machine-learning/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu">An in-depth look at Google&#8217;s first Tensor Processing Unit (TPU)</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-27" href="#footnote-anchor-27" class="footnote-number" contenteditable="false" target="_self">27</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2201.11990">Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model</a></p><p>Assuming 20 bytes of memory per parameter based on using the Adam optimizer with mixed-precision training</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-28" href="#footnote-anchor-28" class="footnote-number" contenteditable="false" target="_self">28</a><div class="footnote-content"><p><a href="https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/">NVIDIA Hopper Architecture In-Depth</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-29" href="#footnote-anchor-29" class="footnote-number" contenteditable="false" target="_self">29</a><div class="footnote-content"><p><a href="https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/">State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-30" href="#footnote-anchor-30" class="footnote-number" contenteditable="false" target="_self">30</a><div class="footnote-content"><p><a href="https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/">Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep 
Learning</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-31" href="#footnote-anchor-31" class="footnote-number" contenteditable="false" target="_self">31</a><div class="footnote-content"><p><a href="https://www.nvidia.com/en-us/data-center/nvlink/">NVLink and NVSwitch</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-32" href="#footnote-anchor-32" class="footnote-number" contenteditable="false" target="_self">32</a><div class="footnote-content"><p><a href="https://www.nvidia.com/en-us/networking/infiniband-adapters/">NVIDIA ConnectX InfiniBand Adapters</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-33" href="#footnote-anchor-33" class="footnote-number" contenteditable="false" target="_self">33</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2204.02311">PaLM: Scaling Language Modeling with Pathways</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-34" href="#footnote-anchor-34" class="footnote-number" contenteditable="false" target="_self">34</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2204.02311">PaLM: Scaling Language Modeling with Pathways</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-35" href="#footnote-anchor-35" class="footnote-number" contenteditable="false" target="_self">35</a><div class="footnote-content"><p><a href="https://www.cerebras.net/blog/cerebras-architecture-deep-dive-first-look-inside-the-hw/sw-co-design-for-deep-learning">Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning</a></p><p><a href="https://docs.graphcore.ai/projects/ipu-overview/en/latest/about_ipu.html">Graphcore IPU Hardware Overview</a></p><p><a href="https://www.servethehome.com/sambanova-sn10-rdu-at-hot-chips-33/">SambaNova SN10 RDU at Hot Chips 33</a></p></div></div><div 
class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-36" href="#footnote-anchor-36" class="footnote-number" contenteditable="false" target="_self">36</a><div class="footnote-content"><p><a href="https://arxiv.org/abs/2205.05198">Reducing Activation Recomputation in Large Transformer Models</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-37" href="#footnote-anchor-37" class="footnote-number" contenteditable="false" target="_self">37</a><div class="footnote-content"><p><a href="https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html">A New Chat Bot is a &#8216;Code Red&#8217; for Google&#8217;s Search Business</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-38" href="#footnote-anchor-38" class="footnote-number" contenteditable="false" target="_self">38</a><div class="footnote-content"><p><a href="https://www.theinformation.com/articles/microsoft-and-openai-working-on-chatgpt-powered-bing-in-challenge-to-google">Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-39" href="#footnote-anchor-39" class="footnote-number" contenteditable="false" target="_self">39</a><div class="footnote-content"><p><a href="https://www.jasper.ai/pricing">Jasper.ai Pricing</a></p></div></div>]]></content:encoded></item></channel></rss>