<!DOCTYPE html><html lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" style="font-size:16px;"><head><meta charset="utf-8"/><!--[if !mso]><!--><meta http-equiv="X-UA-Compatible" content="IE=edge"/><!--<![endif]--><meta name="viewport" content="width=device-width,initial-scale=1"/><meta name="x-apple-disable-message-reformatting"/><meta name="format-detection" content="telephone=no,address=no,email=no,date=no,url=no"/><meta name="color-scheme" content="light"/><meta name="supported-color-schemes" content="light"/><title>How AI is Learning to Reason: RL Tricks, Policy Optimization, and the New WebWatcher Agent</title><!--[if mso]><xml><o:OfficeDocumentSettings><o:AllowPNG/><o:PixelsPerInch>96</o:PixelsPerInch></o:OfficeDocumentSettings></xml><![endif]--><style> :root { color-scheme: light; supported-color-schemes: light; } body { margin: 0; padding: 0; min-width: 100%!important; -ms-text-size-adjust: 100% !important; -webkit-transform: scale(1) !important; -webkit-text-size-adjust: 100% !important; -webkit-font-smoothing: antialiased !important; } .body { word-wrap: normal; word-spacing:normal; } table.mso { width: 100%; border-collapse: collapse; padding: 0; table-layout: fixed; } img { border: 0; outline: none; } table { mso-table-lspace: 0px; mso-table-rspace: 0px; } td, a, span { mso-line-height-rule: exactly; } #root [x-apple-data-detectors=true], a[x-apple-data-detectors=true], #MessageViewBody a { color: inherit !important; text-decoration: inherit !important; font-size: inherit !important; font-family: inherit !important; font-weight: inherit !important; line-height: inherit !important; } span.MsoHyperlink { color: inherit !important; mso-style-priority: 99 !important; } span.MsoHyperlinkFollowed { color: inherit !important; mso-style-priority: 99 !important; } .a { background-color:#dedede; } .b { background-color:#2a2a2a; } .c { background-color:#ffffff; } .d 
{ background-color:#fff0c8; } .d2 { background-color:#FFFFFF; } .d3 { background-color:#FFFFFF; } h1 a { text-decoration:none;color:#2C81E5;font-style:italic; } h2 a { text-decoration:none;color:#2C81E5;font-style:italic; } h3 a { text-decoration:none;color:#2C81E5;font-style:italic; } h4 a { text-decoration:none;color:#2C81E5;font-style:italic; } h5 a { text-decoration:none;color:#2C81E5;font-style:italic; } h6 a { text-decoration:none;color:#2C81E5;font-style:italic; } h1, h1 a, h2, h2 a, h3, h3 a, h4, h4 a, h5, h5 a, h6, h6 a, ul, li, ol, p, p a { margin: 0;padding: 0; } h1 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:700;font-size:28px;color:#2A2A2A;line-height:42px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } h2 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:700;font-size:24px;color:#2A2A2A;line-height:36px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } h3 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:400;font-size:20px;color:#2A2A2A;line-height:30px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } h4 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:400;font-size:18px;color:#2A2A2A;line-height:27px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } h5 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:400;font-size:16px;color:#2A2A2A;line-height:24px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } h6 { font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif;font-weight:400;font-size:14px;color:#2A2A2A;line-height:21px;padding-bottom:4px;padding-top:16px;mso-margin-top-alt:16px;mso-margin-bottom-alt:4px } p { font-family:'Georgia','Times New 
Roman',serif;font-weight:400;color:#2D2D2D;font-size:16px;line-height:24px;padding-bottom:8px;padding-top:8px;mso-margin-top-alt:8px;mso-margin-bottom-alt:8px; } p a, .e a, ul a, li a, .h a, .h2 a, .h3 a { word-break:break-word;color:#2C81E5 !important;text-decoration:none;font-style:italic; } p a span, .e a span, ul a span, li a span { color: inherit } p .bold { font-weight:bold;color:#2D2D2D; } p span[style*="font-size"] { line-height: 1.6; } .f p { font-size:12px;line-height:15px;color:#2D2D2D;padding:0; } .f p a { color:#2D2D2D !important; } .g p { font-family:'Helvetica',Arial,sans-serif;font-size:14px;line-height:20px;font-weight:normal;margin:0; } .g p a { text-decoration: underline; } .i p { font-family:'Helvetica',Arial,sans-serif;line-height:23px;font-size:15px;color:#2D2D2D; } .i p a { color:#2D2D2D !important; } .i2 p { font-family:'Helvetica',Arial,sans-serif;line-height:23px;font-size:15px;color:#2D2D2D; } .i2 p a { color:#2D2D2D !important; } .i3 p { font-family:'Helvetica',Arial,sans-serif;line-height:43px;font-size:24px;color:#2D2D2D; } .i3 p a { color:#2D2D2D !important; } .h p a { color:#595959 !important; } .h2 p a { color:#595959 !important; } .h3 p a { color:#595959 !important; } .f p a, .i p a, .i2 p a, .i3 p a, .h p a, .h2 p a, .h3 p a { text-decoration:underline; } .j { border-top:3px solid #ffeb2d; } .k p { padding-left:15px;padding-bottom:0px;padding-top:6px;mso-margin-top-alt:6px;mso-margin-bottom-alt:0px;mso-margin-left-alt:15px; } .o { background-color:#FFFFFF;border:1px solid #F1F1F1;border-radius:5px; } .o p { font-family:'Helvetica',Arial,sans-serif;padding:0px;margin:0px; } .l p, .l p a, .l a { font-size:14px;line-height:20px;font-weight: bold;color:#2D2D2D;padding-bottom:6px;mso-margin-bottom-alt:6px;text-decoration:none; } .m p, .m p a { font-size:13px;line-height:18px;font-weight:400;color:#2D2D2D;padding-bottom:6px;mso-margin-bottom-alt:6px;text-decoration:none; } .n p, .n p a { 
font-size:12px;line-height:17px;font-weight:400;color:#2D2D2D;padding-bottom:6px;mso-margin-bottom-alt:6px;text-decoration:none; } .p { background-color:#FFFFFF;max-width:520px;border:1px solid #E1E8ED;border:1px solid rgba(80, 80, 80, 0.3);border-radius:5px; } .q { font-size:16px;font-family:Helvetica,Roboto,Calibri,sans-serif !important;border:1px solid #e1e8ed;border:1px solid rgba(80, 80, 80, 0.3);border-radius:10px;background-color:#FFFFFF; } .q p { font-size:16px;font-family:system-ui,Helvetica,Roboto,Calibri,sans-serif !important;color:#222222;padding:4px 0; } .r { border:1px solid #E1E8ED !important;border-radius:5px; } .s p { font-size: 14px; line-height: 17px; font-weight: 400; color: #697882; text-decoration: none; } .t p { font-family:'Helvetica',Arial,sans-serif;font-size:12px;line-height:18px;font-weight:400;color:#000000;font-style:italic;padding:4px 0px 0px; } .v { border-radius:10px;border:solid 0px #DFD150;background-color:#2C81E5;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;color:#FFFFFF; } .v a { text-decoration:none;display:block;color:#FFFFFF; } .w p { font-size:12px;line-height:15px;font-weight:400;color:#FFFFFF; } .w p a { text-decoration: underline !important;color:#FFFFFF !important; } ul { font-family:'Helvetica',Arial,sans-serif;margin:0px 0px 0px 25px !important;padding:0px !important;color:#2D2D2D;line-height:24px;list-style:disc;font-size:16px; } ul > li { font-family:'Helvetica',Arial,sans-serif;margin:10px 0px 0px 0px !important;padding: 0px 0px 0px 0px !important; color: #2D2D2D; list-style:disc; } ol { font-family:'Helvetica',Arial,sans-serif;margin: 0px 0px 0px 25px !important;padding:0px !important;color:#2D2D2D;line-height:24px;list-style:decimal;font-size:16px; } ol > li { font-family:'Helvetica',Arial,sans-serif;margin:10px 0px 0px 0px !important;padding: 0px 0px 0px 0px !important; color: #2D2D2D; } .e h3, .e p, .e span { 
padding-bottom:0px;padding-top:0px;mso-margin-top-alt:0px;mso-margin-bottom-alt:0px; } .e span, .e li { font-family:'Helvetica',Arial,sans-serif;font-size:16px;color:#2D2D2D;line-height:24px; } .rec { font-family: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, "Noto Sans", sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol", "Noto Color Emoji" !important; } .rec__button:hover { background-color: #f9fafb !important; } .copyright a {color: inherit !important; text-decoration: none !important; font-size: inherit !important; font-family: inherit !important; font-weight: inherit !important; line-height: inherit !important;} .txt_social p { padding: 0; word-break: break-all; } .table, .table-c, .table-h { border: 1px solid #C0C0C0; } .table-c { padding:5px; background-color:#FFFFFF; } .table-c p { color: #2D2D2D; font-family:'Helvetica',Arial,sans-serif !important;overflow-wrap: break-word; } .table-h { padding:5px; background-color:#F1F1F1; } .table-h p { color: #2A2A2A; font-family:'Trebuchet MS','Lucida Grande',Tahoma,sans-serif !important;overflow-wrap: break-word; } @media only screen and (max-width:667px) { .aa, .w100pc { width: 100% !important; } .bb img { width: 100% !important; height: auto !important; max-width: none !important; } .cc { padding: 0px 8px !important; } .ee { padding-top:10px !important;padding-bottom:10px !important; } .ff ul, .ff ol { margin: 0px 0px 0px 10px !important;padding: 0px !important; } .ff li { margin:10px 0px 0px 10px !important; } .r {height:140px !important;} .s p { font-size:13px !important;line-height:15px !important; } .mob-hide {display:none !important;} .mob-show {display: block !important; width: auto !important; overflow: visible !important; float: none !important; max-height: inherit !important; line-height: inherit !important;} .mob-stack {width:100% !important;display:block !important;} .mob-w-full {width:100% !important;} .mob-block 
{display:block !important;} .embed-img {padding:0px 0px 12px 0px !important;} .socialShare {padding-top:15px !important;} .rec { padding-left:15px!important;padding-right:15px!important; } .bodyWrapper { padding:7px 4px 7px 4px !important; } .social-mobile {float:left !important;margin-top:10px !important;} } @media screen and (max-width: 480px) { u + .a .gg { width: 100% !important; width: 100vw !important; } .tok-heart { padding-top:75% !important; } .tok-play { padding-top: 250px !important; } } @media screen and (max-width: 320px) { .tok-heart { padding-top:65% !important; } } .u { border: 1px solid #CACACA !important; border-radius: 2px !important; background-color: #ffffff !important; padding: 0px 13px 0px 13px !important; font-family:ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,"Noto Sans",sans-serif !important;font-size: 12px !important; color: #767676 !important; } .u a { text-decoration: none; display: block !important; color: #767676 !important; margin: 0px !important; } .u span, .u img { color: #767676 !important;margin:0px !important; max-height:32px !important;background-color:#ffffff !important; } </style><!--[if mso]><style type="text/css"> h1, h2, h3, h4, h5, h6 {font-family: Arial, sans-serif !important;} body, table, td, p, a, span {font-family: Arial, sans-serif !important;} sup { font-size: 100% !important;vertical-align: .5em !important;mso-text-raise: -1.5% !important;line-height: 0 !important; } ul { margin-left:0px !important; margin-right:10px !important; margin-top:20px !important; margin-bottom:20px !important; } ul li { margin-left: 0px !important; mso-special-format: decimal; } ol { margin-left:0px !important; margin-right:10px !important; margin-top:20px !important; margin-bottom:20px !important; } ol li { margin-left: 0px !important; mso-special-format: decimal; } li.listItem { margin-left:15px !important; margin-top:0px !important; } .paddingDesktop { padding: 10px 0 !important; } 
.edm_outlooklist { margin-left: -20px !important; } .embedImage { display:none !important; } </style><![endif]--><style> @font-face { font-family: 'Open Sans'; font-style: normal; font-weight: 700; font-display: swap; src: url('https://fonts.gstatic.com/s/opensans/v40/memSYaGs126MiZpBA-UvWbX2vVnXBbObj2OVZyOOSr4dVJWUgsg-1x4gaVIUwaEQbjA.woff2') format('woff2'); } @font-face { font-family: 'Open Sans'; font-style: italic; font-weight: 700; font-display: swap; src: url('https://fonts.googleapis.com/css2?family=Open+Sans:ital,wght@1,700&display=swap') format('woff2'); } </style></head><body class="a" style="margin:0px auto;padding:0px;word-wrap:normal;word-spacing:normal;background-color:#dedede;"><div role="article" aria-roledescription="email" aria-label="email_name" lang="en" style="font-size:1rem"><div style="display:none;max-height:0px;overflow:hidden;"> In this article, we analyze the use of reinforcement learning for LLM reasoning, a new policy optimization method for more concise outputs, and the groundbreaking WebWatcher vision-language...  
‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ ‌ </div><table role="none" width="100%" border="0" cellspacing="0" align="center" cellpadding="0" class="gg"><tr><td align="center" valign="top"><table role="none" width="670" border="0" cellspacing="0" cellpadding="0" class="aa" style="width:670px;table-layout:fixed;"><tr><td class="bodyWrapper" align="center" valign="top" style="padding:7px 7px 7px 7px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td align="center" valign="top" style="border-width:0px 0px 0px 0px;border-style: solid; border-color: #2a2a2a;border-radius:10px 10px 0px 0px;background-color:#ffffff;" class="c"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr id="header"><td style="padding:15px 15px 0px 15px;"><div style="padding-top:0px;padding-right:0px;padding-bottom:20px;padding-left:0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td class="f" align="right" valign="top"><p> August 19, 2025 | <a 
href="https://elink4f7.mail.bycloud.ai/ss/c/u001.c6q0w4g5sodbtO4I1B_pxSdB5RCIH6yy1Fm1CYma3Ex_Go4rYnclDnvaOYt8uAwh_GO4r9IpT6SIfYLuhgPXsHZ1ay-FOj4b9PdPSpUqGIwSlWgraMtj43WomDtaFAsU78WHUc92mndUdtFfjdD5vr3hcfP2A6cG59iXSIDjI9vZkZoG49s1ihJ-FLHzbVS9S_kQ_t8NZGUyux0zQ3Dyf-an81-oVauMT0HGHgX9f1tWQZwz4emzntrSZHmKHNcy9_on8GM12ibFFvJsOZrCQJOnGC3mBhagYDLs36PSHgJxJCASx6b9zd4EdV8tBtcCdx-HInmtPgLV6u6E2xWolBD885Z1_o3NWX2pCB_5ZVzzJMWWF5nbmXUzKS7fQCM9y3L9qJnV2uCSrW138h_lVhzv6isFY2qhHsl1GMRmRYQgXCOBx2kSbXtMpSQXaOLY5auJB5ksEouLBjONFbaA1nEdiZGmYJrioC2AsuwBBcpuRjitBAyDBrI-T5U-l8n79oqBdWARy0WVqF2mRa16cdrbhGx5NH7S89lYR7DsCIQKgLdaDp46XWerSd3b1X5gBhcs5hTGqTtRiGUfrwedaRWA1ilggeVrfnSQF3bwdUnLfwTSbF2o4yIXasBkLUfd_un8FFTwgjfURRGQRpZf1x5-eObjRCnDjO4u8cEZSpv_7rn2pLTCFgwG8PY9XnoG4tuSoSKH8VnG4zk5IB6K35YMY1F23rfF45EggLhhOrVOO_NabFnM4fRm6qhfnWAn/4j6/KnA6Al0STKuTHTBTwvPuhQ/h0/h001.TLjXFcT0FY80zZLWeNdrwW5K5OyyrdoIbgwgMnrfuXI"><span>Read Online</span></a></p></td></tr><tr><td class="dd" align="center" valign="top" style="padding:15px 0;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td align="center" valign="top"><h1 style="text-align:left;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;font-weight:Bold;font-size:32px;color:#2A2A2A;padding:2px 0;line-height:38px;"> How AI is Learning to Reason: RL Tricks, Policy Optimization, and the New WebWatcher Agent </h1></td></tr></table></td></tr><tr><td style="height:0px;width:0px;"><div style="height:1px;" data-open-tracking="true"> <img src="https://elink4f7.mail.bycloud.ai/ss/o/u001.3wmUuY8gEWd4_869a_eXcg/4j6/KnA6Al0STKuTHTBTwvPuhQ/ho.gif" alt="" width="1" height="1" border="0" style="height:1px !important;width:1px !important;border-width:0 !important;margin-top:0 !important;margin-bottom:0 !important;margin-right:0 !important;margin-left:0 
!important;padding-top:0 !important;padding-bottom:0 !important;padding-right:0 !important;padding-left:0 !important;"/> </div></td></tr></table></div></td></tr><tr id="content-blocks"><td class="email-card-body" align="center" valign="top" style="padding-bottom:28px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td id="nov-18-th-nov-24-th-33-latest-ai-re" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h6 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:87.5%;"><i>Aug 11th ~ Aug 17th</i><br><i>#69 Latest AI Research Explained Simply</i></h6></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="#222222" style="background-color:#222222;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr></table></td></tr></table></td></tr><tr><td id="industry-news-in-1-line" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:Bold;padding:0px 28px;text-align:left;"><h2 style="color:#2A2A2A;font-weight:Bold;mso-line-height-alt:150.0%;">🗞️ Industry News in 1 Line</h2></td></tr><tr><td style="padding-bottom:12px;padding-left:50px;padding-right:40px;padding-top:12px;" class="ee"><div style="margin-left:0px;" class="edm_outlooklist"><ol start="1" style="list-style-type:decimal;margin:0px 0px;padding:0px 0px 0px 0px;"><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;">♥ 1.2k</span></span> Google has launched "<a class="link" 
href="https://elink4f7.mail.bycloud.ai/ss/c/u001.oB7zuO_W-X4Toa45C28ng3N7PLrWqoSDhsskwKrBQegb4KaJZ9Nlh1EFODcvnqXvpPph4-eFjKnC_SrHKLEXTjaHLCS2LqMMbLZ8tlFp1s9h6p4DVXPoqpHdOtThXySpgXOamHwSYq3I8xssoFYQ9EikngXzp-7SH2a1SnVB8HtOgAN60T9sH9CW1Fr2WzRw/4j6/KnA6Al0STKuTHTBTwvPuhQ/h1/h001.K5WYxVKVb3IhckzDWpkDiHeyFgVEoMT86u3cn_YfBzQ" target="_blank" rel="noopener noreferrer nofollow"><span>Flight Deals</span></a>", which is a new AI-powered search tool within Google Flights that allows users to find airfare using conversational language. The feature is designed for flexible travelers, using AI to parse natural language queries for specific criteria like budget or destination type and then searching live flight data for relevant options. You can <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.DUiN96-Eq7pUHzwEhy5j2_TC1QL2H4V-QqjxYM7S5Q98PbgZAgupuaY28c6ZyK9mJ08fhxoVBWWoyLmCr7HClLdzC-jVAj3_HtQKcC8_Z8b3CqxDsiuDEr5Njeasdqb9kuzNlsZlTKhx2TawzFxzpQv9GJ_MJk8FzGuAYLoM9z8/4j6/KnA6Al0STKuTHTBTwvPuhQ/h2/h001.HdQ1cnTlIcm4k16OjXC6R-Z8zEGUVJDJlMfY2FnYxAA" target="_blank" rel="noopener noreferrer nofollow"><span>try Flight Deals right now</span></a> if you are in the US, Canada, or India. 
</p><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:500px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/a2902393-a748-4e01-a16b-da4916dcf389/image.png?t=1755615726" alt="" height="auto" width="500" style="display:block;width:100%;" border="0"/></td></tr></table></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;">♥ 1.5k</span></span> <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.qZVn6KJQuMivjuNasJr7IBuzG3wZgOq0OyHCIz-AuK11ek0aJA0VDMdGUh1jO6ZFhwfWvajQ9DsDHMr5p_Uwec9dRbDnxWm9-ep_G2duHxcTcqLkRT_Xvj8xzZF14A0f17Kg4f5rO3tBBABbfEe7Hw/4j6/KnA6Al0STKuTHTBTwvPuhQ/h3/h001.cDg907yjyD8TTkwaG7Wq7chXX8FJQTAm8Ws3kJlN_Rg" target="_blank" rel="noopener noreferrer nofollow"><span>StepFun AI</span></a> has released <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.VomAAYwkCjux8i_FMc4kJdkf2CY_tD65Eb5Rni7xe7gi9NzbC6G61CFMeHqmPOg8aiiNtmc1GJGIVf_AqmoIjA9bMv_dXbta7gWQA3QeKjA5NAcRGXXN2HH0hd5ewckd-rvgJxYnK0VUtU-qIkCFsoCfbDLOvSNr4u1Sp2BHDJI/4j6/KnA6Al0STKuTHTBTwvPuhQ/h4/h001.5zx5_XgppbUfUvt9pWeBwTjOYsfz8lbaWC5tc15WyBE" target="_blank" rel="noopener noreferrer nofollow"><span>NextStep-1</span></a>, a new open-source autoregressive model for <b>generating and editing images</b> from text prompts. The model works by processing sequences of text and continuous image tokens together, which allows it to <b>preserve more visual detail compared to traditional methods</b>. 
You can try it today by <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.VomAAYwkCjux8i_FMc4kJdkf2CY_tD65Eb5Rni7xe7gi9NzbC6G61CFMeHqmPOg8aiiNtmc1GJGIVf_AqmoIjNZaVktptdPtSvi6iw7WGmzSDQJC44Q9LGmMud5qreaWKOPe3AiiybxJEc5pnEXNvzLWdxl3x1Ci8c9nm-QSQjc/4j6/KnA6Al0STKuTHTBTwvPuhQ/h5/h001.T8wWDjGMUrNBx6UDglfPOdo6XE-y3t2BMkk7P4ccYN8" target="_blank" rel="noopener noreferrer nofollow"><span>downloading the GitHub repository</span></a>, which includes pre-trained models for text-to-image generation and image editing. </p><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:500px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/279e6f7b-9b3e-475e-b6f3-e4e9b15c7744/dog.jpg?t=1755616096" alt="" height="auto" width="500" style="display:block;width:100%;" border="0"/></td></tr></table></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;">♥ 424k</span></span> The <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.DUiN96-Eq7pUHzwEhy5j22EEDMv9rlnaaAMjsukFIHg8qPtCGUQ5AOdK4NTIBkiTLaweWOclylnuA81BPlPK5AOkL26z4byzxaifDtDTi_as7K94Hw10y60EobwFSS_9fXyXfbSlkWkaisd-IIWwEYqZdK4gYGwWu057_PpMrkpmYRwONH9MEsAjdVFojei1QFE0bLrQSdbPn9CmbSXD_SajXzzAQYVl4lCJF2Q5jpgz1dyG8z2HTUxNp_-WLTU0ejyCF8F8zJ1kZ0PUBh7hgQ/4j6/KnA6Al0STKuTHTBTwvPuhQ/h6/h001.Ov_MnCMlNKzxh_H0-FxDSAmWDq3c5Oqf2b99QcIz5vc" target="_blank" rel="noopener noreferrer nofollow"><span>U.S. 
General Services Administration (GSA)</span></a> has launched <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.DUiN96-Eq7pUHzwEhy5j20UIQikZxLW0h1NWHQGPR0nruNArChhKAqBvPjef9Ic7WC6pobuBSnE1pP4wY1db1UhLRxLdQxSfRy6qugCJvuy2FFhCKhDZyCbBQLDMR-xrkjSn6Z8xZjVxoRxU7dDARg/4j6/KnA6Al0STKuTHTBTwvPuhQ/h7/h001.scxtrtJILhWUHUkQLsSmVVjsYM7zvZTWCAwsxbaJfjc" target="_blank" rel="noopener noreferrer nofollow"><span>USAi</span></a>, which is a new platform <b>allowing federal agencies to test and adopt generative AI technologies</b> at no cost. This evaluation suite provides government employees with tools for tasks like code generation and document summarization, and enables them to assess different systems before procurement. </p></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;">♥ 1.3k</span></span> Former Twitter CEO Parag Agrawal has launched <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.zNfxTwpJFmrsCuJJphGRkIXyONNZpk4wA_cjAwPA8GKLKsvmwxKD90o1l86YgZxZGyW6fVTwT7el_oppQtHhAu_x5VYgW5tmuwf-6K-MBJ8JaIccNu_50mg7dH60X5DW/4j6/KnA6Al0STKuTHTBTwvPuhQ/h8/h001.Kj69rGVCpjDjSSMsQQEuh_5TIFfNN3OOsU-QKBOPnUg" target="_blank" rel="noopener noreferrer nofollow"><span>Parallel Web Systems</span></a>, and it has raised approximately $30 million in funding. The company is developing an AI system that interacts with the public web in real-time to fetch, verify, and organize information. 
Parallel claims its technology outperforms not only frontier models like GPT-5 at <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.zNfxTwpJFmrsCuJJphGRkKgiX5HY_fPlPKpx0cDesxf7zLp2LvepkXqWfwAOmu4AEz9TWouPC6tp3VoTg_fHvx_iLSF3LCO6ymHxtWVV4NKimDyQzvpGeS2CF1S_O1iJvcxzoavj9QgmCGIC1f_nvgnnMTeD2eCx9hxOa7fcwkQ/4j6/KnA6Al0STKuTHTBTwvPuhQ/h9/h001.3_IXokPBYzZTAJd8S5ZFw2HAf-rI1pClFErDrkMZMTE" target="_blank" rel="noopener noreferrer nofollow"><span>deep web research</span></a> but also specialized tools like <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.S3-S-66rObX2TUuSZjz2bqBH5q0xL4zAMXdoPstnvR0fEpqUbez-vLODErblOZ6fpryA6_atq413NLDRHdwFWetfhAzd0YYmN-mdl9cE8j_6BGmzK7Li7oyBs2sh7L2U/4j6/KnA6Al0STKuTHTBTwvPuhQ/h10/h001.zqwHG-nwLyfbOJVRosOjd5wUrM13KRoqNEoODKAv_gY" target="_blank" rel="noopener noreferrer nofollow"><span>EXA</span></a> and human researchers. You can try it today in the <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.zNfxTwpJFmrsCuJJphGRkPl2jYxj0GJTXe42es1-ESRAsIeRgxD23OEd0mj1fk_nV3OkT2hQgDB5d6-V1VgHwZoCXLqPUPtYtB-0ond2nhTBAZx_2tNVmxT7NSZmBHT4tazFiT9FsrqDsvNC2PbJgAsXk-N7fRTtOCwapiC7h0w/4j6/KnA6Al0STKuTHTBTwvPuhQ/h11/h001.JyUDEBcAs3m-PjjPY4FojExgdj5K-5dPuSxWCEZUtr0" target="_blank" rel="noopener noreferrer nofollow"><span>Parallel playground</span></a>. 
</p><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:563px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/8f0ee1e6-e9a2-44a4-9ae3-8782367ad80d/image.png?t=1755617755" alt="" height="auto" width="563" style="display:block;width:100%;" border="0"/></td></tr></table></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;">♥ 2.4k</span></span> <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.9ggl6Mt0xphuuMReR5gVpTanKmQxfkMnn7OJKz54OPGTzL_2bAqqQ4gC3UMpnqpN5Lx1bqzLAxYOsr3jN4tTzFmDxm8FLTAzSbYJ3yOuEV411K4F2SDXf0A9qkNpw81q4X_DrgyygyA98_zf3VBhsNz88bEkU70zr4meZ7VjOLCLGutOVh55sqCK9IjVrLkuZ2JwhIQDnVraCAFJ6YrkAnTIHj8vTMnvmSGauUK7zVc/4j6/KnA6Al0STKuTHTBTwvPuhQ/h12/h001.hmEe76s-6KLlctkcC2uJhw0ax6ukKOIDpl5y-2wNGPg" target="_blank" rel="noopener noreferrer nofollow"><span>Google has released Imagen 4</span></a>, its most advanced text-to-image model. This model significantly improves image quality and is much better at rendering text within images than previous versions. All images created with Imagen 4 will include a non-visible SynthID digital watermark to make them easier to detect. <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.VomAAYwkCjux8i_FMc4kJYL8w9h2svrzoHhyP7NCwaEuuBz1d6esM6UAZxByfrnGgXjjXYBoCvJTaDYxHmV7YHSZbxVrY-XJxWGXcRsOlIEYsqLyvmoBExgICx28bpytYJmYxBIKXgfmEzI4HzcfC23vHz_Zk9aYLThpI5b9ApYtkP1LvFmQ-hUc0B1uCs41cxnhVU6OD2NvjI6rnHyRuow42PDwWJKWzuNXx5oW7uE/4j6/KnA6Al0STKuTHTBTwvPuhQ/h13/h001.zQPwh_SAn7TLRPMDFrkxHkdyKlv2D7VooTMvVRhtZKE" target="_blank" rel="noopener noreferrer nofollow"><span>Download this Jupyter notebook</span></a> to start using Imagen today. 
</p><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/e854ccbb-5c1d-4e66-a0f5-21b7fb8ea62d/3-panel-cosmic-epic-comic-imagen-4.original.png?t=1755615653" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr></table></li></ol></div></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="#222222" style="background-color:#222222;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr></table></td></tr></table></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="transparent" style="background-color:transparent;border-color:#2C81E5;border-style:solid;border-width:5px;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:Bold;padding:0px 28px;text-align:left;"><h2 style="color:#2A2A2A;font-weight:Bold;mso-line-height-alt:150.0%;"><span style="">Support My Newsletter</span></h2></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><span style="color:rgb(34, 34, 34);font-family:Georgia, 'Times New Roman', serif;font-size:16px;">As I aim to keep this newsletter free forever, your support means a lot. If you like reading The AI Timeline, consider forwarding it to another research enthusiast. 
It helps us keep this up for free!</span></p></td></tr><tr class="btn_row"><td valign="top" 
style="padding-bottom:14px;padding-left:28px;padding-right:28px;padding-top:14px;text-align:center;width:100%;word-break:break-word;" class="dd"><table width="100%" role="none" border="0" cellspacing="0" cellpadding="0" style="margin:14px auto 14px auto;"><tr><td align="center" valign="middle"><table role="none" border="0" cellspacing="0" cellpadding="0"><tr><td style="background-color:#2C81E5;border-radius:8px 8px;mso-padding-alt:14px 20px;" class="btn"><a href="https://elink4f7.mail.bycloud.ai/ss/c/u001.zNfxTwpJFmrsCuJJphGRkKSrCVph9-fOYkcjx4VfJRwtQQsKrZC8pi-PiKai2fq4lAto9WepTJo69aQJ1T73b1BYaJHeCrLz1cWpFYfpKjdJ071BkzwRo9IrCS5YAIxy/4j6/KnA6Al0STKuTHTBTwvPuhQ/h16/h001.NTtJmyfta4zW48IICaGnwMAFEZpZ0rfzJM6hZAhQxfA" target="_blank" rel="noopener noreferrer nofollow" style="background-color:#2C81E5;border-radius:8px 8px;color:#FFFFFF;display:inline-block;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;font-size:16px;font-weight:normal;line-height:18px;padding:14px 20px;text-decoration:none;"> Check Out My Patreon </a></td></tr></table></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><span style=""><a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.tLfGW26lAwaS9gFg17HSoGymQ3NNPtd5dE5MV_8UgjIDFPVXngz8pvQBldSW42yhUe_Qiq6DgEPMEBuPL9yfRpXelTiuu2kS8pLFvsoem_XoZoy_n13sTKUhZIbl0VH6/4j6/KnA6Al0STKuTHTBTwvPuhQ/h17/h001.gKmwwUrjbdXLlFIpED-lTazXhQ9vGKvk0u_VaMyjwN4" target="_blank" rel="noopener noreferrer nofollow"><span>Advertise with The AI Timeline! 
</span></a></span></p></td></tr></table></td></tr></table></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="#222222" style="background-color:#222222;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr></table></td></tr></table></td></tr><tr><td id="part-i-tricks-or-traps-a-deep-dive-" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:Bold;padding:0px 28px;text-align:left;"><h2 style="color:#2A2A2A;font-weight:Bold;mso-line-height-alt:150.0%;">Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning</h2></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><i>Liu</i><span style=""><i> et al. [</i></span><i>Alibaba Group, Beijing Jiaotong University</i>,<i> Hong Kong University of Science and Technology, Nanjing University</i>, <i>Peking University, OpenRLHF, CleanRL</i><span style=""><i>]</i></span></p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;"> ♥ 485 </span></span><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> </span><span style="background-color:#e0e0e0;"><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> LLM Reasoning </span></span></p></td></tr><tr><td id="reinforcement-learning-for-llm-reas" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Reinforcement Learning for LLM Reasoning</h3></td></tr><tr><td class="dd" align="left" 
style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Reinforcement learning has become a key tool for unlocking advanced reasoning in large language models, driving progress in areas like mathematical problem-solving and code generation. However, as research on “RL for LLM” (RL4LLM) gains popularity, practitioners face conflicting advice. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Different papers recommend conflicting techniques, such as group-level vs. batch-level reward normalization, without clear guidelines on when each applies. Experimental inconsistencies, from training data to model initialization, further muddy the waters and make it hard to choose effective methods. This paper tackles the problem head-on by reproducing and evaluating popular RL techniques in a unified framework. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/671b2537-66c3-403c-9180-3a652a642997/image.png?t=1755610163" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>A minimalist two-technique combination that enhances learning capacity in critic-free policies with vanilla PPO loss.</p></td></tr></table></td></tr><tr><td id="inner-workings-of-rl-techniques-for" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 
style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Inner Workings of RL Techniques for LLM Reasoning</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The study tests four <a class="link" href="https://elink4f7.mail.bycloud.ai/ss/c/u001.fUNb4GdFo9D3F8WuLArtoWWoNdNWpS2mJyXBc_AQbeq9fHB480u1jEwdj1pcD0iWucniO0GzTMLP46bziBbwcKK2ijydhxiZkfhc0bUavluHw9Qth2g91jGUAaeYUuse_OgR56GC8Y_SjUCG_EcQE6g3KZNB3aqJt0WUyGGW770/4j6/KnA6Al0STKuTHTBTwvPuhQ/h18/h001.dokpYnQDKkVJ3yr1fFfHNKd99aoBgexY1OnYr5LxewA" target="_blank" rel="noopener noreferrer nofollow"><span>core techniques shaping RL4LLM</span></a>. First, <b>advantage normalization</b> stabilizes training by rescaling rewards relative to a mean and standard deviation. Group-level normalization (normalizing rewards across the responses to a single prompt) proves reliable across settings, while batch-level normalization (normalizing across all responses in the batch) excels with large-scale rewards. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> However, when rewards cluster tightly, as on easy tasks, dropping the standard-deviation term prevents skewed updates, since dividing by a near-zero standard deviation inflates small reward differences. Combining a group-level mean with a batch-level standard deviation creates a robust hybrid approach. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Next, the <b>Clip-Higher</b> approach tweaks PPO’s clipping mechanism by raising the upper bound for policy updates. This encourages exploration in aligned models (already fine-tuned for reasoning) by preserving token diversity. For smaller models, performance scales with the clipping bound; larger models peak at specific values. 
Token-level analysis reveals that a higher clipping bound keeps connectors like “therefore” from being suppressed, enabling more diverse reasoning paths. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/d89d83c3-d1c3-4199-9643-c032e01a4cd6/image.png?t=1755610328" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p> Test accuracy and response length of four model variants</p></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><b>Loss aggregation</b> determines how much each token contributes to the training loss. Token-level aggregation (weighting each token equally) helps base models learn from lengthy reasoning chains. However, for aligned models, sequence-level aggregation (averaging per-response loss) works better, likely because these models already handle structure well. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Finally, <b>overlong filtering</b> masks the reward for responses exceeding length limits. This boosts accuracy for short-to-medium tasks by not penalizing truncated reasoning but adds little for complex, long-tail problems. 
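</p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The hybrid normalization described above (group-level mean, batch-level standard deviation) can be sketched in a few lines of Python. This is a minimal illustration, not the paper’s implementation: the function name <code>lite_ppo_advantages</code>, the <code>eps</code> term, and the toy 0/1 rewards are assumptions made for the example, as is computing the batch standard deviation over the raw rewards. </p><pre>
```python
from statistics import fmean, pstdev


def lite_ppo_advantages(groups, eps=1e-8):
    """Group-mean, batch-std advantage normalization (illustrative).

    groups: list of lists, one inner list of scalar rewards per prompt.
    """
    # Standard deviation over every reward in the batch (assumed form).
    batch_std = pstdev([r for group in groups for r in group])
    advantages = []
    for group in groups:
        # Center each response on the mean reward of its own prompt.
        group_mean = fmean(group)
        advantages.append([(r - group_mean) / (batch_std + eps) for r in group])
    return advantages


rewards = [
    [1.0, 0.0, 1.0, 1.0],  # easier prompt: mostly correct
    [0.0, 0.0, 1.0, 0.0],  # harder prompt: mostly wrong
]
advantages = lite_ppo_advantages(rewards)
for row in advantages:
    print([round(a, 3) for a in row])
```
</pre><p style="mso-line-height-alt:150.0%;"> Each row of advantages sums to zero because the mean is taken per prompt, while the shared batch-level scale keeps a prompt whose rewards are nearly identical from producing outsized updates. 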
</p></td></tr><tr><td id="evaluation-and-results-of-rl-techni" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Evaluation and Results of RL Techniques for LLM Reasoning</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The experiments tested models with 4B to 8B parameters and datasets of varying difficulty. A minimalist combination (Lite PPO) that uses group-mean and batch-std normalization with token-level loss consistently outperformed complex methods like GRPO and DAPO. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> It improved base model accuracy by up to 12% on mathematical benchmarks while simplifying implementation. Clip-Higher lifted aligned model performance by 2–4%, and overlong filtering boosted short-task accuracy by 3–5%. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/d27b4a43-2022-4ca9-b2fa-c098ade0ac1b/image.png?t=1755610352" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The key takeaway is that we should use group-level normalization for reliability, Clip-Higher for aligned models, and token-level loss for base models. 
And if you are just getting started, you should start with the Lite PPO approach. </p></td></tr><tr class="btn_row"><td valign="top" style="padding-bottom:14px;padding-left:28px;padding-right:28px;padding-top:14px;text-align:center;width:100%;word-break:break-word;" class="dd"><table width="100%" role="none" border="0" cellspacing="0" cellpadding="0" style="margin:14px auto 14px auto;"><tr><td align="center" valign="middle"><table role="none" border="0" cellspacing="0" cellpadding="0"><tr><td style="background-color:#2C81E5;border-radius:8px 8px;mso-padding-alt:14px 20px;" class="btn"><a href="https://elink4f7.mail.bycloud.ai/ss/c/u001.fUNb4GdFo9D3F8WuLArtoV5sElgytBlvJRzI9WtI92ZmeVol1JG2_wBc7wb9opw1bsq934AlDAdfh5CKw-SIC6w1e5guFaNJNTq9rdn6-64wcF-wyaYV7cfdl_A5bl9r/4j6/KnA6Al0STKuTHTBTwvPuhQ/h19/h001.8xILTgVunLIECmo-NJrcSSst4lj4zJonWnTiVcIpQiY" target="_blank" rel="noopener noreferrer nofollow" style="background-color:#2C81E5;border-radius:8px 8px;color:#FFFFFF;display:inline-block;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;font-size:16px;font-weight:normal;line-height:18px;padding:14px 20px;text-decoration:none;"> Read Full Paper </a></td></tr></table></td></tr></table></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="#222222" style="background-color:#222222;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr></table></td></tr></table></td></tr><tr><td id="sample-more-to-think-less-group-fil" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:Bold;padding:0px 28px;text-align:left;"><h2 style="color:#2A2A2A;font-weight:Bold;mso-line-height-alt:150.0%;">Sample More to Think Less: Group Filtered Policy Optimization for 
Concise Reasoning</h2></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><i>Shrivastava et al. [Microsoft Research</i>, <i>University of Wisconsin-Madison]</i></p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;"> ♥ 22k </span></span><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> </span><span style="background-color:#e0e0e0;"><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> LLM Reasoning </span></span><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> </span></p></td></tr><tr><td id="introduction-to-efficient-reasoning" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Introduction to Efficient Reasoning with GFPO</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> LLMs trained with reinforcement learning often produce longer responses to gain accuracy, leading to “filler” text that doesn’t add value. This length inflation is inefficient, especially since longer answers aren’t always more accurate. The paper introduces Group Filtered Policy Optimization (GFPO) to solve this. 
</p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4d39f77d-478d-45ce-ae97-4e943ba8abad/image.png?t=1755610834" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> GFPO samples larger groups of responses during training and filters them based on key metrics, like response length or token efficiency (reward per token). By learning only from the best responses, GFPO teaches models to generate concise answers. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/a2b82ab9-adb0-49af-b186-b0385cdce048/image.png?t=1755610869" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr></table></td></tr><tr><td id="how-gfpo-works" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;"> How GFPO Works</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> GFPO builds on GRPO (Group Relative Policy Optimization), which 
samples multiple responses per question and uses their average reward as a baseline. GFPO improves this by sampling a larger group (e.g., 16 or 24 responses instead of 8). It then filters these responses, retaining only the top-k based on a chosen metric, such as shortest length or highest token efficiency. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The advantages (used to update the model) are computed solely for these selected responses, while others are ignored. This filtering acts as implicit reward shaping, steering the model toward desired behaviors without complex reward engineering. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/fd696310-3cdb-4726-94f6-6cb418ceb40a/image.png?t=1755611403" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> A variant called Token Efficiency GFPO ranks responses by reward divided by length, promoting outputs that justify their length with high rewards. This cuts filler tokens more aggressively than length-based filtering alone. Another variant, Adaptive Difficulty GFPO, adjusts the retained group size dynamically based on question difficulty, keeping more responses for harder problems to preserve accuracy. 
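</p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> The filtering step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors’ code: the function name <code>gfpo_advantages</code>, the <code>(reward, num_tokens)</code> input format, the standard-deviation normalization within the retained subset, and the choice to give filtered-out responses a zero advantage are all assumptions made for the example. </p><pre>
```python
from statistics import fmean, pstdev


def gfpo_advantages(responses, k, metric="length", eps=1e-8):
    """Keep the top-k responses for one prompt and normalize within them.

    responses: list of (reward, num_tokens) pairs sampled for one prompt.
    Returns one advantage per response; filtered-out responses get 0.0.
    """
    def rank_key(i):
        reward, num_tokens = responses[i]
        if metric == "length":
            return num_tokens          # shortest responses first
        return -reward / num_tokens    # highest reward per token first

    kept = sorted(range(len(responses)), key=rank_key)[:k]
    kept_rewards = [responses[i][0] for i in kept]
    mean, std = fmean(kept_rewards), pstdev(kept_rewards)

    # Responses outside the retained subset contribute no learning signal.
    advantages = [0.0] * len(responses)
    for i in kept:
        advantages[i] = (responses[i][0] - mean) / (std + eps)
    return advantages


# Six sampled responses for one prompt: (reward, token count).
group = [(1.0, 900), (1.0, 300), (0.0, 200),
         (1.0, 1500), (0.0, 1200), (1.0, 400)]
advantages = gfpo_advantages(group, k=3)
print([round(a, 3) for a in advantages])
```
</pre><p style="mso-line-height-alt:150.0%;"> Here <code>k</code> is fixed at 3 of the 6 sampled responses; Adaptive Difficulty GFPO instead varies the number of retained responses with the difficulty of the question. 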
</p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> For instance, Adaptive Difficulty GFPO retains 8 responses for very hard questions but only 4 for easy ones. By sampling more during training, GFPO reduces the need for lengthy reasoning chains at inference time. </p></td></tr><tr><td id="results-and-impact-of-gfpo" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Results and Impact of GFPO</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> GFPO significantly reduces response lengths across benchmarks like AIME, GPQA, and LiveCodeBench. Token Efficiency GFPO achieves the strongest cuts, reducing length inflation by 71–85% relative to GRPO while matching its accuracy. Adaptive Difficulty GFPO excels on hard problems, reducing length by 60% without accuracy loss. Out-of-distribution tests (e.g., coding tasks) show GFPO not only trims excess length but sometimes improves accuracy. 
</p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/ab282884-f36c-4e45-8b7b-6eca85a3ba96/image.png?t=1755611463" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>Pass@1 Accuracy, Response Lengths, and Length Inflation Reduction on AIME 25, AIME 24, and GPQA.</p></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Pareto analysis confirms GFPO often delivers shorter responses with equal or better performance than GRPO. Additionally, GFPO shifts response distributions away from verbosity, reducing ≥20k-token outputs from 32% to 22%. This efficiency demonstrates how targeted training can produce leaner, faster models without sacrificing reasoning quality. 
</p></td></tr><tr class="btn_row"><td valign="top" style="padding-bottom:14px;padding-left:28px;padding-right:28px;padding-top:14px;text-align:center;width:100%;word-break:break-word;" class="dd"><table width="100%" role="none" border="0" cellspacing="0" cellpadding="0" style="margin:14px auto 14px auto;"><tr><td align="center" valign="middle"><table role="none" border="0" cellspacing="0" cellpadding="0"><tr><td style="background-color:#2C81E5;border-radius:8px 8px;mso-padding-alt:14px 20px;" class="btn"><a href="https://elink4f7.mail.bycloud.ai/ss/c/u001.fUNb4GdFo9D3F8WuLArtoV5sElgytBlvJRzI9WtI92bKobkxyj2J8aegYWLGayHz9Wc20xI-UkRKF4wBLXADYuWeot-8eNy7ZQpoSJPgb34RnZBY71TqXTdUQ0Dy08aC/4j6/KnA6Al0STKuTHTBTwvPuhQ/h20/h001.slon0ipcPTZ4ZSEEbLoXE3JsVh1O9-oyw2QHd0vs_q4" target="_blank" rel="noopener noreferrer nofollow" style="background-color:#2C81E5;border-radius:8px 8px;color:#FFFFFF;display:inline-block;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;font-size:16px;font-weight:normal;line-height:18px;padding:14px 20px;text-decoration:none;"> Read Full Paper </a></td></tr></table></td></tr></table></td></tr><tr><td><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" style=""><tr><td bgcolor="#222222" style="background-color:#222222;padding:0.0px 0.0px 0.0px 0.0px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"></p></td></tr></table></td></tr></table></td></tr><tr><td id="web-watcher-breaking-new-frontier-o" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:Bold;padding:0px 28px;text-align:left;"><h2 style="color:#2A2A2A;font-weight:Bold;mso-line-height-alt:150.0%;">WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent</h2></td></tr><tr><td class="dd" align="left" style="padding:0px 
28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><i>Geng</i><span style=""><i> et al. [</i></span><i>Tongyi Lab, Alibaba Group</i><span style=""><i>]</i></span></p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"><span style="background-color:#e0e0e0;"><span style="color:rgb(255, 58, 58);font-size:0.6rem;"> ♥ 424 </span></span><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> </span><span style="background-color:#e0e0e0;"><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> Deep Research bycloud’s pick </span></span><span style="color:rgb(44, 129, 229);font-size:0.6rem;"> </span></p></td></tr><tr><td id="introduction-to-multimodal-deep-res" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Introduction to Multimodal Deep Research Agents</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> AI agents can produce a lot of text, but what if they could also research complex topics like a human expert, searching the web, analyzing documents, and synthesizing answers? Text-based agents already do well at these tasks, but they stumble when faced with real-world challenges requiring visual understanding, like interpreting scientific diagrams or navigating image-rich websites. This gap limits their usefulness for everyday multimodal problems. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> To close this gap, the authors developed <b>WebWatcher</b>, a multimodal agent that combines visual and textual reasoning with sophisticated tool use. 
Unlike existing approaches that rely on rigid templates or single-modality tools, WebWatcher integrates web search, image analysis, and code execution to handle high-difficulty tasks where perception alone fails. </p></td></tr><tr class="embed-gen-text"><td align="center" valign="top" style="padding:12px 27px 12px 27px;" class="dd"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td align="center" valign="top" class="o" style="padding:12px 12px 12px 12px;;background-color:#FFFFFF;border-color:#F1F1F1;border-radius:5px 5px 5px 5px;border-width:1px 1px 1px 1px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td align="left" valign="top" class="l"><p><a href="https://elink4f7.mail.bycloud.ai/ss/c/u001.VomAAYwkCjux8i_FMc4kJRwWvdOxVrQohNVvREY7er7Jwy5wqWCNWqTvw58yu9nNB_q55tVO9ktIJKFDN-Mmpx437VCZ3OkrWw1oruVrYVB3ESK9cEaHQtqE0REq3kGs/4j6/KnA6Al0STKuTHTBTwvPuhQ/h21/h001.gylZsAgnhdpvrmyurppOThtG7otq58R0H7ywhQORU2E" style="text-decoration:none;font-style:normal;color:#2D2D2D !important;font-size:14px;line-height:20px;" target="_blank"> WebAgent for Information Seeking (by Tongyi Lab) <tr><td align="left" valign="top" class="m"><p style="font-size:13px;line-height:19px;color:#2D2D2D;"> WebWalker & WebDancer & WebSailor & WebShaper & WebWatcher </p></td></tr><tr><td align="left" valign="bottom" class="n" style="vertical-align:bottom;padding-top:12px;"><p style="word-break:break-word;">github.com/Alibaba-NLP/WebAgent</p></td></tr></a></p></td></tr></table></td></tr></table></td></tr><tr><td id="inner-workings-of-web-watcher" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Inner Workings of WebWatcher</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> 
WebWatcher uses five tools: </p></td></tr><tr><td style="padding-bottom:12px;padding-left:50px;padding-right:40px;padding-top:12px;" class="ee"><div style="margin-left:0px;" class="edm_outlooklist"><ol start="1" style="list-style-type:decimal;margin:0px 0px;padding:0px 0px 0px 0px;"><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><b>Web Image Search</b> retrieves relevant visuals </p></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><b>Web Text Search</b> gathers textual information </p></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><b>Webpage Visit</b> navigates and summarizes sites </p></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><b>Code Interpreter</b> handles calculations </p></li><li class="listItem ultext"><p style="mso-line-height-alt:150.0%;padding:0px;text-align:left;word-break:break-word;"><b>OCR</b> (an internal tool) extracts text from images </p></li></ol></div></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> This toolkit enables multi-step reasoning: for example, identifying an obscure animal in a photo, searching Wikipedia for related details, and then cross-referencing revisions in the article’s edit history. 
</p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/d99a7cdb-3511-43b0-94cc-462811761ea2/image.png?t=1755612063" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>Comparison of VL reasoning agents.</p></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> Training WebWatcher required high-quality multimodal data. Researchers created the <b>BrowseComp-VL benchmark</b>, featuring questions demanding both visual perception and deep research. They first generated complex text-based questions through web crawling, then transformed them into visual queries. </p></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> For example, a question about “a railway station in northern India” might pair with relevant images, forcing the agent to combine visual cues with external knowledge. 
</p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/55a20eae-36fe-4bff-8b09-6bf9ed6b4e6e/image.png?t=1755612125" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>Domain Distribution for Level 1 and Level 2.</p></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> To teach WebWatcher effective tool use, researchers generated synthetic reasoning trajectories. Using GPT-4o, they simulated step-by-step task-solving sequences (e.g., <code>&lt;think&gt;Identify the bird&lt;/think&gt;&lt;tool_call&gt;Image Search&lt;/tool_call&gt;</code>). These trajectories were filtered for correctness, logical consistency, and multi-step depth. The agent then underwent <b>supervised fine-tuning</b> to predict tool actions, followed by <b>reinforcement learning</b> (GRPO algorithm) to refine decision-making. This two-stage training optimized both tool selection and answer accuracy. </p></td></tr><tr><td id="evaluation-and-impact-on-multimodal" class="dd" align="left" valign="top" style="color:#2A2A2A;font-weight:normal;padding:0px 28px;text-align:left;"><h3 style="color:#2A2A2A;font-weight:normal;mso-line-height-alt:125.0%;">Evaluation and Impact on Multimodal AI</h3></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> WebWatcher outperformed top proprietary and open-source models across four challenging benchmarks. 
On <b>Humanity’s Last Exam (HLE)</b>, it achieved 13.6% accuracy, surpassing GPT-4o (9.8%) and Gemini 2.5 (9.2%). It also led on <b>BrowseComp-VL</b> (27.0% vs. baselines’ ≤13.4%), <b>LiveVQA</b> (58.7% vs. ≤43.9%), and <b>MMSearch</b> (55.3% vs. ≤43.9%). These gains highlight its advantage in tasks requiring visual-textual synthesis, like identifying a snake species using birthplace clues from an image. </p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/4b8269fd-4871-4ef1-8e0d-e9de6b8f4252/image.png?t=1755612190" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>Data generation pipelines.</p></td></tr></table></td></tr><tr><td class="dd" align="left" style="padding:0px 28px;text-align:left;word-break:break-word;"><p style="mso-line-height-alt:150.0%;"> WebWatcher prioritized text search for information-heavy tasks (62% usage in BrowseComp-VL) but made more balanced use of image search on visual benchmarks (39% in SimpleVQA). However, it still struggles in highly specialized domains like advanced physics. 
</p></td></tr><tr><td align="center" valign="top" style="padding-bottom:20px;padding-left:15px;padding-right:15px;padding-top:20px; " class="dd"><table role="none" border="0" cellspacing="0" cellpadding="0" style="margin:0 auto 0 auto;"><tr><td align="center" valign="top" style="width:626px;"><img src="https://media.beehiiv.com/cdn-cgi/image/fit=scale-down,format=auto,onerror=redirect,quality=80/uploads/asset/file/d64bd89f-209f-48ff-8184-6a78532a588d/image.png?t=1755612243" alt="" height="auto" width="626" style="display:block;width:100%;" border="0"/></td></tr><tr><td align="center" valign="top" class="t" style="width:626px; padding: 4px 0px 4px 0px;"><p>Main results on HLE.</p></td></tr></table></td></tr><tr class="btn_row"><td valign="top" style="padding-bottom:14px;padding-left:28px;padding-right:28px;padding-top:14px;text-align:center;width:100%;word-break:break-word;" class="dd"><table width="100%" role="none" border="0" cellspacing="0" cellpadding="0" style="margin:14px auto 14px auto;"><tr><td align="center" valign="middle"><table role="none" border="0" cellspacing="0" cellpadding="0"><tr><td style="background-color:#2C81E5;border-radius:8px 8px;mso-padding-alt:14px 20px;" class="btn"><a href="https://elink4f7.mail.bycloud.ai/ss/c/u001.fUNb4GdFo9D3F8WuLArtoV5sElgytBlvJRzI9WtI92bCAmJr2hXktzwS0U7hEHrIOVjCHvrkC6VSnunyRBX-yEWOJoZUyKTRvLt5f1jcPuH3ufDxWj1_VIumalh3zWYF/4j6/KnA6Al0STKuTHTBTwvPuhQ/h22/h001.gsB9uhfHubGXXVGhoLufSJbD4WNKX6qCPxybii4y0jY" target="_blank" rel="noopener noreferrer nofollow" style="background-color:#2C81E5;border-radius:8px 8px;color:#FFFFFF;display:inline-block;font-family:'Open Sans','Segoe UI','Apple SD Gothic Neo','Lucida Grande','Lucida Sans Unicode',sans-serif;font-size:16px;font-weight:normal;line-height:18px;padding:14px 20px;text-decoration:none;"> Read Full Paper </a></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr><tr><td align="center" valign="top"><table 
role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td><tr><td class="b" align="center" valign="top" bgcolor="#2a2a2a" style="padding:0px 0px 0px 0px;border-style:solid;border-width: 0px 0px 0px 0px;border-color: #2a2a2a;border-bottom-left-radius:10px;border-bottom-right-radius:10px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td class="w" align="center" valign="top" style="padding:15px 15px 15px 15px;"><table role="none" width="100%" border="0" cellspacing="0" cellpadding="0" align="center"><tr><td align="center" valign="top"><p class="copyright" style="font-family:'Verdana',Geneva,sans-serif;color:#FFFFFF!important;"> © 2025 bycloudai </p></td></tr></table></td></tr></table></td></tr></td></tr></table></td></tr></table></td></tr></table></td></tr></table></div></body></html>