{"id":5503,"date":"2026-05-11T16:11:19","date_gmt":"2026-05-11T08:11:19","guid":{"rendered":"https:\/\/www.haruhi.fans\/?p=5503"},"modified":"2026-05-13T13:31:28","modified_gmt":"2026-05-13T05:31:28","slug":"%e8%ae%ba%e6%96%87%e8%b0%83%e7%a0%94","status":"publish","type":"post","link":"https:\/\/www.haruhi.fans\/?p=5503","title":{"rendered":"Paper Survey"},"content":{"rendered":"<p><html><head><\/head><body><\/p>\n<p><a href=\"https:\/\/www.alphaxiv.org\/abs\/2604.23099?chatId=019e160f-99e4-771a-a1c5-9af56bd79bd5\">bookmark<\/a><\/p>\n<p>In essence this is sample selection, but the core method is somewhat novel: Bayesian modeling + active learning + Gaussian processes.<\/p>\n<p>It is highly transferable and gives <strong>efficient performance estimation.<\/strong> For the benchmark items themselves, it can also actively discover which items have more problems, so those items can be filtered out faster.<\/p>\n<p><a href=\"https:\/\/www.alphaxiv.org\/abs\/2603.23234\">MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation | alphaXiv<\/a><\/p>\n<p>Method: contrast the trajectories of multiple agents, then extract experience from them. The paper then shows that this improves results.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.haruhi.fans\/wp-content\/uploads\/2026\/05\/notion-35df1125-22db-80d7-80db-db9a6bf9b5c0.png\" alt=\"image.png\"><\/p>\n<p>The extracted experience ends up in prompt form.<\/p>\n<p><a href=\"https:\/\/www.alphaxiv.org\/abs\/2504.06188?chatId=019e166c-6a99-74e1-bc7b-a162cada8a44\">SkillFlow: Scalable and Efficient Agent Skill Retrieval System | alphaXiv<\/a><\/p>\n<p><img decoding=\"async\" 
src=\"https:\/\/www.haruhi.fans\/wp-content\/uploads\/2026\/05\/notion-35df1125-22db-801c-9c66-c49c0e2628d7.png\" alt=\"image.png\"><\/p>\n<p>Retrieves and filters from a large pool of skills to find the most effective ones to use. Findings:<\/p>\n<ul>\n<li><strong>Filler does not help:<\/strong>&nbsp;the length of a skill document and the amount of explanation in it do not affect effectiveness.<\/li>\n<li><strong>Code is king:<\/strong>&nbsp;truly effective skills have a <strong>significantly higher share of code blocks<\/strong> and usually ship with <strong>executable script files<\/strong>.<\/li>\n<li><strong>Takeaway:<\/strong>&nbsp;when contributing to a skill library, write fewer essays and provide more runnable code.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/www.alphaxiv.org\/abs\/2604.17308?chatId=019e1692-8703-7347-8eec-59e56619639a\">SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents | alphaXiv<\/a><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.haruhi.fans\/wp-content\/uploads\/2026\/05\/notion-35df1125-22db-803f-94f7-fc0c01ed7941.png\" alt=\"image.png\"><\/p>\n<p>A benchmark that tests whether an agent can accumulate and apply experience over the long term.<\/p>\n<p><a href=\"https:\/\/www.alphaxiv.org\/abs\/2604.10674?chatId=019e16a2-c899-757a-8afe-f605a8acab72\">Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents | alphaXiv<\/a><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.haruhi.fans\/wp-content\/uploads\/2026\/05\/notion-35df1125-22db-80e3-9859-cbf92ecd9d02.png\" 
alt=\"image.png\"><\/p>\n<h2>Benchmark Survey<\/h2>\n<table>\n<thead>\n<tr>\n<th>Bench<\/th>\n<th>Background<\/th>\n<th>Link<\/th>\n<th>Number of publicly evaluated models<\/th>\n<th>Notes<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>SWE-bench Verified<\/strong><\/td>\n<td>GitHub issue fixing; software engineering agents<\/td>\n<td><a href=\"https:\/\/github.com\/swe-bench\/experiments?utm_source=chatgpt.com\">SWE-bench\/experiments: Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.<\/a><\/td>\n<td>100+<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Terminal-Bench 2.0<\/td>\n<td>Completing real tasks in a terminal environment: software engineering\/ML\/security\/data science\/system administration<\/td>\n<td><a href=\"https:\/\/huggingface.co\/datasets\/yoonholee\/terminalbench-trajectories?utm_source=chatgpt.com\">yoonholee\/terminalbench-trajectories \u00b7 Datasets at Hugging Face<\/a><\/td>\n<td>40+<\/td>\n<td>Some trajectories are empty; their share and impact are unknown<\/td>\n<\/tr>\n<tr>\n<td>MathArena<\/td>\n<td>Extremely hard competition problems<\/td>\n<td><a href=\"https:\/\/huggingface.co\/collections\/MathArena\/matharena-outputs\">MathArena Outputs &#8211; a MathArena Collection<\/a><\/td>\n<td>18 competition sets, each with 6\u201353 problems, typically 20\u201370 models. Much of it is duplicated, and trajectories are long<\/td>\n<td>Long-CoT results are<\/td>\n<\/tr>\n<tr>\n<td>Toolathlon<\/td>\n<td>Long-horizon, multi-tool, multi-app real-world task execution<\/td>\n<td><a href=\"https:\/\/huggingface.co\/datasets\/hkust-nlp\/Toolathlon-Trajectories\">hkust-nlp\/Toolathlon-Trajectories \u00b7 Datasets at Hugging Face<\/a><\/td>\n<td>17 models \u00d7 3 runs, over 5,000 task 
execution records<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>LiveCodeBench<\/td>\n<td>Dynamically updated programming problems<\/td>\n<td><a href=\"https:\/\/github.com\/LiveCodeBench\/submissions\">https:\/\/github.com\/LiveCodeBench\/submissions<\/a><\/td>\n<td><\/td>\n<td>Single-turn<\/td>\n<\/tr>\n<tr>\n<td>Codeforces Rating<\/td>\n<td>Competitive programming ability<\/td>\n<td><\/td>\n<td><\/td>\n<td>None<\/td>\n<\/tr>\n<tr>\n<td>Aider Polyglot<\/td>\n<td>Real-world code editing<\/td>\n<td><\/td>\n<td><\/td>\n<td>No concrete trajectories<\/td>\n<\/tr>\n<tr>\n<td>MMLU-Pro \/ GPQA \/ HLE \/ AIME \/ HMMT \/ SimpleQA<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>None<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th><strong>test_acc<\/strong><\/th>\n<th><strong>test_saved<\/strong><\/th>\n<th><strong>test_drop_pp<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>77.60%<\/td>\n<td>79.83%<\/td>\n<td>-12.92pp<\/td>\n<\/tr>\n<tr>\n<td>81.57%<\/td>\n<td>66.41%<\/td>\n<td>-7.72pp<\/td>\n<\/tr>\n<tr>\n<td>87.97%<\/td>\n<td>45.65%<\/td>\n<td>-2.26pp<\/td>\n<\/tr>\n<tr>\n<td>90.01%<\/td>\n<td>39.06%<\/td>\n<td>-1.30pp<\/td>\n<\/tr>\n<tr>\n<td>91.83%<\/td>\n<td>32.86%<\/td>\n<td>-0.68pp<\/td>\n<\/tr>\n<tr>\n<td>93.58%<\/td>\n<td>28.61%<\/td>\n<td>-0.07pp<\/td>\n<\/tr>\n<tr>\n<td>98.90%<\/td>\n<td>17.39%<\/td>\n<td>0.07pp<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th><strong>acc<\/strong><\/th>\n<th><strong>saved<\/strong><\/th>\n<th><strong>drop_pp<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>65.84%<\/td>\n<td>74.39%<\/td>\n<td>-22.54pp<\/td>\n<\/tr>\n<tr>\n<td>65.84%<\/td>\n<td>74.39%<\/td>\n<td>-22.54pp<\/td>\n<\/tr>\n<tr>\n<td>71.10%<\/td>\n<td>55.49%<\/td>\n<td>-12.70pp<\/td>\n<\/tr>\n<tr>\n<td>86.82%<\/td>\n<td>33.82%<\/td>\n<td>-2.25pp<\/td>\n<\/tr>\n<tr>\n<td>89.29%<\/td>\n<td>29.65%
<\/td>\n<td>-1.64pp<\/td>\n<\/tr>\n<tr>\n<td>94.74%<\/td>\n<td>25.62%<\/td>\n<td>-0.61pp<\/td>\n<\/tr>\n<tr>\n<td>98.77%<\/td>\n<td>22.36%<\/td>\n<td>0.20pp<\/td>\n<\/tr>\n<tr>\n<td>98.57%<\/td>\n<td>20.30%<\/td>\n<td>0.20pp<\/td>\n<\/tr>\n<tr>\n<td>100.00%<\/td>\n<td>14.00%<\/td>\n<td>0.00pp<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table>\n<thead>\n<tr>\n<th><strong>target<\/strong><\/th>\n<th><strong>actual<\/strong><\/th>\n<th><strong>save<\/strong><\/th>\n<th><strong>drop pp<\/strong><\/th>\n<th><strong>succ prec<\/strong><\/th>\n<th><strong>fail prec<\/strong><\/th>\n<th><strong>succ weight<\/strong><\/th>\n<th><strong>fail weight<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>0.75<\/td>\n<td>78.1%<\/td>\n<td>72.0%<\/td>\n<td>12.0<\/td>\n<td>75.6%<\/td>\n<td>78.7%<\/td>\n<td>19.3%<\/td>\n<td>80.7%<\/td>\n<\/tr>\n<tr>\n<td>0.80<\/td>\n<td>80.1%<\/td>\n<td>64.5%<\/td>\n<td>11.4<\/td>\n<td>80.1%<\/td>\n<td>80.1%<\/td>\n<td>19.7%<\/td>\n<td>80.3%<\/td>\n<\/tr>\n<tr>\n<td>0.85<\/td>\n<td>85.8%<\/td>\n<td>50.3%<\/td>\n<td>7.6<\/td>\n<td>86.5%<\/td>\n<td>85.7%<\/td>\n<td>18.3%<\/td>\n<td>81.7%<\/td>\n<\/tr>\n<tr>\n<td>0.90<\/td>\n<td>90.2%<\/td>\n<td>38.8%<\/td>\n<td>4.5<\/td>\n<td>90.4%<\/td>\n<td>90.2%<\/td>\n<td>16.6%<\/td>\n<td>83.4%<\/td>\n<\/tr>\n<tr>\n<td>0.95<\/td>\n<td>95.3%<\/td>\n<td>22.1%<\/td>\n<td>1.5<\/td>\n<td>95.1%<\/td>\n<td>95.4%<\/td>\n<td>13.7%<\/td>\n<td>86.3%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/body><\/html><\/p>\n","protected":false},"excerpt":{"rendered":"<p>bookmark In essence this is sample selection, but the core method is somewhat novel: Bayesian modeling + active learning + Gaussian processes. Transferability 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[],"class_list":["post-5503","post","type-post","status-publish","format-standard","hentry","category-32"],"_links":{"self":[{"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/posts\/5503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5503"}],"version-history":[{"count":20,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/posts\/5503\/revisions"}],"predecessor-version":[{"id":5529,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=\/wp\/v2\/posts\/5503\/revisions\/5529"}],"wp:attachment":[{"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.haruhi.fans\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}