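AI News Aggregator
Aggregating the latest AI industry news, technical articles, and in-depth analysis, covering large language models, AI agents, multimodal AI, generative AI, and more.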
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
arXiv:2510.12563v3 Announce Type: replace Abstract: Large Reasoning Models (LRMs) have demonstrated impressive performan...
The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence
arXiv:2510.25883v2 Announce Type: replace Abstract: Why do brains and deep networks converge on similar representations?...
Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models
arXiv:2603.00763v1 Announce Type: new Abstract: Text-to-image diffusion models have achieved unprecedented success but s...
DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs
arXiv:2510.19842v2 Announce Type: replace Abstract: Large Language Models (LLMs) demonstrate strong performance on mathe...
ScholarEval: Research Idea Evaluation Grounded in Literature
arXiv:2510.16234v2 Announce Type: replace Abstract: As AI tools become increasingly common for research ideation, robust...
OpenAutoNLU: Open Source AutoML Library for NLU
arXiv:2603.01824v1 Announce Type: new Abstract: OpenAutoNLU is an open-source automated machine learning library for nat...
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
arXiv:2510.04284v3 Announce Type: replace Abstract: The professionalism of a human doctor in outpatient service depends ...
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
arXiv:2510.04474v2 Announce Type: replace Abstract: Recent large reasoning models (LRMs) driven by reinforcement learnin...
A Representation-Consistent Gated Recurrent Framework for Robust Medical Time-Series Classification
arXiv:2603.00067v1 Announce Type: new Abstract: Medical time-series data are characterized by irregular sampling, high n...
Diversity over Uniformity: Rethinking Representation in Generated Image Detection
arXiv:2603.00717v1 Announce Type: new Abstract: With the rapid advancement of generative models, generated image detecti...
BornoViT: A Novel Efficient Vision Transformer for Bengali Handwritten Basic Characters Classification
arXiv:2603.00755v1 Announce Type: new Abstract: Handwritten character classification in the Bengali script is a signific...
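8点1氪 (36Kr morning briefing) | Coconut Palm Group (椰树集团) summoned by regulators again over borderline marketing; one cinema trials a "right to regret" policy, refunding 40% if a viewer leaves a bad film within 20 minutes; fares on China-Europe routes soar
Today's highlights: Iran says the Strait of Hormuz is closed: "not a single drop of oil will be allowed out"; Lei Jun: Xiaomi robots are already interning at the car factory, and large numbers of humanoid robots will enter factories within the next 5 years; Amazon data center in the UAE struck and catches fire; several gold retailers suspend sales of investment gold bars; Bank of Beijing's precious-metals business hit by a bug; BOSS Zhipin says online rumors that Iran...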
Semantic Novelty Trajectories in 80,000 Books: A Cross-Corpus Embedding Analysis
arXiv:2603.01791v1 Announce Type: new Abstract: I apply Schmidhuber's compression progress theory of interestingness at ...
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
arXiv:2510.04040v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly rely on Chain-of-Thought (...
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
arXiv:2510.01367v4 Announce Type: replace Abstract: Reward hacking, where a reasoning model exploits loopholes in a rewa...
A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging
arXiv:2603.00714v1 Announce Type: new Abstract: Visual analysis and reconstruction of pipeline inner walls remain challe...
Understanding the Role of Training Data in Test-Time Scaling
arXiv:2510.03605v2 Announce Type: replace Abstract: Test-time scaling improves the reasoning capabilities of large langu...
nchellwig at SemEval-2026 Task 3: Self-Consistent Structured Generation (SCSG) for Dimensional Aspect-Based Sentiment Analysis using Large Language Models
arXiv:2603.01788v1 Announce Type: new Abstract: We present Self-Consistent Structured Generation (SCSG) for Dimensional ...
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
arXiv:2509.24156v2 Announce Type: replace Abstract: Large reasoning models (LRMs) exhibit unprecedented capabilities in ...
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
arXiv:2509.24393v2 Announce Type: replace Abstract: Although Large Reasoning Models (LRMs) have progressed in solving co...
BiJEPA: Bi-directional Joint Embedding Predictive Architecture for Symmetric Representation Learning
arXiv:2603.00049v1 Announce Type: new Abstract: Self-Supervised Learning (SSL) has shifted from pixel-level reconstructi...
Towards Khmer Scene Document Layout Detection
arXiv:2603.00707v1 Announce Type: new Abstract: While document layout analysis for Latin scripts has advanced significan...
Towards Universal Khmer Text Recognition
arXiv:2603.00702v1 Announce Type: new Abstract: Khmer is a low-resource language characterized by a complex script, pres...
ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems
arXiv:2509.23465v2 Announce Type: replace Abstract: Solving the Traveling Salesman Problem (TSP) is NP-hard yet fundamen...
LLM-as-an-Annotator: Training Lightweight Models with LLM-Annotated Examples for Aspect Sentiment Tuple Prediction
arXiv:2603.01778v1 Announce Type: new Abstract: Training models for Aspect-Based Sentiment Analysis (ABSA) tasks require...
Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework
arXiv:2603.00010v1 Announce Type: new Abstract: Transit Network Design is a well-studied problem in the field of transpo...
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
arXiv:2509.23415v2 Announce Type: replace Abstract: Despite the impressive performance of LLM-powered agents, their adop...
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
arXiv:2509.23589v3 Announce Type: replace Abstract: Diffusion-based planners have shown strong potential for autonomous ...
Bilinear representation mitigates reversal curse and enables consistent model editing
arXiv:2509.21993v3 Announce Type: replace Abstract: The reversal curse--a language model's inability to infer an unseen ...
Beyond the Résumé: A Rubric-Aware Automatic Interview System for Information Elicitation
arXiv:2603.01775v1 Announce Type: new Abstract: Effective hiring is integral to the success of an organisation, but it i...
SCOUT: Fast Spectral CT Imaging in Ultra LOw-data Regimes via PseUdo-label GeneraTion
arXiv:2603.00687v1 Announce Type: new Abstract: Noise and artifacts during computed tomography (CT) scans are a fundamen...
LLMs can unmask pseudonymous users at scale with surprising accuracy
Burner accounts on social media sites can increasingly be analyzed to identify the pseudonymous users who post to them u...
TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction
arXiv:2603.00697v1 Announce Type: new Abstract: We present TokenSplat, a feed-forward framework for joint 3D Gaussian re...
STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification
arXiv:2603.00695v1 Announce Type: new Abstract: Multi-modal object Re-Identification (ReID) aims to exploit complementar...
Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles
arXiv:2509.21028v3 Announce Type: replace Abstract: We introduce SciTrek, a diagnostic question-answering benchmark desi...
AnnoABSA: A Web-Based Annotation Tool for Aspect-Based Sentiment Analysis with Retrieval-Augmented Suggestions
arXiv:2603.01773v1 Announce Type: new Abstract: We introduce AnnoABSA, the first web-based annotation tool to support th...