Discover, Discuss, and Read arXiv papers

15 Aug 2025
multi-modal-learning · reasoning · deep-reinforcement-learning
Thyme, developed by a collaboration including Kuaishou, CASIA, and other Chinese academic institutions, equips Multimodal Large Language Models (MLLMs) with the ability to autonomously generate and execute code for diverse image processing and complex computations. This open-source framework enhances high-resolution visual perception and quantitative reasoning capabilities, often outperforming larger open-source baselines and reducing hallucination.
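The code-as-tool loop described above can be sketched in a few lines. This is a minimal illustration, not Thyme's actual API: all function names are invented, the "sandbox" is just a restricted `exec` namespace, and the scripted model stands in for a real MLLM that would propose image-processing or arithmetic snippets.

```python
def execute_snippet(code: str, inputs: dict) -> dict:
    """Run a model-generated snippet in a restricted namespace; the snippet
    is expected to write its outputs into the `result` dict."""
    safe_builtins = {"min": min, "max": max, "len": len, "range": range, "sum": sum}
    namespace = {"__builtins__": safe_builtins, "result": {}, **inputs}
    exec(code, namespace)  # a real system would sandbox and time-limit this
    return namespace["result"]

def tool_loop(model_step, question, inputs, max_turns=3):
    """Alternate model reasoning and code execution until the model stops
    proposing code (returns None) or the turn budget is exhausted."""
    transcript = [question]
    for _ in range(max_turns):
        code = model_step(transcript)
        if code is None:
            break
        transcript.append(execute_snippet(code, inputs))
    return transcript[-1]

# Toy run: the "model" proposes one snippet that inspects a 2x2 "image",
# then declines to continue, so the loop returns the execution result.
image = [[0, 5], [9, 2]]
snippet = "result['max_pixel'] = max(max(row) for row in image)"
calls = iter([snippet, None])
answer = tool_loop(lambda t: next(calls), "Which pixel is brightest?", {"image": image})
```

In the real framework the executed code operates on the actual high-resolution image (cropping, zooming, measuring), and its textual output is fed back into the model's context for the next reasoning step.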
18 Aug 2025
multi-modal-learning · reasoning · transformers
An empirical study by SenseTime Research and S-Lab NTU benchmarks the spatial intelligence capabilities of state-of-the-art Large Multimodal Models, finding that while GPT-5 establishes a new performance benchmark, it still significantly lags human proficiency in complex spatial reasoning tasks such as mental reconstruction and deformation. The research also introduces a comprehensive taxonomy and standardized protocols for evaluating spatial intelligence in LMMs.
18 Aug 2025
reinforcement-learning · text-generation · fine-tuning
The Rubicon framework extends reinforcement learning for large language models to open-ended and subjective tasks by using rubric-based rewards, achieving a +5.2% absolute improvement on humanities benchmarks compared to its base model and outperforming a 671B model by +2.4%. This method facilitates fine-grained stylistic control and produces more human-like responses without compromising general reasoning abilities.
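A rubric-based reward in this spirit reduces to a weighted checklist scored by a judge. The sketch below is illustrative, not Rubicon's implementation: the criterion names, weights, and keyword-matching judge are all invented, standing in for the LLM judge a real system would use.

```python
# Hypothetical rubric: (criterion, weight) pairs; each criterion is scored
# in [0, 1] and the weighted mean becomes the scalar RL reward.
RUBRIC = [
    ("addresses_the_prompt", 0.4),
    ("cites_concrete_evidence", 0.3),
    ("tone_matches_request", 0.3),
]

def rubric_reward(response: str, judge) -> float:
    """judge(criterion, response) -> score in [0, 1]."""
    total = sum(w for _, w in RUBRIC)
    return sum(w * judge(name, response) for name, w in RUBRIC) / total

def toy_judge(criterion: str, response: str) -> float:
    # Crude heuristics standing in for an LLM judge.
    checks = {
        "addresses_the_prompt": "essay" in response,
        "cites_concrete_evidence": "because" in response,
        "tone_matches_request": not response.isupper(),
    }
    return 1.0 if checks[criterion] else 0.0

reward = rubric_reward("An essay arguing X, because of Y.", toy_judge)
```

Because each criterion is scored separately, the same machinery that produces the scalar reward also exposes which stylistic dimensions a response satisfied, which is what enables the fine-grained control the summary mentions.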
18 Aug 2025
image-generation · generative-models · sequence-modeling
Researchers from Nanyang Technological University and SenseTime Research introduce Next Visual Granularity Generation (NVG), a framework that decomposes image generation into iterative coarse-to-fine refinement based on varying unique token counts at a consistent resolution. The framework demonstrates enhanced control over visual structure and content, achieving a Fréchet Inception Distance of 2.06 on ImageNet 256x256 while enabling structure-guided image synthesis.
17 Aug 2025
deep-reinforcement-learning · reinforcement-learning · optimization-methods
Web proxies such as NGINX commonly rely on least-recently-used (LRU) eviction, which is size-agnostic and can thrash under periodic bursts and mixed object sizes. We introduce Cold-RL, a learned eviction policy for NGINX that replaces LRU's forced-expire path with a dueling Deep Q-Network served by an ONNX sidecar within a strict microsecond budget. On each eviction, Cold-RL samples the K least-recently-used objects, extracts six lightweight features (age, size, hit count, inter-arrival time, remaining TTL, and last origin RTT), and requests a bitmask of victims; a hard timeout of 500 microseconds triggers immediate fallback to native LRU. Policies are trained offline by replaying NGINX access logs through a cache simulator with a simple reward: a retained object earns one point if it is hit again before TTL expiry. We compare against LRU, LFU, size-based, adaptive LRU, and a hybrid baseline on two adversarial workloads. With a 25 MB cache, Cold-RL raises hit ratio from 0.1436 to 0.3538, a 146 percent improvement over the best classical baseline; at 100 MB, from 0.7530 to 0.8675, a 15 percent gain; and at 400 MB it matches classical methods (about 0.918). Inference adds less than 2 percent CPU overhead and keeps 95th percentile eviction latency within budget. To our knowledge, this is the first reinforcement learning eviction policy integrated into NGINX with strict SLOs.
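The eviction path described in the abstract can be sketched as follows. This is a pure-Python illustration under stated assumptions: the feature names follow the abstract, but the linear scorer is a stand-in for the ONNX dueling DQN, and the weights and objects are invented for the example.

```python
import time

FEATURES = ("age", "size", "hit_count", "inter_arrival", "remaining_ttl", "origin_rtt")

def extract_features(obj: dict) -> list:
    return [float(obj[f]) for f in FEATURES]

def q_score(features: list, weights: list) -> float:
    # Stand-in for the dueling DQN: higher score = more evictable.
    return sum(w * x for w, x in zip(weights, features))

def choose_victims(lru_tail: list, weights: list, n_victims: int = 1,
                   budget_us: int = 500) -> list:
    """Score the K least-recently-used objects; if the microsecond budget
    is exceeded, fall back to plain LRU order (front of lru_tail)."""
    start = time.perf_counter_ns()
    scored = []
    for obj in lru_tail:
        if (time.perf_counter_ns() - start) / 1_000 > budget_us:
            return lru_tail[:n_victims]          # native-LRU fallback
        scored.append((q_score(extract_features(obj), weights), obj))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [obj for _, obj in scored[:n_victims]]

# Toy policy that prefers evicting large, rarely-hit objects.
weights = [0.0, 1.0, -5.0, 0.0, 0.0, 0.0]
a = dict(age=10, size=100, hit_count=9, inter_arrival=1, remaining_ttl=60, origin_rtt=20)
b = dict(age=10, size=500, hit_count=0, inter_arrival=1, remaining_ttl=60, origin_rtt=20)
victims = choose_victims([a, b], weights)
```

The hard-timeout fallback mirrors the design constraint in the paper: the learned policy may only add latency up to a fixed budget, after which the proxy must behave exactly like stock NGINX.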
14 Aug 2025
synthetic-data · text-generation · transformers
DatologyAI's BeyondWeb framework generates high-quality synthetic data, enabling Large Language Models to achieve state-of-the-art performance with up to 7.7x faster training speed compared to open web data. This approach allows smaller models to outperform larger ones trained on conventional datasets, addressing the data scarcity challenge in pretraining.
14 Aug 2025
reinforcement-learning · agents · transformers
A reinforcement learning framework called SSRL trains large language models to perform internal 'self-search,' enabling them to leverage their parametric knowledge as an information source. This approach reduces training costs for LLM agents, enhances efficiency, and demonstrates robust generalization to real external search environments.
15 Aug 2025
vision-language-models · transformers · reasoning
Alibaba Group's Ovis2.5 introduces a Multimodal Large Language Model (MLLM) capable of native-resolution visual perception and advanced reasoning through a 'reflection' thinking mode. The model sets a new state-of-the-art among open-source MLLMs in its parameter class, achieving an average score of 78.3 on the OpenCompass leaderboard for its 9B version, while also releasing a high-performing 2B model for resource-constrained scenarios.
18 Aug 2025
transformers · reasoning · chain-of-thought
This paper introduces OptimalThinkingBench, a unified benchmark designed to evaluate how Large Language Models balance performance and computational efficiency, specifically addressing tendencies of overthinking on simple tasks and underthinking on complex ones. Its comprehensive evaluation of 33 models reveals that no current LLM achieves an optimal balance, highlighting a fundamental trade-off that existing models and mitigation strategies struggle to resolve.
18 Aug 2025
generative-models · neural-rendering · image-generation
4DNeX introduces the first feed-forward framework that generates dynamic 3D scene representations from a single image, enabling high-quality 4D geometry inference and novel-view video synthesis within 15 minutes. This approach tackles data scarcity by curating the 4DNeX-10M dataset and adapts a pretrained video diffusion model using a unified 6D video representation.