LLMs based on transformer architectures, such as the GPT and LLaMA series, have excelled in NLP tasks due to their extensive parameterization and large training datasets. However, research indicates that ...
Vision-Language Models (VLMs) have significantly expanded AI’s ability to process multimodal information, yet they face persistent challenges. Proprietary models such as GPT-4V and Gemini-1.5-Pro ...
Reinforcement learning (RL) trains agents to make sequential decisions by maximizing cumulative rewards. It has diverse applications, including robotics, gaming, and automation, where agents interact ...
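To make the interaction loop behind this concrete, the following is a minimal sketch (assuming a hypothetical toy environment and a random placeholder policy, not any particular RL library): an agent repeatedly observes a state, takes an action, receives a reward, and accumulates a discounted return.

```python
import random

# Minimal illustration of the RL agent-environment loop.
# ToyEnv and random_policy are hypothetical stand-ins; the point is that the
# agent acts, receives rewards, and the objective is the cumulative
# (discounted) reward over an episode.

class ToyEnv:
    """Trivial environment: the state is a step counter; the episode ends after 10 steps."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Reward +1 for choosing action 1, 0 otherwise; advance the state.
        reward = 1.0 if action == 1 else 0.0
        self.state += 1
        done = self.state >= 10
        return self.state, reward, done

def random_policy(state):
    # Placeholder: a real RL algorithm would learn this state-to-action mapping.
    return random.choice([0, 1])

env = ToyEnv()
state = env.reset()
gamma = 0.99              # discount factor
ret, discount = 0.0, 1.0
done = False
while not done:
    action = random_policy(state)
    state, reward, done = env.step(action)
    ret += discount * reward   # cumulative discounted reward (the return)
    discount *= gamma

print(f"Episode return: {ret:.2f}")
```

In an actual RL algorithm (e.g., Q-learning or a policy-gradient method), the random placeholder policy would be replaced by a learned policy that is updated to maximize this cumulative reward.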
Multimodal AI integrates diverse data formats, such as text and images, to create systems capable of accurately understanding and generating content. By bridging textual and visual data, these models ...
Large Language Models (LLMs) have emerged as transformative tools in research and industry, with their performance correlating directly with model size. However, training these massive models presents ...
The field of artificial intelligence is evolving rapidly, with increasing efforts to develop more capable and efficient language models. However, scaling these models comes with challenges, ...
In the evolving landscape of artificial intelligence, integrating vision and language capabilities remains a complex challenge. Traditional models often struggle with tasks requiring a nuanced ...
Open-source LLM development is undergoing significant change with the effort to fully reproduce and open-source DeepSeek-R1, including its training data, scripts, and more. Hosted on Hugging Face’s platform, this ...
Advancements in multimodal intelligence depend on processing and understanding images and videos. Images capture static scenes, providing details such as objects, text, and ...
Novel view synthesis has witnessed significant advancements recently, with Neural Radiance Fields (NeRF) pioneering 3D representation techniques through neural rendering. While NeRF introduced ...
Large language models (LLMs) have shown remarkable abilities in language tasks and reasoning, but their capacity for autonomous planning—especially in complex, multi-step scenarios—remains limited.
The release of DeepSeek R1 has generated considerable buzz in the AI community. The open-source model delivers best-in-class performance across many metrics, on par with state-of-the-art proprietary ...