Pull requests: vllm-project/vllm
Pull requests list
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3
#17955 opened May 11, 2025 by vllmellm
Refactor
#17950 opened May 10, 2025 by yarongmu-google • Draft • labels: ci/build, needs-rebase, tpu, v1
[Bugfix] Add revision to transformers.Auto*.from_pretrained processors
#17948 opened May 10, 2025 by xinli-centml • labels: ready
[v1] Support multiple KV cache groups in GPU model runner
#17945 opened May 10, 2025 by heheda12345 • labels: tpu, v1
[doc] list the hf downloaded models
#17940 opened May 10, 2025 by reidliu41 • labels: documentation
[BugFix] Correct max_model_len derivation from config.json for Mistral format
#17937 opened May 10, 2025 by princepride
[doc] update lora doc
#17936 opened May 10, 2025 by reidliu41 • labels: documentation, ready
[Bugfix] Avoid repeatedly creating dummy data during engine startup
#17935 opened May 10, 2025 by DarkLight1337 • labels: multi-modality, v1
[kernel] integrate permute/unpermute kernel into deepgemm moe
#17934 opened May 10, 2025 by CalebDu
[Misc][RFC] Add automated profiling sweep and heatmap visualization tools
#17933 opened May 10, 2025 by ConstBob
[WIP] automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node.
#17930 opened May 10, 2025 by louie-tsai • labels: ci/build
[Frontend] [Core] Add Tensorizer support for LoRA adapter serialization and deserialization
#17926 opened May 9, 2025 by sangstar • labels: documentation
TESTING CI test completion - no need to merge.
#17921 opened May 9, 2025 by Alexei-V-Ivanov-AMD • labels: ci/build, needs-rebase
[Hardware][Intel-Gaudi] enable text embedding for Intel-Gaudi backend
#17920 opened May 9, 2025 by libinta
WIP: fix_llama4_tool_call
#17917 opened May 9, 2025 by wukaixingxp • Draft • labels: documentation, frontend, tool-calling
[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP
#17916 opened May 9, 2025 by jinhuang12 • labels: ready, v1
[Misc] Add compressed-tensors NVFP4A16 emulation support
#17914 opened May 9, 2025 by dsikka • labels: quantization, ready