[Usage]: [V1] Misleading Error Messages #13510

@robertgshaw2-redhat

Description

Looking for help to improve error messages during startup!

Running a model that does not exist (e.g. MODEL=neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic, which is not a real repository on the Hub) gives the following stack trace:

(venv-nm-vllm-abi3) rshaw@beaker:~$ VLLM_USE_V1=1 vllm serve $MODEL --disable-log-requests --no-enable-prefix-caching
INFO 02-19 03:45:16 __init__.py:190] Automatically detected platform cuda.
INFO 02-19 03:45:18 api_server.py:840] vLLM API server version 0.7.2.0
INFO 02-19 03:45:18 api_server.py:841] args: Namespace(subparser='serve', model_tag='neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=True, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, 
dispatch_function=<function serve at 0x74d76eb99990>)
WARNING 02-19 03:45:18 arg_utils.py:1326] Setting max_num_batched_tokens to 8192 for OPENAI_API_SERVER usage context.
Traceback (most recent call last):
  File "/home/rshaw/venv-nm-vllm-abi3/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/scripts.py", line 204, in main
    args.dispatch_function(args)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/scripts.py", line 44, in serve
    uvloop.run(run_server(args))
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/home/rshaw/.pyenv/versions/3.10.14/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/rshaw/.pyenv/versions/3.10.14/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
    engine_client = AsyncLLMEngine.from_engine_args(
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 104, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1075, in create_engine_config
    model_config = self.create_model_config()
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 998, in create_model_config
    return ModelConfig(
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/config.py", line 302, in __init__
    hf_config = get_config(self.model, trust_remote_code, revision,
  File "/home/rshaw/venv-nm-vllm-abi3/lib/python3.10/site-packages/vllm/transformers_utils/config.py", line 201, in get_config
    raise ValueError(f"No supported config format found in {model}")
ValueError: No supported config format found in neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic
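
For what it's worth, the repository really does not resolve on the Hub, so the failure has nothing to do with config formats. A quick way to confirm this (a sketch, assuming a recent huggingface_hub that exposes repo_exists):

```python
from huggingface_hub import repo_exists

# Prints False: the repository does not exist (or is not visible),
# so no config file could ever have been found for it.
print(repo_exists("neuralmagic/Meta-Llama-3-8B-Instruct-FP8-dynamic"))
```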

This is confusing: the underlying problem is that the repository does not exist on the Hugging Face Hub (or is not accessible), but the error message points the user toward a config-format problem rather than a missing or inaccessible model.
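
One possible shape for a fix: before get_config() falls through to the generic ValueError, probe the Hub and report the actual reason. This is only a sketch of the idea, not vLLM's current code; check_model_repo is a hypothetical helper, and it assumes huggingface_hub's repo_info / GatedRepoError / RepositoryNotFoundError behavior.

```python
import os

from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError


def check_model_repo(model: str) -> None:
    """Raise a clear error if `model` is neither a local path nor a reachable Hub repo."""
    if os.path.isdir(model):
        return  # local checkpoint; the existing config-format checks still apply
    try:
        HfApi().repo_info(model)
    except GatedRepoError as e:  # must come first: GatedRepoError subclasses RepositoryNotFoundError
        raise ValueError(
            f"Model '{model}' is a gated repository. Request access on the Hub "
            "and authenticate (e.g. via HF_TOKEN) before starting the server.") from e
    except RepositoryNotFoundError as e:
        raise ValueError(
            f"Model '{model}' does not exist on the Hugging Face Hub and is not "
            "a local directory. Check the model name for typos.") from e
```

With something like this called early during engine config creation, the startup failure above would say that the repository does not exist instead of "No supported config format found".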

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
