I am new to LLMs and currently learning to run Llama 3.2 (the 1B and 3B versions) on my desktop. I want to get it running first and figure out how it works, then fine-tune it with my own dataset. However, I got tripped up right at the beginning: I don't know how to run it locally at the code level, and there seem to be few helpful tutorials (maybe I just couldn't find them). Thus, I came here looking for help.
My current progress is as follows:
- I have already downloaded the official Llama 3 code from GitHub.
- I also got the model weights (I think?) from Hugging Face.
- I have Ubuntu 20.04 with CUDA 11.8 and cuDNN 8.9.6 installed on my desktop (which has an NVIDIA GeForce RTX 3090).
- Following the instructions on Hugging Face, I tried vLLM. In one terminal I ran

  ```
  vllm serve "meta-llama/Llama-3.2-1B"
  ```

  and in another terminal

  ```
  curl http://localhost:8080/v1/models
  ```

  The first attempt was successful, but every subsequent attempt failed with

  ```
  curl: (7) Failed to connect to localhost port 8080 after 0 ms: Couldn't connect to server
  ```

  I wonder what is wrong here and how I can fix it. (A Python version of this query is sketched right after this list.)
- Once that works, I would like to run Llama at the code level in PyCharm or Jupyter, loading the model weights with vLLM or something similar, and then use LoRA or another fine-tuning method to train my own model. Unfortunately, I am still not sure how to achieve this.
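For reference, here is how I understand the running server can be queried from Python via its OpenAI-compatible API (a minimal sketch using the `openai` client; the port matches my curl command above, and the prompt is just a placeholder, so please correct me if I have this wrong):

```python
# Minimal sketch: query the running vLLM server through its
# OpenAI-compatible API. Assumes `vllm serve` from above is still
# running and listening on port 8080 (the port from my curl command).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # vLLM exposes an OpenAI-style API here
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

completion = client.completions.create(
    model="meta-llama/Llama-3.2-1B",  # must match the served model name
    prompt="The capital of France is",  # placeholder prompt
    max_tokens=32,
)
print(completion.choices[0].text)
```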
Thus, I currently have two questions:
- Is it possible to run Llama at the code level in PyCharm or Jupyter on Ubuntu? If so, how? (I sketch what I have in mind right after this list.)
- How can I fine-tune Llama with my own dataset? (A rough sketch of what I imagine follows as well.)
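To make question 1 concrete, this is the kind of minimal script I am hoping to run inside PyCharm or a Jupyter cell (a sketch based on my reading of the transformers docs, not verified code; it assumes torch, transformers, and accelerate are installed and that access to the gated repo has been granted on Hugging Face):

```python
# Minimal sketch of loading Llama 3.2 1B directly in a script or
# Jupyter cell with Hugging Face transformers (no server involved).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the 1B model fits comfortably on a 24 GB RTX 3090
    device_map="auto",           # let accelerate place the weights on the GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```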
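And for question 2, this is roughly the shape I imagine a LoRA run would take with the peft library (again only a sketch; the dataset file, target modules, and hyperparameters below are placeholders I made up, so I would appreciate corrections):

```python
# Rough sketch of a LoRA fine-tuning setup with peft + transformers.
# The dataset file and hyperparameters below are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the base model with trainable low-rank adapters; only these
# small matrices are updated while the original weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder dataset: one text example per line in my_data.txt.
dataset = load_dataset("text", data_files={"train": "my_data.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama32-lora",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llama32-lora")  # saves only the small adapter weights
```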
I guess there are plenty of beginners like me who run into similar problems and confusion and get discouraged right at the start. I sincerely look forward to your help; any pointers will be appreciated.