Deploying LLMs via Hugging Face on IBM Cloud

With the Text Generation Inference toolkit from Hugging Face, Large Language Models can be hosted efficiently. This post describes how to run open-source models or fine-tuned models on IBM Cloud. T...

Feb 14, 2024

Fine-tuning LLMs via Hugging Face on IBM Cloud

The speed of innovation in the AI community is amazing. What didn’t seem possible a year ago is standard today. Fine-tuning is a great example. With the latest progress, you can fine-tune sm...

Feb 13, 2024

Foundation Models, Transformers, BERT and GPT

Since I’m excited by the incredible capabilities that technologies like ChatGPT and Bard provide, I’m trying to better understand how they work. This post summarizes my current understanding about...

Feb 24, 2023

Fine-tuning LLMs with Apple MLX locally

MLX is a framework from Apple Research for machine learning on Apple silicon. This post describes how to fine-tune a 7B LLM locally in less than 10 minutes on a MacBook Pro M3. MLX is designe...

May 16, 2024

Fine-tuning LLMs locally with Apple Silicon

With recent MacBook Pro machines and frameworks like MLX and llama.cpp, Large Language Models can be fine-tuned on local GPUs. This post describes how to use InstructLab which provides an...

May 15, 2024

How to stay up to Date with AI News

Recently, several people have asked me how I follow AI news. Below are some great resources. YouTube: I like watching videos during my lunch break workouts. I can highly recommend the following cha...

May 14, 2024

Running fine-tuned LLM Models on watsonx.ai

Watsonx.ai is IBM’s AI platform built for business. It is provided as SaaS and as software which can be deployed on multiple clouds and on-premises. This post describes how to deploy custom fine-tu...

Apr 25, 2024

Understanding the Watsonx.ai API

Watsonx.ai is IBM’s enterprise studio for AI builders to train, validate, tune and deploy Large Language Models. It comes with multiple open-source and IBM LLMs which can be accessed via a REST API. ...

Apr 24, 2024

Running Mistral on CPU via llama.cpp

With quantization, LLMs can run faster and on smaller hardware. This post describes how to run Mistral 7B on an older MacBook Pro without a GPU. Llama.cpp is an inference stack implemented in C/C++ to...

Mar 10, 2024

Generating synthetic Data with Mixtral

Fine-tuning and aligning language models to follow instructions requires large quantities of high-quality data. IBM published a paper that describes how synthetic data can be generated wit...

Mar 6, 2024