A high-level overview of fine-tuning a model to capture my memories, personality, and speaking tone, all orchestrated from an iPhone using local compute (two Mac Studios), targeting a 7B model (Mistral 7B) capable of running inference on an iPhone.
I created the data pipelines and training recipes myself, learning on the fly!
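The data pipeline step above can be sketched roughly as follows. This is a hypothetical illustration, not the actual pipeline: the field names and chat-style JSONL layout are assumptions based on a common fine-tuning data format.

```python
import json

def to_training_record(prompt, reply):
    """Wrap one (prompt, reply) pair as a chat-style training example.
    The schema here is an assumed, commonly used layout for fine-tuning data."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
    }

def write_jsonl(pairs, path):
    """Write (prompt, reply) pairs as one JSON object per line (JSONL)."""
    with open(path, "w") as f:
        for prompt, reply in pairs:
            f.write(json.dumps(to_training_record(prompt, reply)) + "\n")

# Example: a single message exchange becomes one training record.
pairs = [("How was the trip?", "Honestly, it was unforgettable. We hiked every day.")]
write_jsonl(pairs, "twin_train.jsonl")
```

Records like these, drawn from a user's own messages and writing, are what the model is fine-tuned on.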
The Digital Twin can be kept up to date by regularly fine-tuning on a user's data.
The final prototype, built on Mistral 7B, ran in under 15 GB of RAM. With further optimization, it is feasible to get comparable performance from a model of at most 1B parameters using 4 GB of RAM or less, small enough to answer queries quickly on an iPhone.
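The memory figures above can be sanity-checked with back-of-the-envelope arithmetic on weight storage alone (this sketch ignores KV cache and activation memory, which add overhead on top):

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate RAM needed just for model weights."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Mistral 7B with 16-bit weights: ~14 GB, consistent with the <15 GB prototype.
print(weight_memory_gb(7, 16))  # 14.0
# A 1B model quantized to 4-bit weights: ~0.5 GB, well under the 4 GB target.
print(weight_memory_gb(1, 4))   # 0.5
```

This is why shrinking to a 1B model (plus quantization) plausibly fits the on-device RAM budget.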
This unlocks new possibilities such as user simulation, richer personalization options, and more.