@arqeria @letsenvision a small local model built just to describe images, and fine-tuned with human-curated data contributed by Blind and Low Vision people and by human describers.
The AI principles, or constitution, are there to guide the fine-tuning process: be helpful, be brief, be concise. But that is just a watered-down example.
To create a constitution, we need to answer questions like: What does a good description entail? How should clothing be described? How should people be described?