
@weirdwriter This is rather missing the point of my entire argument. I’m not saying that the human brain works like a computer; in point of fact, we still don’t really know a whole lot about how the human brain works, and in any case you can’t really compare a living organism with something man-made. The point I’m making is that the process by which we learn and the process by which AI learns are much more similar than you’d probably like to admit.

@arqeria @weirdwriter@tweesecake While I am on Robert's side here, I think you made many good points. Treating LLMs as the whole of AI can be reductive, since symbolic computing and other strands of cognitive science make up this huge field of AI. And, probably within a few years, this tech can be a good tool, with its own flaws, to assist us in specific, measurable tasks. I am just worried that AI as an academic field will be narrowed by the profit-making intent of capitalism.

@arqeria Even with LLMs, we need to look beyond training data. Most of it right now is web-scraped images, most of them without good alt text, with very low resource value for Asian contexts and cultures and for nonhuman contexts like various animals, plants, etc. So it is hitting its peak already. The funding is so congested on keeping the LLM game going, and it's not good to begin with.

@arqeria Other strands of AI need to be expanded, and this process of converting research into capital needs to stop, if we want a truly intelligent system that can contribute to the understanding of complex systems, cognition, and how intelligence works. It's sad when a field that is full of potential becomes a hyped product.
I follow the AI space from both the developer and the academic route, and I feel we need a more nuanced, grey-hat argument about the potential of AI.


@arqeria Let's steer the discussion very narrowly to the task at hand, which is image description.
Vision language models are trained generatively to be good at multiple image evaluation tasks; describing an image is only one of them. By listening to the Talking Description to Me podcast, I learnt how much expertise goes into describing a scene well. I don't think the big companies want to train and deploy a vision model that does only that one job best.

@arqeria What we need is a good vision language model that can be fine-tuned with thousands of human-curated Q&A pairs of blind people querying about an image. It would need to be additionally refined with many millions of (poor image, good image, description) sets.
This is to let the model learn many facets of one image, which may come from a phone camera, poor lighting, etc.
We also need an eval system that is specific to alt-text making, or to the task of describing images to blind people.
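Concretely, those human-curated pairs could be stored as a JSONL fine-tuning corpus. This is only an illustrative sketch: the field names (`image_id`, `quality`, `question`, `answer`) are assumptions, not any real dataset's schema.

```python
import json

# Hypothetical record format for a blind-user Q&A fine-tuning corpus.
# All field names here are illustrative assumptions, not a real schema.
def make_record(image_id, quality, question, answer):
    """One curated pair: what a blind user asked, plus a refined answer."""
    assert quality in {"poor", "good"}  # pair bad phone-camera shots with good captures
    return {
        "image_id": image_id,
        "quality": quality,
        "question": question,
        "answer": answer,
    }

records = [
    make_record("img_001", "poor", "What colour is the shirt?",
                "A dark shirt, partly out of frame and blurred."),
    make_record("img_001", "good", "What colour is the shirt?",
                "A navy blue button-up shirt with short sleeves."),
]

# Serialise as JSONL, the usual shape for fine-tuning corpora.
jsonl = "\n".join(json.dumps(r) for r in records)
```

Pairing a poor and a good capture of the same scene under one `image_id` is what would let the model learn the "many facets of one image" idea.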

@arqeria From what I am seeing right now, @letsenvision is the only research and AT company that can undertake this stuff.
Like it or not, at the moment, vision language models are just average at describing images, and worse at describing bad images. The overrepresentation of filler words, empty descriptives, etc. makes the technology overhyped. Until we have a blind-focused AI eval for image description, we are only at step 1 of the development process.

@arqeria @letsenvision I feel like many players are steering users into territories that need less accountability.
Like, giving AI a personality? I don't think we need personality, we need our work to be done! We need more autonomy to steer the system to describe more granularly: sometimes very short, sometimes longer but meaningful.
The personality aspect, while it sounds like a system prompt, just adds lots of unrequired baggage. I guess we do not need a chatbot to begin with!

@arqeria @letsenvision As you rightly pointed out, audio and speech AI research has progressed much more strongly and crisply. It's a very active and truly under-appreciated space, and I guess it has a lot of research value as well as daily-use value.

@kaveinthran @letsenvision I feel weird for saying this, but I honestly wonder if Meta might just be our best hope here. For whatever reason, they seem to have committed themselves to a path of open source AI models at this point, and people are already doing amazing things with it. Given how quickly these things have been optimised to run on consumer hardware, I can only imagine this process will become even more efficient, or the baseline of processors will become more and more powerful as we transition away from X86 / 64. A company with the time / resources to dedicate to a project like the one you have outlined may well find itself on fertile ground very soon, if it isn’t already.

@arqeria @letsenvision I am not sure a big company would do this well; I am more open to either a small company like Envision, or research centres, or even a community-centred approach.
We first need constitution-like principles to ground this vision AI,
and that by itself is complicated. We need an AI built from the ground up that knows how to describe an image to a blind person just like the expert audio and image describers do.

@arqeria @letsenvision A small local model that is built just to describe images, and fine-tuned with human-curated data contributed by blind and low-vision people and human describers.
The AI principles or constitution are there to guide the fine-tuning process: be helpful, be brief, be concise. But this is just a watered-down example.
To create a constitution, we need to answer questions like: what does a good description entail? How do you describe clothing? How do you describe humans?
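Just to make the "watered-down" example tangible: a few of those principles could even be turned into executable checks. This is purely an illustration; the principles and the filler-word list are my assumptions, and a real constitution would come from blind users and expert describers.

```python
# A toy "constitution" for image description as executable checks.
# FILLER_WORDS and the three principles are assumed, illustrative choices.
FILLER_WORDS = {"stunning", "breathtaking", "beautiful", "vibrant"}

def check_description(desc):
    """Score a candidate description against each toy principle."""
    words = [w.strip(".,!?") for w in desc.lower().split()]
    return {
        "brief": len(words) <= 40,                     # be brief
        "no_filler": not (set(words) & FILLER_WORDS),  # avoid hype adjectives
        "informative": len(words) >= 5,                # too short to actually help
    }

good = check_description("A man in a red raincoat waits at a bus stop in heavy rain.")
bad = check_description("A stunning, breathtaking scene.")
```

Of course, the hard questions (what to prioritise, how to describe people) can't be reduced to checks like these; that's exactly why a written constitution would be needed to steer the fine-tuning.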

@arqeria @letsenvision Other questions can be more like: what is important? What is less important? How do you contextualise better? What are the guidelines for perceiving an image across various typologies? How do you frame human visual standards?
See the complication?
We as humans do all this in a magical second, and the copious data that we have now does not account for even a little of this complexity!
The AI systems we have now are too general. If we want a true describer, I feel we need to work for it as a community!

@arqeria @letsenvision I guess all these insights are not original; I think about this more after listening to the Talking Description to Me podcast, shout out to @ChristineMalec

@kaveinthran @letsenvision @ChristineMalec The problem is that you need a pretty huge context size in order to do what you want to do, I’m actually not sure what the hardware requirements for large context models look like at the moment. I know they were pretty bad a year ago, but these things move quickly so I’d have to check at this point.

@kaveinthran @letsenvision I’m not saying meta would give this specifically the time of day, I’m saying that they’ve probably laid the groundwork at this point. There is however a problem, the problem being how do you make it profitable? If it’s an on device model, then how much extra can you get away with charging for the device in order to make the cost of developing the AI worthwhile? If it’s cloud based, then how do you make prices fair whilst still remaining profitable? I don’t have the answers, and even regular AI training is very compute intensive at this point; I dread to think what the cost of something like this would turn out to be.

@arqeria @kaveinthran @letsenvision I have so many opinions here.

Generative AI is still very much a black box even to the people who develop it. There are theories, but we don’t exactly know everything about how it generates all of its content. (1/4)

The human body is, in a sense, like a computer. We are not based on binary, but we live our lives and generate ideas based on our lived experiences, which is what generative AI does too, only with a much more limited data set compared to people. (2/4)

I also don’t think we should rely on a particular group to dictate a constitution for how image description should work through AI. Each person wants a subjective view of what is in an image. People provide subjective descriptions of images, and AI should do the same. If you don’t like what one AI model says is in an image, use another model. (3/4)

I agree that Meta is in a good position here, but there are other models as well. I really like how well LLAVA works at describing images on my local machine, so we do have loads of options here.

I do not, however, think that AI should be used for social media alt text yet. I’d like that to be where we go, but I don’t think we are there yet as far as accuracy.

I could go on a lot regarding this topic, but those are just a few thoughts. (4/4)

@mikedoise I agree with many of your points here. We need more local, personalised models, which will allow for more autonomy and agency on the user's part. The subjective analysis of images from various models is also very much needed, as we humans are the decision makers in deciding which description fits our context and use case. The black-box nature of the models that you hinted at is a perfect encapsulation of why we need decomposability in these models.

@mikedoise I guess I am more of an agnostic on whether LLMs mirror human physicality or vice versa. The recent book by Shannon Vallor, The AI Mirror: How to Reclaim Our Humanity in an Age of Machine Thinking, is really good at giving a counterpoint to your thoughts.
The worrying trend for me is that the field has deviated towards marketing, and many concepts and learnings of the AI field have been oversimplified. Many papers on arXiv are being written like marketing proposals.

@mikedoise I don't think we understand the physicality and phenomenology of human experience well to begin with.
We are explaining and feeling creatures. Say you ask me to write a summary of an article: I can write it and collaborate with you on how to summarise it better. It's not a one-time process. I can do iterative work and be wrong again and again. And I remember being wrong.

@mikedoise AI systems are generally built not to remember; they don't learn from mistakes, and they don't reason. The work of Subbarao Kambhampati et al. corroborates the notion that we need a modular system outside of a language model to do more, with human reasoners and decision makers in the loop. I do day-to-day work with AI to extract key information, produce extractive and abstractive summaries, and imitate input content. What I found is.....

@mikedoise We need techniques to steer the AI with multistep prompting for it to do good work; just by saying "summarise" or "rewrite this", we only get average work.
Say you ask it to "extract all key insights": it will arbitrarily extract 6 or 10 insights. But if you give it a technique to extract by, say first read it all, then read it again, divide the content into sections, take scratch notes for each section, then extract insights, critique your work, and reiterate, it's better.
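That multistep recipe can be sketched as a chained conversation rather than one bare prompt. This only assembles the messages; wiring them into an actual chat-completion API is left out, and the role/content message shape is just the common convention, not a specific SDK.

```python
# The multistep extraction recipe, spelled out as an explicit prompt chain.
STEPS = [
    "Read the whole text once, without writing anything yet.",
    "Read it again and divide the content into sections.",
    "Take scratch notes for each section.",
    "Extract the key insights from your notes.",
    "Critique your own list of insights.",
    "Revise the list based on your critique.",
]

def build_conversation(text):
    """Turn one vague 'extract all key insights' request into a step chain."""
    messages = [{"role": "user", "content": "Here is the text:\n" + text}]
    for step in STEPS:
        messages.append({"role": "user", "content": step})
    return messages

conv = build_conversation("...article body...")
```

Each step's reply would normally be appended to `messages` before sending the next step, so the model critiques and revises its own earlier output instead of answering in one shot.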

@mikedoise Even then, when we ask it to decompose a work into a step-by-step process, e.g. active reading, taking notes, we should be mindful that it's not actively "doing" the work. This is always a metaphor, maybe like what black or red is for a blind person.
We still have a long way to go to achieve an AI system that is smarter at really doing the work and reiterating on it.
Language is powerful enough to dilute our experience, as our reality itself is a highly manufactured, perceived reality.

@mikedoise For some pointers on this, you can look at the work done by the folks on the elicit.org blog, where they talk about how they built decomposability into the Elicit workflow. It's still work in progress, but the takeaway is strong: we need to decompose even simple tasks to get good work from AI.
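The decomposition idea, in its simplest possible form: replace one opaque "summarise" call with small, individually checkable sub-steps. Each step function below is a trivial stand-in for what would be a separate model call in a real workflow like the one Elicit describes.

```python
# Minimal sketch of task decomposition: split -> note each section -> combine.
# Every function here is a toy stand-in for a separate, auditable model call.
def split_into_sections(text):
    return [p for p in text.split("\n\n") if p.strip()]

def note_for_section(section):
    # Stand-in "note": just the first sentence of the section.
    return section.split(".")[0]

def summarise(text):
    """Decomposed summary instead of a single end-to-end call."""
    notes = [note_for_section(s) for s in split_into_sections(text)]
    return " ".join(notes)

doc = "AI needs structure. Much more of it.\n\nDecomposition helps. A lot."
summary = summarise(doc)
```

The point is not the toy logic; it's that each intermediate output (sections, notes) can be inspected and corrected by a human before the next step runs, which a single monolithic call never allows.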

@mikedoise And most startups are not decomposing their tasks. See how Perplexity works? I am a search geek; I would say Perplexity is average at summarising sources, quite bad at looking things up, and quite good at establishing a narrative.
This is because they didn't care about how their search workflow should be structured; a deep search contains many steps and substeps. you.com does the work better.

@mikedoise I've talked about why Perplexity is doing a bad search here:
reddit.com/r/perplexity_ai/com
It's not my original insight; I learnt about search and the art of
deep search from people like @researchbuzz
and I learnt the fundamentals of why Perplexity is average from David Shapiro, an AI engineer.