The biggest / funniest Google disconnect that I know about: I just ran across a site that runs any still photo you submit through the Google Vision API, which uses image recognition and/or Gemini (the API has a zillion options, and it's hard to tell which of them the site is using) to infer emotion, setting, context, likely income, likely politics, and the best marketing products and angles.
Some friends and I tried it, and it was surprisingly good and accurate.
The thing I found most interesting about it was that in a decade-plus of using Google products, I've basically never seen a relevant ad.
So more than ten years of emails, documents, video meetings, spreadsheets, social graph analysis, and whatever else got them nothing, but a single still photo run through Vision / Gemini produced a much better segmentation in one shot?? Crazy.
But if you're interested in trying it too, here's the link (a rough sketch of the kind of API call involved follows below). No idea what they do with your photo and the inferred data; I assume they collect it in a database for marketing, or something even more nefarious.
https://theyseeyourphotos.com/
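For the technically curious: the inference step is probably nothing more exotic than a single multimodal call. A minimal sketch, assuming the google-generativeai Python SDK; the model name, prompt, and filename are my guesses, not what the site actually runs:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

photo = Image.open("portrait.jpg")  # any still photo
prompt = (
    "Describe this photo's subject: apparent emotion, setting, context, "
    "and plausible marketing segments. Answer as JSON."
)

# One photo plus one prompt, sent in a single multimodal request.
response = model.generate_content([prompt, photo])
print(response.text)  # single-shot structured guesses about the subject
```

Point being: one photo, one prompt, and you get a segmentation that Google's ad stack apparently never managed from a decade of data.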
Great case in point. A perfectly lovely little product, flashes of genius, left to collect dust and inevitably get sunset at some point ...
Right, and not just left to collect dust, but it's strictly better *at their core competency,* by some absurd factor (5x - 100x), and is hidden away in some tiny corner of "API services" space not being used by anyone.
That's just "Google in a nutshell," to me.
Bureaucracy is a way bigger brake on LLM development than most people realize. Particularly when it’s political.
I did chuckle to myself when I wrote big bureaucracy smell
Part of the issue for a while was how bespoke and finicky all their APIs were with Vertex/Gemini, so they got basically NO developer adopters touting them. But Gemini 2 has been great for me: it's more than smart enough to handle the agentic system I'm building and the choices it has to make to accommodate user requests, and it's cheap enough that I can use it without thinking as much about conversation length as I do with Claude.
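For anyone curious what that looks like in practice, the SDK will even run the tool loop for you. A rough sketch, assuming the google-generativeai package and Gemini 2.0 Flash; the tools here are hypothetical stubs, not my actual system:

```python
import google.generativeai as genai

# Hypothetical tool stubs the model can choose between to serve a request.
def search_orders(customer_id: str) -> str:
    """Look up a customer's recent orders (stubbed for this sketch)."""
    return f"orders for customer {customer_id}: [order_17, order_42]"

def issue_refund(order_id: str) -> str:
    """Issue a refund for a given order (stubbed for this sketch)."""
    return f"refund issued for {order_id}"

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel(
    "gemini-2.0-flash",
    tools=[search_orders, issue_refund],  # SDK derives declarations from signatures
)

# With automatic function calling, the SDK runs the agent loop itself:
# the model picks a tool, the SDK executes it and feeds the result back,
# repeating until the model produces a final text answer.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("Please refund the latest order for customer 42.")
print(reply.text)
```

The automatic mode keeps the loop out of your code; if you need finer control over each decision, you can disable it and handle the function-call turns yourself.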
They're good models!
Yes! I have a premium model, but I largely use it for images -- almost all the images on my Substack over the last few months have been Gemini-created. The OpenAI image models are bizarre and overly dramatic. Interestingly, in an environment where humanists like me and @HenryOliver are having conversations about AI taste, there isn't yet enough attention paid to visual capabilities. I can usually tell which model created which image. Yes, it took me many back-and-forths to get the image of Sisyphus working on his laptop while sitting on his rock; yes, it took equally many to get the image of a chip covering the sun in an eclipse from the perspective of a university, but they are both excellent images and worth the time spent.
Yeah, Imagen is very good. I had a preview, IIRC. Just, again, it doesn't get nearly the fanfare. I think even I forgot it existed!
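If you want to poke at it directly, image generation is exposed through the newer google-genai SDK. A minimal sketch, assuming an Imagen 3 model id; the prompt and filename are illustrative only:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Ask Imagen for one image; the model id is an assumption, check what's current.
result = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="Sisyphus on his boulder, working on a laptop, dramatic oil-painting style",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Write the returned image bytes straight to disk.
with open("sisyphus.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```

Bumping number_of_images gives you several candidates per round, which cuts down on the back-and-forth.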
If/when Gemini can put together a whole product from beginning to end, you're getting close to not even needing that product, you just talk to Gemini.
Maybe. But I can chat with Anthropic and get Artifacts... I can chat with GPT and use Code Interpreter and images. Just make it all work together!
Yes, that's the way it's going; for the better, one hopes.
Excellent points. I also found it interesting that Gemini is never at the top of benchmarks the way OpenAI's and Anthropic's models are. I do wonder if it's because Google's customer base is so large that a good model integrated into their ecosystem is a higher priority than having the "best" model on the market.
Gemini kept coming out on top occasionally, but it gets shrugged off, a couple of months pass, and it's forgotten. The ones that have broken through are NotebookLM and Flash's cost efficiency...