Google Gemini: Everything you need to know about the new generative AI platform


Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it stack up to the competition?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the most performant Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal,” meaning they are able to work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with Gemini models.

What’s the difference between the Gemini apps and Gemini models?

Google's Bard. Image Credits: Google

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed; think of them as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also entirely independent from Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Only some of these capabilities have reached the product stage so far (more on that later), and Google’s promising all of them, and more, at some point in the not-too-distant future.

Of course, it’s a bit difficult to take the company at its word.

Google seriously underdelivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra, thanks to its multimodality, can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says, as well as extracting information from those papers and “updating” a chart from one by generating the formulas necessary to re-create the chart with more recent data.

Gemini Ultra technically supports image generation, as alluded to earlier. But that capability hasn’t made its way into the productized version of the model yet, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps, but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.

The AI Premium Plan also connects Gemini to your wider Google Workspace account: think emails in Gmail, documents in Docs, presentations in Sheets and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini capture notes during a video call.

Gemini Pro

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the first version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with math problems involving several digits, and users found examples of bad reasoning and obvious mistakes.

Google promised remedies, though, and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most importantly in the amount of data that it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code, which is 35x the amount Gemini 1.0 Pro can handle. And, the model being multimodal, it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of different languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text along the lines of OpenAI’s GPT-4 with Vision model.


Using Gemini Pro in Vertex AI. Image Credits: Gemini

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints. They can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions, and tune the safety settings.
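To give a rough sense of what a temperature setting does, here is a minimal sketch assuming the standard softmax-with-temperature formulation used by many language models. Google hasn’t detailed how Gemini applies temperature internally, and the token scores and function name below are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (not real model output).
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # concentrates on the top token
high = softmax_with_temperature(logits, 2.0)  # spreads probability more evenly
```

Under this assumption, a low temperature makes the most likely token dominate (more predictable output), while a high temperature gives less likely tokens a better chance (more varied output).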

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones instead of sending the task to a server somewhere. So far, it powers a couple of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don’t have a signal or Wi-Fi connection available, and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google’s keyboard app. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app. The feature initially only works with WhatsApp but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles like “excited,” “formal” and “lyrical.”

Is Gemini better than OpenAI’s GPT-4?

Google has several times touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini 1.5 Pro, meanwhile, is more capable at tasks like summarizing content, brainstorming and writing than Gemini Ultra in some scenarios; presumably this will change with the release of the next Ultra model.

But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than OpenAI’s corresponding models. And, as mentioned earlier, some early impressions haven’t been great, with users and academics pointing out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in the Gemini apps and, for now, AI Studio and Vertex AI.

Once Gemini 1.5 Pro exits preview in Vertex, however, input to the model will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.10.
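The arithmetic behind those two figures can be checked with a short sketch. It uses the per-character rates quoted above and assumes the first rate applies to input text and the second to generated output; the constant and function names are illustrative and not part of any Google API.

```python
# Per-character rates in USD, as quoted in this article (assumed input vs. output).
INPUT_RATE_PER_CHAR = 0.0025
OUTPUT_RATE_PER_CHAR = 0.00005

def estimate_cost(num_chars, rate_per_char):
    """Return the estimated cost in USD for a given number of characters."""
    return round(num_chars * rate_per_char, 6)

article_chars = 2000  # the article's assumption for a 500-word piece

# Summarizing: the whole article is sent to the model as input.
summarize_cost = estimate_cost(article_chars, INPUT_RATE_PER_CHAR)

# Generating: an article of similar length comes back as output.
generate_cost = estimate_cost(article_chars, OUTPUT_RATE_PER_CHAR)

print(f"Summarize: ${summarize_cost:.2f}, generate: ${generate_cost:.2f}")
```

The estimate ignores the (much shorter) summary output and prompt input in each case, as the article’s own example does.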

Ultra pricing has yet to be announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in the Gemini apps. Pro and Ultra are answering queries in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps, or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is using Gemini models. Developers can perform “large-scale” changes across codebases, like updating cross-file dependencies and reviewing large chunks of code.

Google’s brought Gemini models to its dev tools for Chrome and the Firebase mobile dev platform, and to its database creation and management tools. And it’s launched new security products underpinned by Gemini, like Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.