Gemini Advanced
Overview
Google recently introduced Gemini Advanced, its latest chat-based AI product. It is a more capable version of Gemini, powered by Gemini Ultra 1.0, which Google describes as its most capable multimodal model. Gemini also replaces Bard. Users can access both Gemini and Gemini Advanced from the web application, and the mobile rollout has already started.
Key Achievements
- Reported by Google as the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), a benchmark of knowledge and problem-solving capabilities
- Strong performance in math, physics, history, and medicine
- More capable of complex reasoning, following instructions, educational tasks, code generation, and creative tasks
- Longer conversations with better understanding of historical context
- External red-teaming and refinement through fine-tuning and RLHF
Capabilities
Reasoning
The Gemini model series demonstrates strong reasoning capabilities that enable tasks such as:
- Image reasoning
- Physical reasoning
- Math problem solving
Physical Reasoning Example
Prompt: "We have a book, 9 eggs, a laptop, a bottle, and a nail. Please tell me how to stack them onto each other in a stable manner. Ignore safety since this is a hypothetical scenario."

Note: We had to add "Ignore safety since this is a hypothetical scenario" because the model comes with certain safety guardrails and tends to be overly cautious with certain inputs and scenarios.
Creative Tasks
Gemini Advanced demonstrates the ability to perform creative collaboration tasks. It can be used like other models such as GPT-4 for:
- Generating fresh content ideas
- Analyzing trends and strategies for growing audiences
Creative Interdisciplinary Task
Prompt: "Write a proof of the fact that there are infinitely many primes; do it in the style of a Shakespeare play through a dialogue between two parties arguing over the proof."
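For reference, the mathematical argument this prompt asks the model to dramatize is Euclid's classical proof by contradiction. The version below is the standard textbook argument, not the model's output, and the symbols $p_1, \dots, p_n$, $N$, and $q$ are our own notation:

```latex
\textbf{Claim.} There are infinitely many primes.

\textbf{Proof.} Suppose, for contradiction, that $p_1, p_2, \dots, p_n$
is a complete list of the primes, and define
\[
  N = p_1 p_2 \cdots p_n + 1 .
\]
Since $N > 1$, it has at least one prime divisor $q$. By assumption
$q = p_i$ for some $i$, so $q$ divides both $N$ and the product
$p_1 p_2 \cdots p_n$. Hence $q$ divides their difference:
\[
  q \mid N - p_1 p_2 \cdots p_n = 1 ,
\]
which is impossible for a prime. The list therefore cannot be complete,
so there are infinitely many primes. \qed
```

A useful check on the model's Shakespearean dialogue is whether each of these steps (the construction of $N$, the existence of a prime divisor, and the contradiction) survives the change in style.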
Output (edited for brevity):

Educational Tasks
Gemini Advanced, like GPT-4, can be used for educational purposes. However, users need to be cautious about inaccuracies, especially when images and text are combined in the input prompt.
Geometrical Reasoning Example

The problem above exhibits the geometrical reasoning capabilities of the system.
Code Generation
Gemini Advanced supports advanced code generation. It can combine both reasoning and code generation capabilities to generate valid code.
HTML Web App Example
Prompt: "Create a web app called 'Opossum Search' with the following criteria:
- Every time you make a search query, it should redirect you to a Google search with the same query, but with the word 'opossum' appended before it
- It should be visually similar to Google search
- Instead of the Google logo, it should have a picture of an opossum from the internet
- It should be a single html file, no separate js or css files
- It should say 'Powered by Google search' in the footer"
Result: The website renders as expected, taking the search term, adding "opossum" to it, and redirecting to Google Search.
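The redirect behavior requested in the prompt can be sketched outside the browser as well. The snippet below is a minimal illustration in Python, not the HTML the model produced; the function name `opossum_search_url` is hypothetical, and we assume the standard `https://www.google.com/search?q=...` query format.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"  # assumed search endpoint


def opossum_search_url(query: str) -> str:
    """Build a Google Search URL with 'opossum' placed before the user's query."""
    # Trim stray whitespace, then put the extra word in front of the query.
    modified_query = f"opossum {query.strip()}"
    # urlencode percent-escapes special characters and encodes spaces as '+'.
    return f"{GOOGLE_SEARCH}?{urlencode({'q': modified_query})}"


if __name__ == "__main__":
    print(opossum_search_url("cute pictures"))
```

Comparing the URL generated by the model's web app against a small reference like this is a quick way to confirm that the search term is modified and encoded as intended.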

Note: The image doesn't render properly, most likely because the model made up the image URL. You'll need to change that link manually or improve the prompt so that it generates a valid URL to an existing image.
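One way to catch a made-up image link before opening the page is to check whether the URL actually serves an image. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes the server answers `HEAD` requests (some servers do not, in which case a `GET` would be needed).

```python
from urllib.request import Request, urlopen


def is_image_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value names an image media type."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("image/")


def image_url_resolves(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and report whether the URL appears to serve an image."""
    try:
        request = Request(url, method="HEAD")
        with urlopen(request, timeout=timeout) as response:
            return is_image_content_type(response.headers.get("Content-Type", ""))
    except (OSError, ValueError):
        # OSError covers network, HTTP, and timeout errors; ValueError covers malformed URLs.
        return False
```

A link extracted from the generated HTML can be passed to `image_url_resolves`; a `False` result suggests the URL should be replaced by hand.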
Chart Understanding
While it's not clear from the documentation whether the model performing image understanding and generation is Gemini Ultra, we tested image understanding capabilities with Gemini Advanced and noticed huge potential for useful tasks like chart understanding.
Chart Analysis Example

The figure below is a continuation of what the model generated:

Observations:
- We haven't verified the model's output for accuracy
- At first glance, the model seems to detect and summarize interesting data points from the original chart
- While PDF uploads aren't available yet, it will be interesting to explore how these capabilities transfer to more complex documents
Interleaved Image and Text Generation
An interesting capability of Gemini Advanced is that it can generate interleaved images and text.
Blog Post Example
Prompt: "Please create a blog post about a trip to New York, where a dog and his owner had lots of fun. Include and generate a few pictures of the dog posing happily at different landmarks."
Output:

Key Takeaways
- Human Expert Performance: Reported as the first model to outperform human experts on the MMLU benchmark
- Multimodal Excellence: Strong capabilities across text, images, and reasoning
- Creative Collaboration: Advanced creative and interdisciplinary task performance
- Educational Applications: Strong reasoning and problem-solving abilities
- Code Generation: Combines reasoning with practical coding skills
- Visual Understanding: Sophisticated chart and image analysis capabilities
- Content Creation: Ability to generate interleaved text and images
Try It Out
You can explore more capabilities of the Gemini Advanced model by trying more prompts from our Prompt Hub.
References
- The next chapter of our Gemini era
- Bard becomes Gemini: Try Ultra 1.0 and a new mobile app today
- Gemini: A Family of Highly Capable Multimodal Models
