My developer life cycle includes a ton of different tasks, especially for my dasBlog open source project. I might be writing new code, editing existing lines of code as part of a review, reviewing new classes or methods from colleagues, or trying to track down exceptions from production logs. Given the variety of problem types, I am finding it important to have an assistant that matches the specific task at hand.
As of the date of this post, Visual Studio supports o1, o3-mini, Claude 3.7 Sonnet, GPT-3.5 Turbo, and GPT-4o. Given this incredible choice, which LLM should you choose for which task?
Well, the first answer is that it depends. For inline updates, you can choose between GPT-3.5 Turbo and GPT-4o by navigating to Tools > Options > Copilot completions in Visual Studio.
For chat, multi-file updates, exception analysis, and general debugging, you can select between o1, o3-mini, and Claude 3.7 Sonnet. Even with the right LLM selected, I thought it would be useful to talk about some of the known strengths of these models and how they might affect my work. Here are my high-level observations.
TL;DR
For more complex reasoning, I tend to turn to o1, which excels at problem-solving but can be slower for simpler tasks. If I need a quicker, balanced option, o3-mini's speed and accuracy are great for my open source project. Claude 3.7 Sonnet is my go-to for large-scale tasks (I am currently testing a multi-file refactor and it is going well); it is super fast across development, debugging, and refactoring jobs.
Picking the right model really depends on the complexity of the task and on whether you value speed and agility over pure accuracy. My personal, non-scientific opinion is that, as of today, we should all be using Claude 3.7 Sonnet. It is awesome!
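To make the TL;DR concrete, here is a minimal Python sketch that encodes these rules of thumb as a simple lookup table. The `Task` categories, the `PREFERRED_MODEL` table, and the `suggest_models` helper are hypothetical illustrations of the preferences described above; they are not part of Visual Studio or any Copilot API, and the actual model selection still happens in the Visual Studio UI.

```python
from enum import Enum
from typing import Dict, Tuple


class Task(Enum):
    """Hypothetical task categories taken from the TL;DR above."""
    INLINE_COMPLETION = "inline completion"
    COMPLEX_REASONING = "complex reasoning"
    QUICK_BALANCED = "quick, balanced answer"
    LARGE_REFACTOR = "large multi-file refactor"


# Rule-of-thumb mapping from task to preferred model(s).
# This mirrors the opinions in this post; it is not an official recommendation.
PREFERRED_MODEL: Dict[Task, Tuple[str, ...]] = {
    Task.INLINE_COMPLETION: ("GPT-3.5 Turbo", "GPT-4o"),
    Task.COMPLEX_REASONING: ("o1",),
    Task.QUICK_BALANCED: ("o3-mini",),
    Task.LARGE_REFACTOR: ("Claude 3.7 Sonnet",),
}


def suggest_models(task: Task) -> Tuple[str, ...]:
    """Return the model(s) this post leans toward for the given task."""
    return PREFERRED_MODEL[task]


if __name__ == "__main__":
    # Print the whole cheat sheet, one task per line.
    for task in Task:
        print(f"{task.value}: {', '.join(suggest_models(task))}")
```

Keeping the rules in a single dictionary means revisiting a choice, for example when a new model shows up in the picker, is a one-line change.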
GPT-4o (released May 2024)
GPT-4o (“o” for “omni”) was considered OpenAI's flagship model (back in the day) and is designed to reason across audio, vision, and text in real time, responding to audio input in 320 ms on average.
- Strengths: Known for its advanced natural language understanding across large code bases, GPT-4o excels at generating accurate code snippets, debugging, and providing detailed technical explanations.
- Use Cases: Ideal for extended chat and debugging sessions as well as inline code completions; it can also tackle tasks we typically associate with conversational agents.
- Limitations: While powerful, it may not be as cost-efficient as smaller models like o3-mini.
GPT-3.5 Turbo (released August 2023)
GPT-3.5 Turbo is an efficient and scalable model known for its strong contextual understanding. It is often appreciated for its cost-effectiveness and versatility, which makes it the most sensible choice for small, repetitive code tasks.
- Strengths: An efficient, fast model with strong contextual understanding and scalable fine-tuning capabilities.
- Use Cases: Suitable for real-time applications; in Visual Studio, it is a near-perfect option for inline code completions.
- Limitations: Lacks multimodal functionality, which newer models tend to support.
o1 (released September 2024)
o1 represents a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder software coding and math problems.
- Strengths: Designed for complex reasoning tasks, o1 introduces chain-of-thought reasoning, breaking down problems into smaller components. It supports structured outputs and function calling, making it suitable for sophisticated applications.
- Use Cases: Excellent for tasks requiring step-by-step problem-solving.
- Limitations: Higher latency than o3-mini, even for simpler tasks.
o3-mini (released January 2025)
o3‑mini uses medium reasoning effort to provide a balanced trade-off between speed and accuracy.
- Strengths: A cost-effective reasoning model optimized for STEM tasks like coding, math, and science. It offers faster responses and supports developer features like structured outputs and function calling.
- Use Cases: Perfect for technical domains requiring precision and speed, such as algorithm development and logical problem-solving.
- Limitations: Does not support vision capabilities, unlike o1.
Claude 3.7 Sonnet (released February 2025)
Built for agentic coding, Claude 3.7 Sonnet can complete tasks across the entire software development lifecycle. I would certainly consider using it for bug fixes as well as for maintaining and refactoring larger code bases. I think this is my preferred choice for complex, end-to-end software development activities.
- Strengths: A hybrid reasoning model that combines rapid responses with extended thinking. It excels in coding, logic, and mathematical reasoning, and supports agentic task execution.
- Use Cases: Suitable for end-to-end software development processes, including planning, debugging, and very large refactoring tasks. It can give you a really helpful step-by-step approach to solving a broad, multi-file problem.
- Limitations:
Version 17.14.0 Preview 2.0
In a blank instance of VS with no code loaded, this feature is not in the Options menu. It is in the Copilot tab, in the "Ask Copilot or use @workspace" prompt window. The choices are:
- GPT-4.0
- o1 (Preview)
- o3-mini (Preview)
- Claude 3.7 Sonnet (Preview)
Thanks for the feedback. Given that you are using the preview, can you confirm that you have the "Unified Settings Experience" enabled? It is itself a preview feature. I think this is the only way the option for inline completions can be changed.
Mark