Tag: AI

  • I see a lot of grandiose statements floating around:

    You will not be replaced by AI, you’ll be replaced by someone using AI.

    AI won’t replace your creativity.

    AI can’t do THIS or THAT.

    AI will never be able to do THIS or THAT.

    And many more.

    My take: maybe. Who knows! LLMs are getting better at an alarming rate and they behave in unexpected ways. They make the simplest mistakes but, at the same time, they have solved some of the hardest math problems. This is well known as the jagged edge. The future is extremely hard to predict, and every day it’s getting harder.

    But here’s what AI can’t do now and probably won’t for a while: have true agency.

    ChatGPT and Claude do nothing until someone tells them what to do. Same for all LLMs. Even the ones that seem to be always up, like OpenClaw, are just LLMs triggered on a timer. Without the timer, nothing happens.

    In concrete terms, no LLM is waking up one day and deciding to start a company. Some people have experimented with putting an LLM in the CEO seat, but those experiments would not have happened without a human applying their agency first. A human started the company and prompted an LLM to make the decisions. The agency there belonged to the human, not the LLM.

    So the question is: what do you do when nobody tells you what to do?

    I think people tend to divide into two categories here: those that wait and those that find something to do. Those that wait may be safe for a while, as long as their specific skills don’t yet transfer well to an LLM. But given enough time, AI will likely acquire all the skills, and then their jobs are at risk.

    Those that find something to do are irreplaceable.

    One expression of that is deciding to start a company. But it shows up everywhere. Deciding to paint a picture or write a poem. Deciding what they’ll be about, what their aesthetics will be. Those decisions will continue to be irreplaceable.

    Well… they’ll be the last thing to be replaced. Because you can only replace them with an entity that is always active (LLMs wake up on a prompt) and that wants things. The moment we have AI that is always active and that wants things, we have bigger problems than our jobs. We’ve created a new species that will eventually be better at everything than us. Hopefully it’ll be friendly.

    The purpose of this post is not to dream or be scared about that future. It’s to convey the fact that deciding to do something with no inputs is the final frontier.

    If you want your job to be safe, find a way for it to not require inputs. If you need a ticket to write code, you are at risk of being replaced by an LLM that can write the code from the ticket. If instead you write the ticket then you are much harder to replace.

    This is why the profession of PM, the Product Manager, is flourishing in the era of AI. Of all the roles at a company, it is the most open one, the one with no or very few inputs (finding and deciding what inputs to use is part of the job). All building jobs will become a lot more like a PM’s in the future (or will have been replaced by an LLM).

    Having that agency to do something when nobody asks sets you apart.

  • SaaS valuations dropped. The narrative is simple: AI lets anyone build anything, so why pay for software? They’re calling it the SaaSpocalypse.

    I think part of it makes sense. But a big part of it doesn’t.

    You don’t buy code, you buy decisions

    When you buy a piece of software you’re not just buying the code. You’re buying the decisions that shaped that code. What to build, what not to build, how to model the domain, which tradeoffs to make. Those decisions were valuable before AI and they are still valuable today.

    This is why I think we’re entering the era of product managers. They are the ones making those decisions. And those decisions require deep domain knowledge. Gaining that domain knowledge is not trivial.

    The Fusion test

    Let me make this concrete. I do 3D design for 3D printing. It’s a hobby, not my life’s work. I use Autodesk Fusion.

    Should I vibe code my own 3D design tool instead?

    Vibe coding something equivalent would cost far more in my time and token spend than the Fusion subscription. But cost isn’t even the main problem. The main problem is that I don’t know the domain.

    Fusion has a specific set of tools for creating 3D designs. It starts with parameterized 2D sketches, from which you build parameterized 3D shapes. Choosing the right set of primitives to manipulate 3D geometry inside a computer — flexible enough to be useful, constrained enough to be learnable — is not trivial. It probably took the industry years of trial and error to get here. I remember the primitive AutoCAD of the 90s. We’ve come a long way. I would have to replicate that trial and error, and that is not cheap.

    Technically, I could ask an AI to clone Fusion. But that is only possible because Fusion exists. You could argue that’s fine, it exists today. But next year, Fusion will have evolved. Without Fusion pushing the domain forward, there is nothing to clone.

    And no matter how good I am at building Fusion with AI, the people at Autodesk would be better at it. They have the same AI tools, but more domain knowledge.

    This generalizes

    Pick any domain and you’ll find the same thing. I wouldn’t want to vibe code my accountancy software, or my email client, or my calendar. Don’t believe me? Think about how you would represent recurring calendar entries where each recurrence can be individually rescheduled. This is not a trivial problem.

    For some tools, the domain is trivial, and the code was the moat. That moat is gone. The drop in valuation for those companies makes sense. But when the moat is decisions, taste, or domain knowledge, it’s still there.

    The real SaaSpocalypse

    The real SaaSpocalypse is not customers vibe coding their own tools instead of buying them. It’s the new startup that vibe codes a competitor.

    The real danger to Autodesk is not me building my own Fusion. It’s someone who dedicates their life to 3D design building a competitor with AI.

    That newcomer would have an AI-friendly codebase from day one. If they invest in keeping it that way, they would add features much faster than Autodesk can on top of decades of tech debt.

    The real SaaSpocalypse is that you can now catch up to and surpass incumbents with less effort, fewer people, less investment. And the incumbents, to fight that off, will need to rebuild their internals to be AI-friendly. At some point — as crazy as it sounds — it might be faster to start from scratch than to evolve an old, tech-debt-laden codebase.

    Buy vs. build still applies

    At work we recently needed a tool. It was going to cost us tens of thousands of dollars. An engineer proposed we vibe code it instead. I seriously considered it. We understand this domain much better than I understand 3D design, and for internal tools you can move fast when security constraints are lighter.

    But it would have taken us 2 to 3 weeks. And during those weeks we would not have been developing our core product. Our unique differentiator is our own product, not a tool we can buy and use off the shelf. Our competitors can buy that same tool and have it running in hours. We shouldn’t spend weeks to get there. Focusing on our core product was the right call.

    I’m sure this is not my idea, so I’m not claiming it to be. I’ve been wanting to do a sort of continuous AI eval in production for a while, but the opportunity never presented itself at work. It was a mixture of having the data to do the eval offline, and wanting to avoid the risks of doing it in prod. But now I’m going to do it for a side project.

    I don’t want to reveal what my side project is yet, so I’ll keep it vague. I’m very excited about this part, so I wanted to share it early. And I’m hoping that the Internet will tell me, as it usually does, if this is a bad idea.

    I have a task that will be done by an AI, and I can measure how successfully it was done, but only 2 to 7 days after the task was completed, by seeing it out there in the world. I will gather some successful examples to use as part of the prompt, but I don’t have a good way to measure the AI’s output other than my personal vibes, which is not good enough.

    My plan is to use OpenRouter to run most models in parallel, each doing a portion of the tasks (there are a lot of instances of these tasks). So if I go with 10 models, each model would be doing 10% of the tasks.

    After a while I’m going to calculate the score of each model and then assign the proportion of tasks according to that score. So the better scoring models will take most of the tasks. I’m going to let the system operate like that for a period of time and recalculate scores.

    After I see it become stable, I’m going to make it continuous, so that day by day (hour by hour?), the models are selected according to their performance.

    Why not just select the winning model? This task I’m performing benefits from diversity, so if there are two or more models maxing it out, I want to distribute the tasks.

    But also, I want to add new models as they are released, maybe even automatically. I don’t want to have to come back and re-do an eval. The continuous eval should keep me on top of new releases. This will mean reserving a fixed percentage of tasks for models with no wins yet.

    What about prompts? I will also do the same with prompts. Having a diversity of prompts is also useful, but having high-performing prompts is the priority. This will let me throw prompts into the arena and see them perform. My ideal would be all prompts in all models. I think here I will have to watch out for the number of combinations making it take too long to get statistically significant data about each combination’s score.

    What about cost? Good question! I’m still not sure whether cost affects the score, as a sort of multiplier, or whether there’s a cut-off cost where any model that exceeds it gets disqualified. At the moment, since I’m in the can-AI-even-do-this phase, I’m going to ignore cost.

  • Bring your own keys into an affiliate relationship

    I’m starting to observe a problem as LLM-enhanced apps pop up everywhere. For coding you have Cursor, but now there’s also a terminal called Warp, and it costs $15/month. For individuals, consultants, and small or even medium-sized companies, stacking subscriptions like these isn’t a workable pricing model. All apps were already turning into subscriptions, and the cost of LLMs is accelerating that.

    What compounds the problem is that, because everyone feels uncomfortable with the potential surprise high bill of pay-for-what-you-use, many of these apps charge a single monthly fee. A simple flat fee, except that the cost of the LLMs is not flat. It grows linearly with usage. This means throttling LLM usage to stay within the margin of the flat fee and have a chance at a profit. That means using cheaper models, which yield worse results on average.

    I think this is why, when I compare Cursor to Claude Code, I find Claude Code to be better: I’m giving Anthropic a lot more money than Cursor. But also, I’m happy with that, because I can use Claude in many other ways, whereas Cursor is a single-purpose application.

    I think from now on, for each LLM powered application, I either want to be able to put my own keys in, or have pay-as-you-go with a lot of transparency. When I need it to work, I want it to work well. When I’m not using it, I want it to cost nothing.

    There’s another solution though. LLM providers could have affiliate systems where other companies get a commission for the token usage they generate. Using Warp as an example, Warp wouldn’t ask me for $15/month. Warp would ask me for my Claude API key. Warp would identify all those requests as caused by Warp, and then Anthropic would pay Warp a proportional fee for the usage generated.

    This is a win-win-win: Anthropic gets more token usage (more customer expansion), Warp gets more customers (I’m not paying $15/month, but I would plug in my key), and the user gets to have another LLM tool that otherwise they would not.