President Trump and Elon Musk’s DOGE initiative is misapplying artificial intelligence and missing its full potential.
The first round of mass layoffs of government employees has resulted in a public backlash. In response, the president said the next phase of the DOGE initiative will be more precise: “I don’t want to see a big cut where a lot of good people are cut.”
So what would a better approach to leveraging AI in a government landscape — the scalpel instead of the chainsaw — look like? Let’s start by examining the current use of AI and why it is missing the mark.
So far, Musk’s approach has been to apply AI to the responses of government employees, which have trickled in since his controversial February email asking federal workers for bullet points summarizing what they’re working on. An AI system built on a tailored large language model (LLM) then uses this data to determine whether employees are essential.
In theory, it’s a clever idea: Train an AI model using a reliable dataset to determine and validate which kinds of jobs are unnecessary or ripe for automation. Based on its training data, the AI might have determined, for example, that if a job follows well-defined rules with no interaction or coordination, that job could perhaps be performed more effectively by an algorithm.
But such an approach risks making serious errors due to the biases embedded in LLMs. What might these biases be? It’s hard to tell, and therein lies the problem.
No one fully understands the inner workings of LLMs and how they come up with their answers. The false negatives from applying such an algorithm — firing essential employees — can have an unusually high cost of error. Most organizations have low-key employees who are often the go-to people for institutional knowledge or turbocharging productivity across an organization.
How would an LLM rate a response such as, “I answer people’s questions”? It’s hard to know, but I wouldn’t trust the machine. I’d want more information from the respondent’s bosses and colleagues. And so I find myself back where I started: using humans, not AI, to determine the value of workers.
So while the current approach is simple and has the advantage of not requiring much data, it is likely to make serious errors.
Instead, I would strongly urge Musk’s new “scalpel” to leverage more objective data from government agencies. In this way, the AI would first “make sense” of the various agencies. For example, what is USAID’s mission, what are its objectives, how are they measured, and how effective has the agency been in achieving them? How about the Pentagon?
Imagine, for example, if we gave an LLM access to the Pentagon’s history, including all its contracts and projects to date, anonymized communications related to them and all outcomes. This type of data could “fine-tune” the LLM. Now, imagine priming such a fine-tuned AI with the following type of prompt: “Given the mission of the Pentagon and its current goals, objectives and budget, identify the areas of biggest risk, potential of failure and the impact on the budget.”
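In practice, that fine-tuning step begins with mundane data preparation: turning each anonymized contract or project record into a prompt/completion pair, the standard format for supervised fine-tuning of an LLM. The sketch below illustrates the idea with hypothetical records and field names of my own invention, not actual Pentagon data:

```python
import json

# Hypothetical, illustrative records -- real fine-tuning data would be
# anonymized contract and project histories with their known outcomes.
records = [
    {"program": "Program A", "budget_bn": 40, "delay_months": 30, "outcome": "over budget"},
    {"program": "Program B", "budget_bn": 12, "delay_months": 2, "outcome": "on schedule"},
]

def to_finetune_example(rec):
    """Convert one project record into a prompt/completion pair,
    the common format for supervised fine-tuning of an LLM."""
    prompt = (f"Program: {rec['program']}. Budget: ${rec['budget_bn']}B. "
              f"Schedule slip: {rec['delay_months']} months. Assess the outcome.")
    return {"prompt": prompt, "completion": rec["outcome"]}

# One JSON object per line (JSONL), a typical fine-tuning file layout.
jsonl = "\n".join(json.dumps(to_finetune_example(r)) for r in records)
print(jsonl.splitlines()[0])
```

The query quoted above (“identify the areas of biggest risk…”) would then be posed to the model only after this kind of curation, which is where most of the real work lies.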
The current F-35 Lightning II Joint Strike Fighter Program, for example, has an estimated budget of roughly $2 trillion. The Columbia-class submarine program has a budget approaching $500 billion. These are big-ticket items. Could an AI make sense of them?
Indeed, evaluating programs within agencies is well within the scope of modern AI. Such an approach would require conducting a critical evaluation of the fine-tuned system on carefully constructed test cases where “the truth” is known. For example, historical use cases to train the AI might be the B-52 bomber, the Trident submarine, the Minuteman missile and other kinds of programs that include defensive and offensive weapons. Such cases could be used to build a model that is able to predict what types of current and envisioned projects, such as the F-35, are likely to have the highest failure rates or delays, and the reasons for them.
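The evaluation step described above is essentially a held-out test: score the system against historical programs whose outcomes are already known. A minimal sketch, with made-up figures and a stand-in rule in place of a real fine-tuned model, shows the shape of such a harness:

```python
# Hypothetical evaluation harness: check a risk predictor against
# historical programs where "the truth" (delayed or not) is known.
# Figures below are invented for illustration, not actual program data.
historical = [
    {"name": "Legacy bomber",    "cost_growth_pct": 15, "delayed": False},
    {"name": "Legacy missile",   "cost_growth_pct": 60, "delayed": True},
    {"name": "Legacy submarine", "cost_growth_pct": 45, "delayed": True},
    {"name": "Legacy fighter",   "cost_growth_pct": 10, "delayed": False},
]

def predict_delay(program, threshold=30):
    """Stand-in predictor: flag programs whose cost growth exceeds a
    threshold. A fine-tuned model would replace this simple rule."""
    return program["cost_growth_pct"] > threshold

# Compare predictions against the known outcomes.
correct = sum(predict_delay(p) == p["delayed"] for p in historical)
accuracy = correct / len(historical)
print(f"accuracy on held-out cases: {accuracy:.0%}")
```

Only after the system clears this kind of test on cases with known answers should its predictions about current programs such as the F-35 be taken seriously.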
The technical challenges for creating such an AI tool in a federal context are not trivial and involve privacy and security issues, but they are not insurmountable. The important questions are who should have the authority to fine-tune the AI to answer such questions, who should have the authority to issue the query and what could they do with the answers?
In a corporation, the CEO would clearly have the authority to do both, and would have wide latitude in how to act on the answers. Indeed, a big part of a CEO’s job is to ensure that business resources are deployed in the long-term interest of the company’s shareholders.
Is it any different for the U.S. government? Should the president have the authority to fine-tune the AI on specific government data, and ask such a question and take possible actions based on the answers?
To my mind, the answer to the first question is clearly yes, although it would likely require a committee on national security, which would need to follow a well-defined process for assembling the right training data for the AI and a well-defined method for evaluating its responses.
The second question is more complicated, and depends on the specific actions being taken. Democratically elected governments must follow due process, for which the AI could also advise the committee and the president on possible legal courses of action and their consequences.
Now is the moment for Musk to recalibrate, harnessing AI’s full potential in even more transformative ways. With strategic guidance and strong congressional oversight, America can wield this AI scalpel to revolutionize and supercharge the federal government.
Vasant Dhar is a professor at the Stern School of Business and the Center for Data Science at NYU. An artificial intelligence researcher and data scientist, he hosts the podcast “Brave New World,” which explores how technology and virtualization in the post-COVID era are transforming humanity.