AI · 28 January 2026 · 10 min read

We Built a SaaS Product Using Claude API. Here Is What We Learned.

TakeoffIQ started as a prompt. The honest version of what building a real product on Claude API looks like, from first prototype to production.

By Jay

TakeoffIQ started as a prompt. Not metaphorically. It started as a prompt I typed into Claude to see if the model could read a construction blueprint and identify material quantities.

It could. Not perfectly. But well enough to see that something real was possible here.

What followed was several months of building a product that actually works in the construction industry. This is the honest version of what that process looked like.

The Problem We Were Solving

Manual material takeoff is one of the most time-consuming tasks in construction estimation. A quantity surveyor or estimator takes a set of architectural drawings, reads through every page, and manually counts or measures every item: metres of framing, sheets of plasterboard, number of fixtures, square metres of concrete. For a medium-sized commercial project, this takes days.

Days that cost real money. Days where errors compound. A miscounted item in the takeoff becomes an incorrect order. An incorrect order becomes a cost blow-out or a site delay.

The appeal of automating this is obvious. The hard part is that construction drawings are not standardised. They come as PDFs, sometimes scanned from paper, sometimes exported from CAD software, sometimes a mix of both. They have title blocks, revision clouds, general notes, schedules, and detail drawings. Reading them requires understanding what you are looking at, not just parsing text.

Claude Vision API can actually read these. That was the starting point.

How the Claude Vision API Integration Works

The core of TakeoffIQ is a pipeline that takes a PDF, extracts pages as images, and sends them to Claude with a structured prompt that asks it to identify and quantify specific elements.

The prompt is a system prompt that establishes the context, the role (you are an experienced quantity surveyor reviewing construction drawings), the specific elements to identify, the format of the output, and the constraints (do not estimate, only report what is visible and measurable from this drawing).
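As a rough sketch of that structure, the system prompt can be assembled from those four parts. The element list and exact wording below are illustrative assumptions, not TakeoffIQ's actual prompt:

```python
# Illustrative assembly of a system prompt from role, task, output format,
# and constraints. ELEMENTS and the phrasing are hypothetical examples.
ELEMENTS = ["structural steel columns", "plasterboard sheets", "concrete slabs"]

def build_system_prompt(elements: list[str]) -> str:
    """Combine role, element list, output format, and constraints into one prompt."""
    element_list = "\n".join(f"- {e}" for e in elements)
    return (
        "You are an experienced quantity surveyor reviewing construction drawings.\n"
        "Identify and quantify the following elements:\n"
        f"{element_list}\n"
        "Respond with a JSON array of objects with these fields: "
        "item, quantity, unit, drawing_reference, confidence, notes.\n"
        "Do not estimate. Only report what is visible and measurable from this drawing."
    )

prompt = build_system_prompt(ELEMENTS)
```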

The output is JSON. Structured, predictable JSON with specific fields for item type, quantity, unit of measurement, drawing reference, and confidence level. The confidence level was a decision we made after early testing: Claude would sometimes report a quantity it was uncertain about with the same tone as one it was certain about. Making the model report its own confidence and filtering low-confidence items for human review significantly improved overall accuracy.

Here is an example of what a schema field looks like:

{
  "item": "structural steel column",
  "quantity": 12,
  "unit": "each",
  "drawing_reference": "S-04",
  "confidence": "high",
  "notes": "4 columns per grid line A, B, C"
}

Getting Claude to follow the schema reliably took about three weeks of iteration. The model is capable. The constraint is the prompt engineering.
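The validation and review-routing step described above can be sketched in a few lines. Field names follow the schema shown earlier; the "low" confidence value in the sample and the exact routing rule are assumptions:

```python
# Minimal sketch: validate extracted items against the schema and route
# anything incomplete or below "high" confidence to a human review queue.
import json

REQUIRED_FIELDS = {"item", "quantity", "unit", "drawing_reference", "confidence"}

def partition_by_confidence(raw_json: str):
    """Return (accepted, needs_review) lists of extracted items."""
    accepted, needs_review = [], []
    for item in json.loads(raw_json):
        if REQUIRED_FIELDS <= item.keys() and item["confidence"] == "high":
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review

sample = json.dumps([
    {"item": "structural steel column", "quantity": 12, "unit": "each",
     "drawing_reference": "S-04", "confidence": "high"},
    {"item": "handrail", "quantity": 30, "unit": "m",
     "drawing_reference": "A-11", "confidence": "low"},
])
ok, review = partition_by_confidence(sample)
```

Filtering this way means the accepted list can feed the estimate directly, while everything else lands in front of a person rather than silently in the numbers.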

What Broke During Development

Three things broke in ways we did not anticipate.

First: page ordering. When a PDF is split into images and sent to Claude, the model does not inherently know the relationship between pages. A detail drawing on page 12 references an element shown on page 3. If you send pages individually, Claude cannot make that connection. We solved this by sending related pages together, grouped by drawing type, which required building a page classification step before the extraction step.
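A simplified sketch of that grouping step, assuming pages are classified by sheet-number prefix (e.g. "A-" architectural, "S-" structural). The prefix convention is a common drafting practice but an assumption here; TakeoffIQ's actual classifier works from page content:

```python
# Sketch of the page-grouping step: batch related pages by discipline prefix
# so they can be sent to the model together. Prefix-based classification is
# a simplifying assumption for illustration.
from collections import defaultdict

def group_pages(pages: list[tuple[str, bytes]]) -> dict[str, list[tuple[str, bytes]]]:
    """pages: (sheet_number, image_bytes) pairs. Returns pages grouped by prefix."""
    groups = defaultdict(list)
    for sheet_number, image in pages:
        prefix = sheet_number.split("-")[0]  # "S-04" -> "S"
        groups[prefix].append((sheet_number, image))
    return dict(groups)

pages = [("A-01", b""), ("S-03", b""), ("S-04", b""), ("A-02", b"")]
batches = group_pages(pages)
```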

Second: unit inconsistencies in drawings. Construction drawings are often created by different people at different times. One drafter uses millimetres, another uses metres. Claude would extract quantities accurately from the drawing but the units would be inconsistent across items. We built a normalisation step that standardises all quantities to a single unit system after extraction.
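The normalisation step for lengths might look like this. The conversion table is illustrative; a production version would also cover areas, volumes, and counted items:

```python
# Sketch of post-extraction unit normalisation: convert all length quantities
# to metres regardless of the unit the drafter used. Unknown units raise so
# they surface for review instead of passing through silently.
CONVERSIONS_TO_M = {"mm": 0.001, "cm": 0.01, "m": 1.0}

def normalise_length(quantity: float, unit: str) -> tuple[float, str]:
    """Return (quantity_in_metres, "m")."""
    if unit not in CONVERSIONS_TO_M:
        raise ValueError(f"unknown length unit: {unit}")
    return quantity * CONVERSIONS_TO_M[unit], "m"

qty, unit = normalise_length(3600, "mm")  # 3600 mm expressed in metres
```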

Third: handwritten annotations. Some drawings have handwritten revision notes that change a quantity or specification. Claude Vision reads these reasonably well but less reliably than printed text. For drawings known to have significant handwritten annotation, we now flag them for manual review rather than treating the AI output as definitive.

What Surprised Us

The reasoning quality was better than expected. We expected Claude to extract visible numbers. It also inferred quantities from patterns: if a floor plan shows a repeating column grid, Claude counts the grid intersections and reports the total column count rather than requiring every column to be individually labelled.

This is genuinely useful and matches how an experienced estimator reads a drawing. It was not something we explicitly prompted for. It emerged from establishing the right role context in the system prompt.

The other surprise was how well the model handled ambiguous drawings. Old or poorly drafted drawings that would frustrate a junior estimator were handled by Claude with appropriate flagging: it would note what it could see, report what it could extract, and explicitly identify what required further clarification. That behaviour, treating ambiguity as something to flag rather than something to guess through, is exactly what you want from a tool that feeds into construction cost estimates.

Where TakeoffIQ Is Now

TakeoffIQ is a working product that is being used on real construction projects. It is not replacing estimators: it is giving them a first pass on the material schedule that they then review and verify. The time saving is significant, typically reducing the initial takeoff from days to hours.

The accuracy on commercial drawings with clean, machine-generated PDFs is high enough to be commercially viable. Accuracy on scanned legacy drawings is lower and these are flagged for closer review.

We are continuing to improve the page relationship logic and adding support for specification documents, which sit alongside drawings and contain material specifications that affect quantities.

If you are curious about what building a real product on the Claude API looks like beyond the demos and proof-of-concept stage, this is it. The work is in the pipeline design, not in the prompting. The model is capable. The engineering is in making it reliable.

What We Would Tell Anyone Starting This

Start with structured output from day one. Decide on your JSON schema before you write your first production prompt. Changing the schema later means rewriting extraction logic and re-validating your outputs.
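One way to make the schema a single source of truth from day one is to define it once and derive both the parsing code and the prompt's format description from that definition. The field set mirrors the schema shown earlier in this post; the dataclass approach itself is an illustration, not TakeoffIQ's actual implementation:

```python
# Define the extraction schema once; render the field list into the prompt
# from the same definition so prompt and parser can never drift apart.
from dataclasses import dataclass, fields

@dataclass
class TakeoffItem:
    item: str
    quantity: float
    unit: str
    drawing_reference: str
    confidence: str
    notes: str = ""

def schema_description() -> str:
    """Field names for inclusion in the extraction prompt."""
    return ", ".join(f.name for f in fields(TakeoffItem))

parsed = TakeoffItem(item="structural steel column", quantity=12, unit="each",
                     drawing_reference="S-04", confidence="high")
```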

Build a confidence signal into every extraction. The model does not know what it does not know unless you ask it to report uncertainty explicitly.

Plan for the edge cases in your source data before you plan for the happy path. The happy path will work. The edge cases are what determine whether the product is actually usable.

And use the system prompt for role and context, not just for instructions. Telling Claude who it is and what expertise it has changes the quality of output in ways that adding more instructions to a user prompt does not fully replicate.
