
Scanned P&IDs vs CAD Exports: What to Expect

April 9, 2026 · 5 min read

Most P&ID extraction happens on brownfield projects. The drawings already exist — the question is what condition they're in.

A clean CAD export and a 20-year-old scan of a hand-drafted drawing are both "PDFs," but they behave completely differently when you're trying to pull instrument data out of them. Here's what actually matters.

Vector PDFs vs raster scans

The single biggest factor in extraction quality is whether your PDF contains vector graphics or raster images.

	Vector (CAD export)	Raster (scanned)
Source	Exported directly from AutoCAD, MicroStation, SmartPlant	Flatbed scanner, photo, or print-to-PDF of a photocopy
Text	Selectable, searchable	Embedded in image pixels
Zoom behavior	Stays crisp at any zoom	Gets blurry when zoomed
Tag readability	Characters are exact	Subject to scan artifacts, smudging, fading
Typical file size	200 KB to 2 MB per page	5 MB to 50 MB per page
Extraction accuracy	High — text is unambiguous	Varies — depends on scan quality

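Quick test: open your PDF and try to select text with your cursor. If you can highlight individual tag numbers, the page has a text layer and is most likely a vector export. If you can only select the whole page as an image, it's a scan. One caveat: a scan that has been run through OCR also has selectable text, so zoom in to confirm. Vector linework stays crisp, while a scan turns blurry or pixelated.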
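
The same check can be approximated in code once a PDF library has reported basic statistics for each page. The sketch below is a minimal heuristic, not how any particular tool does it: `classify_page`, its two inputs, and the 0.9 coverage cutoff are all assumptions made for illustration.

```python
def classify_page(text_chars: int, image_coverage: float) -> str:
    """Guess whether a PDF page is vector or raster.

    text_chars:     number of selectable characters found on the page
    image_coverage: fraction of the page area (0.0 to 1.0) covered by embedded images
    Both values are assumed to come from whatever PDF library you use.
    """
    if not 0.0 <= image_coverage <= 1.0:
        raise ValueError("image_coverage must be between 0.0 and 1.0")

    # Assumed cutoff for "the page is essentially one big image".
    full_page_image = image_coverage >= 0.9

    if full_page_image and text_chars > 0:
        return "raster with OCR layer"
    if full_page_image:
        return "raster"
    if text_chars > 0:
        return "vector"
    return "unknown"


for name, chars, coverage in [
    ("CAD export", 1800, 0.0),
    ("plain scan", 0, 1.0),
    ("OCR'd scan", 1500, 1.0),
]:
    print(f"{name}: {classify_page(chars, coverage)}")
```

A page with no selectable text and no large image lands in "unknown". That can happen when a CAD export draws its text as outlines, so those pages are worth opening by hand.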

What degrades scanned drawings

Not all scans are equal. These are the specific issues that make instrument tags harder to read:

Resolution. Anything below 200 DPI starts losing fine detail in tag text. ISA bubbles are small — at 150 DPI, the difference between "FIT" and "FLT" or between "101" and "1O1" gets ambiguous. 300 DPI is the practical standard for readable scans.
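
To see why resolution matters, it helps to count pixels. The sketch below works out the effective DPI of a scan from its pixel width and the physical sheet width, then shows how tall a character ends up in pixels. The 34-inch sheet width and the 2.5 mm text height are assumed values for illustration; substitute your own.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixel_width: int, sheet_width_in: float) -> float:
    """Scan resolution implied by the image width and the physical sheet width."""
    return pixel_width / sheet_width_in


def char_height_px(text_height_mm: float, dpi: float) -> float:
    """Height of a character in pixels at a given resolution."""
    return text_height_mm / MM_PER_INCH * dpi


# Assumed example: a 34-inch-wide sheet scanned to an image 5100 pixels wide.
dpi = effective_dpi(5100, 34.0)
print(f"effective resolution: {dpi:.0f} DPI")

# Assumed 2.5 mm tag text at three resolutions.
for d in (150, 200, 300):
    print(f"{d} DPI -> {char_height_px(2.5, d):.1f} px tall")
```

Under these assumptions a character is roughly 15 pixels tall at 150 DPI and roughly 30 at 300 DPI. The lower figure leaves very few pixels for the strokes that separate an I from an L. The same calculation is a quick way to check a scan of unknown DPI.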

Contrast. Faded blueprints, yellowed paper, or low-contrast photocopies reduce the distinction between text and background. The classic failure mode is a light pencil annotation on a blue background — nearly invisible in a scan.

Skew and rotation. Drawings fed through a sheet-fed scanner at a slight angle produce rotated text. A skew of 2-3 degrees is common and usually manageable. Beyond 5 degrees, tag text starts getting misread.
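
Skew can be estimated from any line that should be horizontal, such as the drawing border. This is a minimal sketch: `skew_degrees` and `skew_status` are hypothetical helpers, and the two border points are assumed to have been picked by hand or by a line detector. The cutoffs restate the rough 3 and 5 degree figures above, and the middle bucket between them is an assumption.

```python
import math


def skew_degrees(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle of the line through two points, in degrees from horizontal."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def skew_status(angle_deg: float) -> str:
    """Bucket a skew angle; the sign (direction of rotation) is ignored."""
    a = abs(angle_deg)
    if a <= 3.0:
        return "usually manageable"
    if a <= 5.0:
        return "deskew recommended"  # assumed middle bucket
    return "deskew before extraction"


# Assumed example: two points on the bottom border, in pixel coordinates.
angle = skew_degrees(100, 2000, 6100, 2210)
print(f"{angle:.1f} degrees: {skew_status(angle)}")
```

A long baseline gives a steadier estimate, because a one-pixel error in picking a point matters less across the full width of the sheet.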

Compression artifacts. Some document management systems (DMS) re-compress PDFs for storage. Each lossy compression cycle degrades image quality further. A drawing that has been scanned, uploaded to a DMS, downloaded, emailed, and re-uploaded may have been compressed three or four times.

Annotations and markups. Red-line markups, revision clouds, and sticky notes layered on top of instrument bubbles can obscure tag numbers. Handwritten corrections next to printed tags create ambiguity about which value is current.

What actually helps

If you have control over how drawings are prepared before extraction:

Re-export from CAD if possible. If the original DWG/DGN files still exist, a fresh vector PDF export will almost always produce better results than a scan. Even if the drawings are old, the vector data is still clean.

Scan at 300 DPI minimum. If you must scan, 300 DPI grayscale or black-and-white produces the best balance of quality and file size. Color scans are larger but don't improve text readability.

Use black-and-white mode for line drawings. Color scanning picks up paper yellowing, coffee stains, and background noise. B&W thresholding cleans all of that out and produces sharper text edges.
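
B&W thresholding means picking a gray level and sending every pixel to pure black or pure white. The article does not say how that level is chosen. The sketch below uses Otsu's method, one common choice, on a flat list of 0-255 grayscale values. It is pure Python for illustration only, and the pixel values are made-up toy data; a real scan would go through an imaging library.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level (0-255) that best separates dark from light pixels.

    Otsu's method: try every threshold and keep the one that maximizes
    the between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels at or below the candidate threshold
    sum_dark = 0  # sum of their gray levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, so nothing to separate
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map pixels at or below the threshold to black (0), the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Toy data: dark ink (~40), yellowed paper (~200), and a faint stain (170).
scan = [40] * 30 + [45] * 20 + [200] * 120 + [205] * 60 + [170] * 10
t = otsu_threshold(scan)
clean = binarize(scan, t)
print(t, clean.count(0), clean.count(255))
```

In this toy data the stain and the yellowed paper both end up white, which is the cleanup described above. The same step can also erase faint pencil marks, so check low-contrast annotations before committing to B&W.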

Flatten markups before scanning. If the drawing has red-line revisions, either accept and flatten them in the source file, or scan a clean copy without the markup overlay. Mixed layers confuse extraction.

Don't re-compress. Save scans as PDF with no additional JPEG compression. If your scanner software has a "quality" slider, set it to maximum.

Mixed drawing sets

Real projects rarely have uniform drawing quality. A typical brownfield set might include:

30 pages of clean CAD exports from a recent turnaround
15 pages scanned from the original 1990s construction package
5 pages that are photos of laminated control room copies

Each page type will produce different extraction quality. The key is knowing which pages need more attention during review — and that comes down to confidence scoring.

Pages with clear vector text produce high-confidence extractions. Pages with degraded scans produce lower confidence scores, which flag them for closer review. This is expected behavior, not a failure — it's the extraction telling you where to focus your time.
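
A review queue built on confidence scores can be as simple as two cutoffs and a sort. The sketch below is illustrative only: the `Extraction` record, the 0.0 to 1.0 score scale, and the 0.90 and 0.70 cutoffs are assumptions, not the output format or thresholds of any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    tag: str
    page: int
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def review_bucket(confidence: float, high: float = 0.90, medium: float = 0.70) -> str:
    """Assign a review bucket; both cutoffs are hypothetical defaults."""
    if confidence >= high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "manual review"


def review_queue(items: list[Extraction]) -> list[Extraction]:
    """Everything below the high-confidence cutoff, least confident first."""
    flagged = [e for e in items if review_bucket(e.confidence) != "high"]
    return sorted(flagged, key=lambda e: e.confidence)


results = [
    Extraction("FIT-101", page=3, confidence=0.98),  # clean CAD export
    Extraction("PT-204", page=31, confidence=0.81),  # decent scan
    Extraction("LT-3O7", page=47, confidence=0.42),  # photo; letter O read in place of zero
]
for e in review_queue(results):
    print(e.page, e.tag, review_bucket(e.confidence))
```

Sorting least confident first puts the degraded pages at the top of the queue, where review time matters most.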

The 80/20 of brownfield extraction

On a typical mixed-quality drawing set:

60-70% of instruments extract cleanly with high confidence from the better-quality pages
20-25% extract with medium confidence — correct tag number, may need signal class verification
5-15% need manual review — degraded source, ambiguous characters, or unusual tag formats

The value isn't eliminating review entirely. It's reducing a multi-day manual effort to a focused review session on the flagged items.
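
Applied to a concrete instrument count, the ranges above translate into a rough review workload. The 500-instrument total below is an assumed example, and the percentages are the typical ranges quoted above, not guarantees.

```python
def share_range(total: int, low_pct: float, high_pct: float) -> tuple[int, int]:
    """Convert a percentage range into a range of instrument counts."""
    return round(total * low_pct / 100), round(total * high_pct / 100)


TOTAL_INSTRUMENTS = 500  # assumed project size

buckets = {
    "high confidence": (60, 70),
    "medium confidence": (20, 25),
    "manual review": (5, 15),
}

for name, (low, high) in buckets.items():
    n_low, n_high = share_range(TOTAL_INSTRUMENTS, low, high)
    print(f"{name}: {n_low} to {n_high} instruments")
```

For this assumed project, the manual-review bucket comes to somewhere between 25 and 75 instruments rather than all 500.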

Format recommendations by source

Drawing source	Recommended preparation	Expected quality
Current CAD system (AutoCAD, MicroStation)	Export as vector PDF, no rasterization	Excellent
SmartPlant P&ID / AVEVA	Native PDF export	Excellent
Bluebeam project	Export or print to PDF	Excellent
Recent scan (< 5 years, 300 DPI)	Use as-is	Good
Old scan (> 10 years, unknown DPI)	Re-scan at 300 DPI if originals available	Fair to Good
Microfilm or aperture card scan	Re-scan at highest available DPI, B&W mode	Fair
Photo of printed drawing	Crop, straighten, increase contrast	Poor to Fair

Bottom line

Vector PDFs from CAD produce the best results. Clean 300 DPI scans work well. Everything below that still works but needs more review time.

If you're starting a brownfield project and have a choice about how drawings get prepared — push for CAD re-exports. The time spent getting clean source files pays back directly in less review and fewer errors downstream.
