Scanned P&IDs vs CAD Exports: What to Expect
Most P&ID extraction happens on brownfield projects. The drawings already exist — the question is what condition they're in.
A clean CAD export and a 20-year-old scan of a hand-drafted drawing are both "PDFs," but they behave completely differently when you're trying to pull instrument data out of them. Here's what actually matters.
Vector PDFs vs raster scans
The single biggest factor in extraction quality is whether your PDF contains vector graphics or raster images.
| | Vector (CAD export) | Raster (scanned) |
|---|---|---|
| Source | Exported directly from AutoCAD, MicroStation, SmartPlant | Flatbed scanner, photo, or print-to-PDF of a photocopy |
| Text | Selectable, searchable | Embedded in image pixels |
| Zoom behavior | Stays crisp at any zoom | Gets blurry when zoomed |
| Tag readability | Characters are exact | Subject to scan artifacts, smudging, fading |
| Typical file size | 200KB - 2MB per page | 5MB - 50MB per page |
| Extraction accuracy | High — text is unambiguous | Varies — depends on scan quality |
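Quick test: open your PDF and try to select text with your cursor. If you can highlight individual tag numbers, it's vector. If nothing highlights, or the only thing you can select is the whole page as an image, it's a scan. One caveat: a scan that has been run through OCR carries an invisible text layer, so text may highlight even though the linework is still raster. If the drawing blurs when you zoom in, it's a scan.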
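The same check can be scripted across a whole drawing set. The sketch below is one heuristic, not a definitive test. It assumes the third-party pypdf package (a recent release where `PageObject.images` is available), and the 20-character cutoff and the `classify_page`/`classify_pdf` names are illustrative choices. A page with text and no images is treated as vector. A page with text plus images is flagged separately, because OCR'd scans look like that.

```python
def classify_page(text: str, image_count: int, min_chars: int = 20) -> str:
    """Guess a page's type from its extractable text and embedded image count.

    Heuristic only: min_chars is an arbitrary cutoff for "has real text".
    """
    has_text = len(text.strip()) >= min_chars
    if has_text and image_count == 0:
        return "vector"
    if has_text:
        # Text plus images: an OCR'd scan, or a CAD export with a placed logo or photo.
        return "ocr-or-mixed"
    if image_count > 0:
        return "raster"
    # No text and no images: possibly vector linework whose text was exported as geometry.
    return "unknown"


def classify_pdf(path: str) -> list[str]:
    """Classify every page of a PDF (requires the third-party pypdf package)."""
    from pypdf import PdfReader  # imported here so classify_page works without pypdf

    reader = PdfReader(path)
    return [
        classify_page(page.extract_text() or "", len(page.images))
        for page in reader.pages
    ]
```

Pages labelled `ocr-or-mixed` or `unknown` are the ones worth opening by hand before extraction.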
What degrades scanned drawings
Not all scans are equal. These are the specific issues that make instrument tags harder to read:
Resolution. Anything below 200 DPI starts losing fine detail in tag text. ISA bubbles are small — at 150 DPI, the difference between "FIT" and "FLT" or between "101" and "1O1" gets ambiguous. 300 DPI is the practical standard for readable scans.
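To put numbers on the resolution point: a character's height in pixels is its printed height in inches times the scan DPI. The 2.5 mm text height below is an illustrative value (drawings vary, and half-size prints halve it). The `target_px` threshold in `min_dpi` is a placeholder you would tune, not a published requirement.

```python
import math

MM_PER_INCH = 25.4


def char_height_px(char_height_mm: float, dpi: float) -> float:
    """Height in pixels of a character of the given printed height, scanned at dpi."""
    return char_height_mm / MM_PER_INCH * dpi


def min_dpi(char_height_mm: float, target_px: float) -> int:
    """Lowest whole-number DPI that renders the character at least target_px tall."""
    return math.ceil(target_px * MM_PER_INCH / char_height_mm)


# Illustrative inputs: 2.5 mm tag text, and the same text on a half-size print.
for height_mm in (2.5, 1.25):
    for dpi in (150, 200, 300):
        print(f"{height_mm} mm at {dpi} DPI: {char_height_px(height_mm, dpi):.1f} px")
```

With these inputs, 2.5 mm text comes out at roughly 15 px tall at 150 DPI and 30 px at 300 DPI, and the half-size print gets half of each. Under the assumption that about 20 px is needed, `min_dpi(2.5, 20)` returns 204, in the same neighborhood as the 200 DPI floor above.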
Contrast. Faded blueprints, yellowed paper, or low-contrast photocopies reduce the distinction between text and background. The classic failure mode is a light pencil annotation on a blue background — nearly invisible in a scan.
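One rough way to quantify contrast is the gap between the gray level of the ink and that of the paper. The function below is a hypothetical metric, not a standard one. It assumes 8-bit grayscale values already read from the image with whatever imaging library you use. It also assumes that paper covers most of the page, so the median is background, and that ink covers more than 1% of the pixels.

```python
def contrast_estimate(gray: list[int], ink_q: float = 0.01) -> float:
    """Rough ink-vs-paper contrast for 8-bit grayscale pixels (0 = black, 255 = white).

    Assumes paper dominates the page, so the median approximates the background,
    and that ink covers more than ink_q of the pixels, so that quantile lands on ink.
    """
    if not gray:
        raise ValueError("no pixels")
    vals = sorted(gray)
    last = len(vals) - 1
    ink = vals[int(ink_q * last)]      # darkest pixels stand in for ink
    paper = vals[int(0.5 * last)]      # median stands in for paper
    return (paper - ink) / 255
```

A crisp scan with ink around 40 and paper around 230 scores about 0.75. Faded ink at 180 on paper at 210 scores about 0.12, which is the light-pencil-on-blueprint case expressed as a number.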
Skew and rotation. Drawings fed through a sheet scanner at a slight angle produce rotated text. A 2-3 degree skew is common and usually manageable. Beyond 5 degrees, tag text starts getting misread.
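Skew can be measured from any line that should be horizontal, such as the drawing border. Its effect on a tag is easy to estimate: across a tag `w` pixels wide, the baseline drifts by `w * tan(θ)`. The function names and the 200 px tag width in the note below are illustrative assumptions.

```python
import math


def skew_degrees(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle of a line that should be horizontal (e.g. a border), from two points on it.

    In image coordinates y grows downward, so a positive result means the
    right-hand end sits lower than the left.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def baseline_drift_px(tag_width_px: float, skew_deg: float) -> float:
    """Vertical shift between the two ends of a tag's baseline at a given skew."""
    return tag_width_px * math.tan(math.radians(skew_deg))
```

For a tag 200 px wide (about 17 mm at 300 DPI), `baseline_drift_px(200, 2)` is about 7 px and `baseline_drift_px(200, 5)` is about 17.5 px. That is a large share of a character that is only around 30 px tall at that resolution for 2.5 mm text.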
Compression artifacts. Some document management systems re-compress PDFs for storage, and every lossy (JPEG-style) recompression degrades image quality a little more. If a drawing has been scanned, uploaded to a DMS, downloaded, emailed, and re-uploaded, it may have been compressed three or four times.
Annotations and markups. Red-line markups, cloud revisions, and sticky notes layered on top of instrument bubbles can obscure tag numbers. Hand-written corrections next to printed tags create ambiguity about which value is current.
What actually helps
If you have control over how drawings are prepared before extraction:
Re-export from CAD if possible. If the original DWG/DGN files still exist, a fresh PDF export will almost always produce better results than a scan, provided the CAD file reflects the current revision of the drawing. Even if the drawings are old, the vector data is still clean.
Scan at 300 DPI minimum. If you must scan, 300 DPI grayscale or black-and-white produces the best balance of quality and file size. Color scans are larger but don't improve text readability.
Use black-and-white mode for line drawings. Color scanning picks up paper yellowing, coffee stains, and background noise. B&W thresholding removes most of that and produces sharper text edges.
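B&W mode comes down to picking a gray level and mapping everything darker to black and everything lighter to white. Scanner drivers use their own methods, often adaptive ones. The sketch below uses Otsu's method, a standard global threshold, as a stand-in, and operates on plain lists of 8-bit gray values. A single global threshold can struggle with large stains or uneven backgrounds, where a local (adaptive) threshold usually does better.

```python
def histogram(pixels: list[int]) -> list[int]:
    """Count 8-bit grayscale values into 256 bins."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist: list[int]) -> int:
    """Otsu's method: pick the level that maximizes between-class variance.

    Pixels <= the returned level form the dark (ink) class; the rest are paper.
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark, sum_dark = 0, 0
    best_level, best_score = 0, -1.0
    for level, count in enumerate(hist):
        weight_dark += count
        sum_dark += level * count
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def binarize(pixels: list[int], level: int) -> list[int]:
    """Map each pixel to black (0) or white (255) around the threshold."""
    return [0 if p <= level else 255 for p in pixels]
```

The threshold is chosen from the page's own histogram, so a yellowed sheet whose paper sits at 200 instead of 240 still separates cleanly from dark ink.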
Flatten markups before scanning. If the drawing has red-line revisions, either accept/flatten them in the CAD source or scan without the markup overlay. Mixed layers confuse extraction.
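Markups added in PDF tools such as Bluebeam usually live as annotation objects layered over the page rather than in the page content itself. You can therefore find pages that still need flattening by listing their annotations. The sketch assumes the third-party pypdf package, and `MARKUP_SUBTYPES` is a hand-picked subset of the markup annotation types in the PDF specification. Once a page is flattened, or when red-lines were drawn on paper and scanned, the markups are part of the page image and won't show up here.

```python
from collections import Counter

# Drawing-review markup subtypes from the PDF specification (a subset; Link,
# Widget and Popup annotations are deliberately left out).
MARKUP_SUBTYPES = {
    "/Text", "/FreeText", "/Line", "/Square", "/Circle", "/Polygon",
    "/PolyLine", "/Highlight", "/Underline", "/StrikeOut", "/Stamp",
    "/Caret", "/Ink",
}


def markup_counts(subtypes: list[str]) -> Counter:
    """Count the markup annotations among a page's annotation subtypes."""
    return Counter(s for s in subtypes if s in MARKUP_SUBTYPES)


def annotation_subtypes(path: str) -> dict[int, list[str]]:
    """Map 1-based page numbers to the /Subtype of each annotation (requires pypdf)."""
    from pypdf import PdfReader  # third-party; imported here so markup_counts works without it

    result = {}
    for page_no, page in enumerate(PdfReader(path).pages, start=1):
        annots = page.get("/Annots")
        if annots is None:
            continue
        result[page_no] = [
            str(ref.get_object().get("/Subtype")) for ref in annots.get_object()
        ]
    return result
```

Any page where `markup_counts` returns a non-empty result still has live markups on top of the drawing.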
Don't re-compress. Save scans as PDF with no additional JPEG compression. If your scanner software has a "quality" slider, set it to maximum.
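To check whether a PDF's images are stored with lossy compression, look at each image's `/Filter` entry. `/DCTDecode` is JPEG; `/JPXDecode` (JPEG 2000) and `/JBIG2Decode` can be either lossy or lossless; filters such as `/FlateDecode` or `/CCITTFaxDecode` are lossless. Lossy JBIG2 deserves particular attention on P&IDs, because its symbol matching has been known to substitute similar-looking characters, digits included, in scanned documents. The sketch assumes the third-party pypdf package and only inspects images placed directly on each page. The filter also describes only how the image is stored now: a losslessly saved file can still carry artifacts from earlier JPEG passes.

```python
LOSSY_FILTERS = {"/DCTDecode"}                         # JPEG
MAYBE_LOSSY_FILTERS = {"/JPXDecode", "/JBIG2Decode"}   # JPEG 2000 and JBIG2 support both modes


def compression_risk(filters: list[str]) -> str:
    """Classify an image's PDF filter chain by whether it can discard detail."""
    names = set(filters)
    if names & LOSSY_FILTERS:
        return "lossy"
    if names & MAYBE_LOSSY_FILTERS:
        return "possibly lossy"
    return "lossless"  # Flate, LZW, RunLength, CCITTFax, or no filter at all


def page_image_filters(path: str) -> list[tuple[int, str, list[str]]]:
    """List (page number, image name, filters) for images placed directly on each page."""
    from pypdf import PdfReader  # third-party; imported here so compression_risk works without it

    found = []
    for page_no, page in enumerate(PdfReader(path).pages, start=1):
        resources = page.get("/Resources")
        if resources is None:
            continue
        xobjects = resources.get_object().get("/XObject")
        if xobjects is None:
            continue
        for name, ref in xobjects.get_object().items():
            obj = ref.get_object()
            if obj.get("/Subtype") != "/Image":
                continue  # Form XObjects are skipped, so images nested inside them are missed
            filt = obj.get("/Filter")
            if filt is None:
                filters = []
            else:
                filt = filt.get_object()
                filters = [str(filt)] if isinstance(filt, str) else [str(f) for f in filt]
            found.append((page_no, str(name), filters))
    return found
```

Running `compression_risk` over the filters of each tuple from `page_image_filters` gives a quick list of pages worth re-exporting or re-scanning.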
Mixed drawing sets
Real projects rarely have uniform drawing quality. A typical brownfield set might include:
- 30 pages of clean CAD exports from a recent turnaround
- 15 pages scanned from the original 1990s construction package
- 5 pages that are photos of laminated control room copies
Each page type will produce different extraction quality. The key is knowing which pages need more attention during review — and that comes down to confidence scoring.
Pages with clear vector text produce high-confidence extractions. Pages with degraded scans produce lower confidence scores, which flag them for closer review. This is expected behavior, not a failure — it's the extraction telling you where to focus your time.
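The routing described above can be sketched as three confidence buckets, plus a per-page count of weak items so review starts with the worst pages. The `Extraction` record, the 0-to-1 confidence scale, and the 0.9 and 0.6 thresholds are hypothetical; they are not any particular tool's actual scoring or output format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Extraction:
    page: int
    tag: str
    confidence: float  # 0.0 to 1.0; hypothetical field name and scale


def route(items: list[Extraction], high: float = 0.9, low: float = 0.6) -> dict[str, list[Extraction]]:
    """Split extractions into review buckets by confidence (thresholds are placeholders)."""
    buckets: dict[str, list[Extraction]] = {"high": [], "medium": [], "manual": []}
    for item in items:
        if item.confidence >= high:
            buckets["high"].append(item)
        elif item.confidence >= low:
            buckets["medium"].append(item)
        else:
            buckets["manual"].append(item)
    return buckets


def pages_needing_attention(items: list[Extraction], low: float = 0.6) -> list[tuple[int, int]]:
    """(page, low-confidence count) pairs, worst page first."""
    counts = Counter(item.page for item in items if item.confidence < low)
    return counts.most_common()
```

Sorting pages by their count of low-confidence items puts the degraded scans at the top of the review queue without anyone having to grade page quality by hand.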
The 80/20 of brownfield extraction
On a typical mixed-quality drawing set:
- 60-70% of instruments extract cleanly with high confidence from the better-quality pages
- 20-25% extract with medium confidence — correct tag number, may need signal class verification
- 5-15% need manual review — degraded source, ambiguous characters, or unusual tag formats
The value isn't eliminating review entirely. It's reducing a multi-day manual effort to a focused review session on the flagged items.
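The split above translates directly into a workload estimate. This sketch takes the midpoints of the quoted ranges (65% high, 22.5% medium, leaving 12.5% for manual review) and applies them to a hypothetical 400-instrument drawing set; swap in your own counts.

```python
def review_split(n_instruments: int, high_frac: float, medium_frac: float) -> dict[str, int]:
    """Split an instrument count into review buckets from expected fractions.

    Whatever is not high or medium confidence is assumed to need manual review.
    """
    if not 0 <= high_frac + medium_frac <= 1:
        raise ValueError("fractions must sum to at most 1")
    high = round(n_instruments * high_frac)
    medium = round(n_instruments * medium_frac)
    return {"high": high, "medium": medium, "manual": n_instruments - high - medium}


# Midpoints of the ranges above, applied to a hypothetical 400-instrument set.
print(review_split(400, 0.65, 0.225))
```

That leaves about 50 instruments for manual review and 90 for a quick signal-class check, out of 400.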
Format recommendations by source
| Drawing source | Recommended preparation | Expected quality |
|---|---|---|
| Current CAD system (AutoCAD, MicroStation) | Export as vector PDF, no rasterization | Excellent |
| SmartPlant P&ID / AVEVA | Native PDF export | Excellent |
| Bluebeam project | Export or print to PDF | Excellent |
| Recent scan (< 5 years, 300 DPI) | Use as-is | Good |
| Old scan (> 10 years, unknown DPI) | Re-scan at 300 DPI if originals available | Fair to Good |
| Microfilm or aperture card scan | Re-scan at highest available DPI, B&W mode | Fair |
| Photo of printed drawing | Crop, straighten, increase contrast | Poor to Fair |
Bottom line
Vector PDFs from CAD produce the best results. Clean 300 DPI scans work well. Everything below that still works but needs more review time.
If you're starting a brownfield project and have a choice about how drawings get prepared — push for CAD re-exports. The time spent getting clean source files pays back directly in less review and fewer errors downstream.
Ready to automate your I/O list extraction?
Upload a P&ID and get a structured I/O list in minutes. 5 free pages included.