Convert CVs Between Any Format
Accept candidate CVs in any format — PDF, Word, plain text — and produce consistent branded profiles. AI extraction normalises everything.
The format fragmentation problem
Recruiters don’t get to choose how candidates submit their CVs. One candidate sends a polished PDF. Another sends a Word document with broken formatting. A third pastes plain text into an email. A fourth uploads a LinkedIn profile export. Some CVs arrive as scanned images from job boards that strip formatting entirely.
This format fragmentation creates a hidden cost in every recruitment workflow. Before you can present a candidate to a client, someone on your team needs to wrestle the content out of whatever format it arrived in and rebuild it in your agency’s standard template. On a busy desk, this extraction-and-reformatting cycle can add up to hours per day.
Vitae solves this by decoupling content from format. The AI extraction layer reads any input format and produces structured data. That structured data then flows into your template for rendering. Whatever format a CV arrives in, the output layout and branding are the same.
Supported input formats
Vitae accepts candidate CVs in all common recruitment formats:
- PDF documents — the most common submission format. Vitae extracts text from both text-layer PDFs and image-based (scanned) PDFs using OCR when needed.
- Word documents (.docx) — preserves structural elements like headings and lists during extraction, then remaps them to your template’s structure.
- Plain text — email body text, job board exports, or copy-pasted content. The AI identifies sections and structure from formatting cues in the text.
- Rich text (.rtf) — legacy format still common in some industries. Full text and structure extraction.
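As a rough illustration of how the formats above might be routed to different parsing strategies, here is a minimal Python sketch. The `PARSERS` table, the strategy names, and `choose_parser` are hypothetical and are not part of Vitae's actual interface; the sketch only assumes that the file extension is a reasonable first signal of format.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a parsing strategy.
# The strategy names are illustrative labels, not real Vitae identifiers.
PARSERS = {
    ".pdf": "pdf_text_layer_or_ocr",
    ".docx": "docx_structural",
    ".txt": "plain_text",
    ".rtf": "rtf_text",
}


def choose_parser(filename: str) -> str:
    """Pick a parsing strategy from the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported CV format: {suffix or 'no extension'}")
```

For example, `choose_parser("Jane_Doe.PDF")` selects the PDF strategy, while an unlisted extension raises a `ValueError` so the file can be flagged rather than silently skipped.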
How extraction normalises diverse formats
The key insight is that CV formatting is really two separate problems: extraction (getting structured data out) and rendering (putting it into your template). Most tools try to solve both at once, which is why they struggle with unusual input formats.
Vitae keeps them apart, working through four stages:
- Input parsing — the document is read and text is extracted. For PDFs, this means text layer extraction or OCR. For Word documents, this means structural parsing. For plain text, this means direct intake.
- AI extraction — the extracted text is analysed by the AI to identify and categorise content: personal details, work experience, education, skills, certifications, languages, and other fields. This step is format-agnostic — the AI works with text content, not document structure.
- Data normalisation — extracted fields are normalised into a consistent schema. Date formats are standardised. Company names are cleaned. Skills are categorised. The result is a structured data record that is identical regardless of input format.
- Template rendering — the normalised data flows into your chosen template. LaTeX renders the final PDF with consistent typography, layout, and branding.
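The four stages above can be sketched as a chain of small functions. This is a toy model under stated assumptions, not Vitae's implementation: simple "Label: value" matching stands in for the AI extraction step, a one-line string stands in for LaTeX rendering, and every name here (`Profile`, `parse_input`, `extract_fields`, `normalise`, `render`, `convert`) is hypothetical.

```python
import re
from dataclasses import dataclass, field

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


@dataclass
class Profile:
    """Hypothetical normalised schema; a real one would have many more fields."""
    name: str = ""
    start_date: str = ""  # normalised to YYYY-MM where recognised
    skills: list[str] = field(default_factory=list)


def parse_input(raw: str) -> list[str]:
    """Stage 1: reduce the document to non-empty, trimmed text lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def extract_fields(lines: list[str]) -> dict[str, str]:
    """Stage 2: stand-in for AI extraction, using 'Label: value' matching."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def normalise_date(text: str) -> str:
    """Map '03/2021' or 'March 2021' to '2021-03'; leave other text unchanged."""
    numeric = re.fullmatch(r"(\d{1,2})/(\d{4})", text)
    if numeric and 1 <= int(numeric.group(1)) <= 12:
        return f"{numeric.group(2)}-{int(numeric.group(1)):02d}"
    named = re.fullmatch(r"([A-Za-z]+)\.?\s+(\d{4})", text)
    if named and named.group(1)[:3].lower() in MONTHS:
        return f"{named.group(2)}-{MONTHS[named.group(1)[:3].lower()]:02d}"
    return text


def normalise(fields: dict[str, str]) -> Profile:
    """Stage 3: coerce extracted fields into one consistent schema."""
    skills = [s.strip().lower() for s in fields.get("skills", "").split(",") if s.strip()]
    return Profile(
        name=fields.get("name", "").title(),
        start_date=normalise_date(fields.get("started", "")),
        skills=skills,
    )


def render(profile: Profile) -> str:
    """Stage 4: stand-in for template rendering."""
    return f"{profile.name} | since {profile.start_date} | {', '.join(profile.skills)}"


def convert(raw: str) -> str:
    return render(normalise(extract_fields(parse_input(raw))))
```

Two differently formatted inputs, such as `"Name: jane doe\nStarted: March 2021\nSkills: python, sql"` and `"  NAME: Jane Doe\n\nstarted: 03/2021\nskills: Python ,SQL\n"`, pass through the same stages and come out as the same rendered line. Because rendering only ever sees the normalised record, adding a new input format in this design would only touch the first stage.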
Why format matters less than you think
Recruiters often worry about “losing information” when converting between formats. In practice, the opposite is often true: AI extraction can capture more useful information from a messy CV than manual reformatting does, because it processes every line systematically rather than skimming and copy-pasting.
The quality of the output depends on the quality of the content, not the quality of the input formatting. A well-written CV in plain text produces a better Vitae profile than a poorly written CV in a beautifully designed PDF template.
This is liberating for recruiters: stop spending time asking candidates to reformat their CVs or to submit in a specific format. Accept whatever they send and let Vitae handle the normalisation.
Batch conversion for pipeline efficiency
When you’re building a shortlist, you might need to format five or ten CVs at once — each in a different input format. Doing this manually means context-switching between different source documents, copy-pasting into templates, and fixing formatting issues one by one.
With Vitae, the workflow is uniform regardless of input format:
- Upload all candidate CVs — mix of PDF, Word, and text
- Review the extracted data for each candidate
- Apply your template to all candidates at once
- Download a shortlist of consistently formatted profiles
The result is a shortlist where every candidate profile has identical formatting, structure, and branding — regardless of what format the original CV was in. This is the level of consistency that impresses clients and positions your agency as a professional operation.
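The batch workflow above can be pictured as one loop that applies the same conversion to every document and sets aside anything that needs a human look. The sketch below is an assumption about how such a loop could be organised, not a description of Vitae's internals; `convert_batch` and `toy_convert` are hypothetical names, and `toy_convert` merely takes the first line of text as the candidate's name.

```python
from typing import Callable


def convert_batch(
    documents: dict[str, str],
    convert: Callable[[str], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Apply one conversion to every document; collect failures for review."""
    profiles: dict[str, str] = {}
    needs_review: dict[str, str] = {}
    for filename, text in documents.items():
        try:
            profiles[filename] = convert(text)
        except ValueError as exc:
            # One unreadable CV should not block the rest of the shortlist.
            needs_review[filename] = str(exc)
    return profiles, needs_review


def toy_convert(text: str) -> str:
    """Hypothetical stand-in for the full pipeline: first line as the name."""
    if not text.strip():
        raise ValueError("no extractable text")
    return text.strip().splitlines()[0].title()


shortlist, needs_review = convert_batch(
    {
        "a.pdf": "jane doe\nEngineer",
        "b.docx": "  JOHN SMITH\nAnalyst",
        "c.txt": "   ",
    },
    toy_convert,
)
```

Here `shortlist` holds the two profiles that converted cleanly and `needs_review` records that `c.txt` had no extractable text, which mirrors the review step in the list above.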
Output quality: LaTeX vs. web rendering
After extraction and normalisation, the final step is rendering. Vitae uses LaTeX, the typesetting system widely used in academic publishing, to produce the output PDF. This means:
- Precise typography — proper kerning, ligatures, and hyphenation. Text flows naturally across the page.
- Consistent page breaks — LaTeX’s algorithm optimises page breaks to avoid orphan lines and awkward section splits.
- Print-ready output — the PDF is production-quality. Whether viewed on screen or printed, the document looks professional.
- Deterministic rendering — the same data and template always produce the exact same output. No rendering inconsistencies between browsers or operating systems.
Most competing tools use web-based rendering (HTML to PDF), which produces adequate results but lacks the typographic precision that LaTeX provides. For agencies that compete on presentation quality, this difference is visible and meaningful.
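To make the idea of deterministic rendering concrete, here is a minimal sketch of filling a LaTeX template from normalised data. It assumes a hypothetical `<<field>>` placeholder syntax and a toy template; Vitae's real templates and placeholder format are not described in this article. The sketch escapes LaTeX special characters so that candidate text such as "R&D" cannot break compilation, and it shows that the same data and template yield the same LaTeX source.

```python
import hashlib
import re

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Toy template with hypothetical <<field>> placeholders.
TEMPLATE = r"""\documentclass{article}
\begin{document}
\section*{<<name>>}
<<summary>>
\end{document}
"""


def escape_latex(text: str) -> str:
    """Escape character by character so nothing is escaped twice."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_latex(data: dict[str, str], template: str = TEMPLATE) -> str:
    """Fill every <<field>> placeholder in a single pass over the template."""
    return re.sub(r"<<(\w+)>>", lambda m: escape_latex(data[m.group(1)]), template)


data = {"name": "Jane Doe", "summary": "R&D lead, 100% remote"}
first = render_latex(data)
second = render_latex(data)

# Identical input gives identical LaTeX source, so the digests match.
digest_first = hashlib.sha256(first.encode("utf-8")).hexdigest()
digest_second = hashlib.sha256(second.encode("utf-8")).hexdigest()
```

The digests cover the generated LaTeX source only. Compiling that source to PDF is a separate step, and PDF files commonly embed metadata such as creation timestamps, so byte-for-byte equality of compiled files is a different question from the visual consistency described above.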