
Enabling large-scale documentation transformation with Doc to XML
A long-term public sector customer needed to scale the migration of thousands of documents into a structured XML-based system. By using one of Etteplan’s rAIse tools, Doc to XML, the customer reduced conversion effort by approximately 80%, transforming a slow, manual process into a scalable workflow and enabling progress toward a long-term digitalization roadmap.
The project in a nutshell
Challenge
Migrating thousands of documents from Word and PDF into structured XML was slow and difficult to scale due to manual conversion work. The fragmented document landscape also made content hard to search, maintain, and update efficiently.
Solutions and Services
Doc to XML enabled automated document conversion by transforming Word and PDF files into structured XML. This provided a consistent foundation for managing content and allowed large volumes of documentation to be processed more efficiently.
Added value
Using Doc to XML, the customer reduced document conversion effort by approximately 80%, turning a manual bottleneck into a scalable process. Tasks that previously took weeks were completed in days, enabling large-scale migration work to progress within planned timelines.
Thousands of documents, but no scalable way to manage them
The customer was responsible for maintaining extensive infrastructure documentation, much of it stored in Word and PDF formats across different systems. While the content itself was valuable, the format made it difficult to search, maintain, and update efficiently. The goal was to centralize documentation and improve usability, but the sheer volume put this out of reach of manual methods. Transforming documents into XML required copying content, rebuilding structure, and ensuring consistency element by element. Even small document sets could take weeks to process, so a large-scale migration could not be completed within a realistic timeframe.
Replacing manual reconstruction with structured automation
To support the migration work, Doc to XML was used to automate document conversion.
The tool uses AI-assisted processing to convert Word and PDF documents into structured XML, automatically identifying and modularizing content into topics such as tasks, concepts, and references. Instead of manually copying and rebuilding content element by element, the conversion creates a structured foundation that can be refined and validated.
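To make the idea of a modular topic more concrete, the sketch below builds one small task topic with Python's standard library. It is an illustration only: the case study does not describe the XML schema or the tool's internals, so the element names (task, title, taskbody, steps, step, cmd), the function build_task_topic, and the sample content are all assumptions chosen for readability.

```python
import xml.etree.ElementTree as ET


def build_task_topic(topic_id, title, steps):
    """Build one task-type topic as an XML element tree.

    The element names are hypothetical; the real target schema
    is not specified in the case study.
    """
    task = ET.Element("task", {"id": topic_id})
    ET.SubElement(task, "title").text = title
    body = ET.SubElement(task, "taskbody")
    steps_el = ET.SubElement(body, "steps")
    for text in steps:
        # Each step carries its instruction in a nested command element.
        step = ET.SubElement(steps_el, "step")
        ET.SubElement(step, "cmd").text = text
    return task


# Invented sample content, standing in for text extracted from a source document.
topic = build_task_topic(
    "inspect-valve",
    "Inspect the valve",
    ["Close the valve.", "Check the pressure gauge."],
)
xml_text = ET.tostring(topic, encoding="unicode")
print(xml_text)
```

Once each procedure is its own element with explicit steps, it can be validated, searched, and reused on its own, which is the kind of structured foundation described above.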
Practical considerations for efficient conversion
The project also highlighted practical considerations when working with automated conversion. The quality of the source material affects the result: documents that are not properly structured may need some cleanup to achieve optimal results. For example, a list that looks correct in Word may not actually be defined as a structured list, in which case it has to be corrected manually before conversion.
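A simple pre-check can help find such lists before conversion. The sketch below is a hypothetical helper, not part of Doc to XML: it assumes the paragraph text has already been extracted from the source document into plain strings, and it flags paragraphs that begin with a hand-typed bullet or number. The pattern MANUAL_MARKER and the function find_pseudo_list_items are illustrative names, and the pattern covers only a few common markers.

```python
import re

# Assumed pattern: a hand-typed bullet ("-", "*", "•"), a number ("1." or "2)"),
# or a single letter ("a." or "b)") followed by whitespace and some text.
MANUAL_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)]|[a-zA-Z][.)])\s+\S")


def find_pseudo_list_items(paragraphs):
    """Return the indices of paragraphs that start with a hand-typed list marker."""
    return [i for i, text in enumerate(paragraphs) if MANUAL_MARKER.match(text)]


# Invented sample paragraphs for illustration.
sample = [
    "Inspection procedure",
    "1. Close the valve.",
    "2) Check the pressure gauge.",
    "- Record the reading.",
    "The valve must remain closed during inspection.",
]
flagged = find_pseudo_list_items(sample)
print(flagged)
```

Flagged paragraphs are candidates for review only. A real check would also need to confirm that the paragraph does not already use the word processor's list formatting.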
When conversion stops being the bottleneck
The impact was immediate and significant. Tasks that previously took an estimated 3–4 weeks were completed in approximately three days, reducing the effort required for document conversion to around one-fifth of the original workload. More importantly, what had been a slow, manual bottleneck became a scalable process capable of handling large volumes of documentation. This also improved responsiveness, as updates that previously required long lead times could now be delivered much faster.
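The figures above can be checked with a few lines of arithmetic. The sketch assumes a five-day working week, which the case study does not state, and the function name remaining_effort_share is illustrative.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: not stated in the case study


def remaining_effort_share(weeks_before, days_after):
    """Share of the original effort that remains after automation."""
    return days_after / (weeks_before * WORKING_DAYS_PER_WEEK)


low = remaining_effort_share(4, 3)   # 3 days out of 20 working days
high = remaining_effort_share(3, 3)  # 3 days out of 15 working days
print(f"Remaining effort: {low:.0%} to {high:.0%}")
```

Under that assumption, three days is between 15% and 20% of the original 3–4 weeks, which is consistent with the stated reduction to around one-fifth, or approximately 80%.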
Making a long-term migration plan achievable
The customer’s objective is to migrate all documentation into XML format by 2030. Without automation, this would not have been feasible within the given timeline. With Doc to XML, the migration effort has become manageable at scale, allowing content to be systematically converted and published into a centralized documentation portal. As more content is structured and centralized, users can search and filter information more effectively instead of navigating individual PDF files. Moving documentation into XML also enables further use of the content across different channels and formats.
From manual effort to scalable capability
By replacing manual document conversion with a structured and automated approach, Etteplan transformed a time-consuming bottleneck into a scalable capability. What was previously difficult to plan and execute at scale is now a repeatable and efficient process that supports ongoing documentation updates and large-scale migration work.


