Document reader - HxGN SDx - Update 63 - Administration & Configuration

Administration and Configuration of HxGN SDx

Language: English
Product: HxGN SDx
Category: Administration & Configuration
SmartPlant Foundation / SDx Version: 10

The document reader looks for preprocessed content files, extracted from third-party applications, in the PreProcessedContentFiles folder. This folder is in the same location as the native files, and the preprocessed files must be in place before the content discovery task runs. The document reader first checks for a <Filename with extension>_ContentFile.xml file; if that file does not exist, it looks for a <Filename with extension>_ContentFile.txt file to use for tag extraction. If the document reader finds neither preprocessed content file, it uses SmartPlant Markup to process the file.
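
The lookup order can be sketched in Python. This is a minimal illustration, not the product's implementation: the function name find_preprocessed_content and the sample file name P-101.pdf are hypothetical, and the sketch assumes the PreProcessedContentFiles folder is a direct subfolder of the folder holding the native file. A return value of None stands for the SmartPlant Markup fallback.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Folder name taken from the text; it sits next to the native files.
PREPROCESSED_FOLDER = "PreProcessedContentFiles"


def find_preprocessed_content(native_file: Path) -> Optional[Path]:
    """Hypothetical helper: return the preprocessed content file for native_file.

    The _ContentFile.xml file is preferred over the _ContentFile.txt file.
    None means no preprocessed file exists, so SmartPlant Markup would be used.
    """
    folder = native_file.parent / PREPROCESSED_FOLDER
    for suffix in ("_ContentFile.xml", "_ContentFile.txt"):
        # "<Filename with extension>" keeps the native extension, e.g. P-101.pdf_ContentFile.xml
        candidate = folder / f"{native_file.name}{suffix}"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Build a throwaway folder layout to show the XML-over-TXT precedence.
    with tempfile.TemporaryDirectory() as root:
        native = Path(root) / "P-101.pdf"
        native.touch()
        pre = native.parent / PREPROCESSED_FOLDER
        pre.mkdir()
        print(find_preprocessed_content(native))  # None: fall back to SmartPlant Markup
        (pre / "P-101.pdf_ContentFile.txt").touch()
        (pre / "P-101.pdf_ContentFile.xml").touch()
        print(find_preprocessed_content(native).name)  # P-101.pdf_ContentFile.xml
```

Checking for the XML file first and returning on the first hit keeps the precedence rule in one place.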

The document reader extracts all the cross-referenced and linked information contained in the text into a document. When the document is processed through the workflow with the document reader interface attached, SmartPlant Markup converts the document into two files: a viewable format (.csf) file and a text content file (.txt) that contains all of the text identified in the document.

The Extract Content workflow step generates the text content file by adding the ISPFNContentFile interface definition to the file document and attaches the viewable .csf file to the native file object. The workflow process then continues, associating the Extract Data workflow step with the content file.

Tags are extracted using the tag discovery patterns stored in the database: text in the content file that matches a tag discovery pattern is extracted as a tag. When a match is found, the application creates either a master tag or an alias tag, depending on the definition of the matching tag discovery pattern.
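
The matching step can be pictured as running each tag discovery pattern over the content text. The sketch below assumes that patterns can be expressed as Python regular expressions and that each pattern definition states whether a match is a master or an alias tag. The TagDiscoveryPattern class, the discover_tags function, and the sample patterns are hypothetical; how SDx actually stores and evaluates patterns is not described here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TagDiscoveryPattern:
    """Hypothetical stand-in for a tag discovery pattern stored in the database."""
    regex: str     # assumed: the pattern is expressible as a Python regular expression
    tag_kind: str  # "master" or "alias", taken from the pattern definition


def discover_tags(content: str, patterns: list[TagDiscoveryPattern]) -> list[tuple[str, str]]:
    """Return (tag name, tag kind) pairs for every pattern match in the content text."""
    found: list[tuple[str, str]] = []
    seen: set[str] = set()
    for pattern in patterns:
        for match in re.finditer(pattern.regex, content):
            name = match.group(0)
            # Report each tag name once; the first pattern that matches it wins.
            if name not in seen:
                seen.add(name)
                found.append((name, pattern.tag_kind))
    return found


# Illustrative patterns only; real tag formats are site-specific.
patterns = [
    TagDiscoveryPattern(r"\b\d{2}-[A-Z]-\d{3}\b", "master"),  # e.g. 10-P-101
    TagDiscoveryPattern(r"\bP\d{3}[A-Z]\b", "alias"),          # e.g. P101A
]
print(discover_tags("Pump 10-P-101 feeds tank 10-T-200; see also P101A.", patterns))
# [('10-P-101', 'master'), ('10-T-200', 'master'), ('P101A', 'alias')]
```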

If the identified tag name is determined to be a master tag, only one tag item, the master tag, is created. If the tag name matches the format for an alias tag, the application creates the alias tag. An additional process then checks the alias tag file to determine what the corresponding master tag is called. If that master tag already exists, the new alias tag is related to it. If the master tag does not already exist, the application creates it and relates the alias tag to it.

The alias tags and the master tags are all related to the document associated with the file in which they were found.
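
The master and alias rules above can be modeled with a small in-memory store. This is a sketch under stated assumptions: TagStore and its fields are hypothetical, the alias_to_master dictionary stands in for the alias tag file (whose format is not described here), and an alias missing from that lookup simply raises KeyError. Set semantics provide the "create only if it does not already exist" behavior.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """Hypothetical in-memory stand-in for tag items and their relationships."""
    alias_to_master: dict[str, str]  # assumed stand-in for the alias tag file lookup
    masters: set[str] = field(default_factory=set)
    alias_of: dict[str, str] = field(default_factory=dict)        # alias tag -> master tag
    doc_links: set[tuple[str, str]] = field(default_factory=set)  # (tag name, document)

    def record(self, tag_name: str, tag_kind: str, document: str) -> None:
        if tag_kind == "master":
            # A master-tag match creates only the master tag (a no-op if it exists).
            self.masters.add(tag_name)
        else:
            # Look up what the master tag for this alias is called.
            master = self.alias_to_master[tag_name]
            # Create the master if it does not exist yet, then relate the alias to it.
            self.masters.add(master)
            self.alias_of[tag_name] = master
        # Every tag found in the file is related to that file's document.
        self.doc_links.add((tag_name, document))


store = TagStore(alias_to_master={"P101A": "10-P-101"})
store.record("P101A", "alias", "DOC-001")      # creates master 10-P-101 and relates P101A to it
store.record("10-P-101", "master", "DOC-002")  # master already exists; nothing new is created
print(store.masters)   # {'10-P-101'}
print(store.alias_of)  # {'P101A': '10-P-101'}
```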

  • If <Filename with extension>_ContentFile.xml or <Filename with extension>_ContentFile.txt files are available for the corresponding Office files, place them in the PreProcessedContentFiles folder, in the same location as the native files, before the content discovery task runs. For example, if the native files are in C:\SmartPlantFusionData, place the <Filename with extension>_ContentFile.xml files in the C:\SmartPlantFusionData\PreProcessedContentFiles folder.

  • If preprocessed content files are available, the Extract Content workflow step does not use SmartPlant Markup for content extraction. Instead, it attaches the XML or TXT file from the PreProcessedContentFiles folder to the original file in the application. The Extract Data workflow step then extracts the tags from that <Filename with extension>_ContentFile.xml or <Filename with extension>_ContentFile.txt file.

    If the preprocessed content files contain title block information, the application extracts the document name from the title block and renames the document with the value of the _TitleBlock_DrawingNo tag in the XML file. For more information on the title block, see Preprocessed content XML file format.
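
Renaming from the title block can be sketched with Python's standard xml.etree.ElementTree module. The layout of the preprocessed content XML file is defined in Preprocessed content XML file format and is not reproduced here, so this sketch assumes the drawing number is the text of an element named _TitleBlock_DrawingNo. The wrapper elements in the sample and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def drawing_number_from_content_xml(xml_text: str) -> Optional[str]:
    """Hypothetical helper: return the text of the first non-empty _TitleBlock_DrawingNo element.

    Assumption: the drawing number is the element's text content, not an attribute.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter("_TitleBlock_DrawingNo"):
        value = (element.text or "").strip()
        if value:
            return value
    return None


def resolve_document_name(current_name: str, xml_text: str) -> str:
    # Rename only when the title block yields a drawing number; otherwise keep the name.
    return drawing_number_from_content_xml(xml_text) or current_name


# Hypothetical sample; real files follow the documented XML format.
sample = (
    "<Content><TitleBlock>"
    "<_TitleBlock_DrawingNo> DWG-1001 </_TitleBlock_DrawingNo>"
    "</TitleBlock></Content>"
)
print(resolve_document_name("P-101.pdf", sample))        # DWG-1001
print(resolve_document_name("P-101.pdf", "<Content/>"))  # P-101.pdf
```

For a file on disk, ET.parse(path).getroot() returns the same kind of root element that ET.fromstring returns for a string.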