Planning: Begin with a clear plan for your data flow, including data sources, transformations, and destinations. Review the description and properties of a processor to understand its purpose and usage.
Organize: Organize your canvas and process flows into logical groups by using the Process Groups feature. Grouping data flows improves manageability, streamlines productivity, and makes troubleshooting easier.
Naming Convention: Assign meaningful names to Processors using a clear, consistent naming convention that indicates purpose or usage. This makes each component's role within a given data flow easier to understand when viewing the canvas. For example, instead of naming a connection “EAM - WO”, use something more descriptive such as “EAM Add WO” to indicate its purpose.
Error Handling: Implement error handling strategies, including failure and retry mechanisms, to ensure data integrity. Defining a failure relationship, when available, makes processes more reliable and enables proper monitoring when failures or other issues occur in the data flow. It is the data flow designer's responsibility to define all relationships and how they are handled.
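As an illustrative sketch of one common retry pattern (the InvokeHTTP source and dead-letter destination here are hypothetical examples; RetryFlowFile and its retry/retries_exceeded relationships are standard NiFi components, but verify names against your version), the failure relationship can be routed through a retry processor before being dead-lettered:

```
InvokeHTTP
  ├─ success → (downstream processing)
  └─ failure → RetryFlowFile (Maximum Retries = 3)
                 ├─ retry            → back to InvokeHTTP
                 └─ retries_exceeded → PutFile (dead-letter directory for review)
```

This keeps transient failures self-healing while ensuring permanently failing data still reaches a defined terminal destination rather than accumulating in a queue.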
Performance Tuning: Adjust each Processor's scheduling so it runs at the interval your use case requires.
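For reference, a Processor's Scheduling tab supports both timer-based and cron-based strategies (the values below are illustrative, not recommendations; NiFi's CRON driven strategy uses Quartz-style expressions):

```
# Poll frequently for near-real-time ingestion
Scheduling Strategy: Timer driven
Run Schedule:        30 sec

# Or run once per day during off-peak hours
Scheduling Strategy: CRON driven
Run Schedule:        0 0 2 * * ?    # sec min hour day-of-month month day-of-week
```

Choosing a longer interval for source processors that poll external systems can significantly reduce load on both NiFi and the upstream system.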
Optimize Scalability: Create data flows with scalability in mind, considering future growth and resource requirements. Utilize node and scheduling configurations based on the type of data being ingested. See the Databridge Pro Technical Reference.
Testing: Use the “Run Once” operation when creating or modifying data flows to confirm they function as expected. Checking processor connections step by step helps ensure the flows and transformations behave as the designer intended.
Community Resources: Databridge Pro is powered by Apache NiFi, so the Apache NiFi community documentation, guides, and forums are valuable references for flow design questions.
Built-in Language: Utilize and learn NiFi’s Expression Language to dynamically configure and modify attributes and properties based on conditions and calculations.
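A few representative Expression Language snippets (attribute names such as `filename` and `fileSize` are standard flow file attributes; the prefix value shown is hypothetical):

```
${filename:toUpper()}                  # uppercase the filename attribute
${now():format('yyyy-MM-dd')}          # current date, e.g. for building directory paths
${fileSize:gt(1048576)}                # true when the flow file content exceeds 1 MiB
${literal('EAM_'):append(${uuid})}     # build a key from a prefix and the flow file UUID
```

Expressions like these can be used in processor properties that support Expression Language, and boolean expressions are commonly used in RouteOnAttribute to branch flows by condition.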
Complete Data Flows: Always ensure data flows are well-designed with components that take meaningful action with data, including the final stop of the flow. Data should not remain unresolved by accumulating indefinitely in queues. Every piece of data should have a clear path to resolution, whether it’s being processed, stored, transmitted, or deliberately discarded.
Flow files should terminate at processors. Funnels should NEVER be used as the final step in a flow, since entries left in their queues will never be removed.
Optimize Processor Usage: Minimize the number of processors used when designing flows for transformation tasks, as each additional processor introduces overhead through queuing and serialization operations. To reduce processing time and improve performance, consider consolidating transformation logic into fewer, more powerful processors rather than chaining multiple simpler processors.
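For example, several chained field-by-field manipulations of JSON content can often be collapsed into a single JoltTransformJSON processor. A minimal Jolt shift spec is sketched below (the field names are hypothetical, chosen to echo the EAM work order example above):

```json
[
  {
    "operation": "shift",
    "spec": {
      "id": "workOrder.number",
      "desc": "workOrder.description",
      "site": "workOrder.location"
    }
  }
]
```

One Jolt transform performing three renames replaces three chained processors, eliminating two intermediate queue-and-serialize steps per flow file.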