Data Pipeline
Lightweight Data Integration for Java
Build Pipelines in Code
Add data processing functions to your applications, services, and batch jobs using Java or other JVM languages.
    DataReader reader = new CSVReader(new File(sourceFile))
        .setFieldSeparator('|')
        .setFieldNamesInFirstRow(true);

    reader = new FilteringReader(reader)
        .add(new FieldFilter("email")
            .addRule(new PatternMatch(".*\\.com")));

    reader = new TransformingReader(reader)
        .add(new SelectFields("email", "fname", "lname"));

    reader = new TransformingReader(reader)
        .add(new RenameField("fname", "first_name"))
        .add(new RenameField("lname", "last_name"));

    DataWriter writer = new FixedWidthWriter(new File(targetFile))
        .addFields(64)
        .addFields(20)
        .addFields(20)
        .setFieldNamesInFirstRow(true);

    Job.run(reader, writer);
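To make the steps in the sample above concrete, here is a rough sketch of the same logic in plain Java, with no Data Pipeline classes involved: read pipe-delimited rows, keep rows whose email ends in ".com", select and rename three fields, and emit fixed-width lines of 64, 20, and 20 characters. The class PipelineSketch and its methods are hypothetical names for illustration only. Truncating over-long values is an assumption of this sketch and may not match how FixedWidthWriter behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    // Hypothetical helper, not part of Data Pipeline: applies the filter,
    // select, and rename steps to pipe-delimited lines held in memory.
    // Assumes the header row contains email, fname, and lname.
    static List<String> run(List<String> lines) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }

        // The first row holds the field names.
        List<String> header = Arrays.asList(lines.get(0).split("\\|", -1));
        int email = header.indexOf("email");
        int fname = header.indexOf("fname");
        int lname = header.indexOf("lname");

        // Header row with the renamed fields.
        out.add(format("email", "first_name", "last_name"));

        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split("\\|", -1);
            // Keep only rows whose whole email value matches the pattern.
            if (!values[email].matches(".*\\.com")) {
                continue;
            }
            out.add(format(values[email], values[fname], values[lname]));
        }
        return out;
    }

    // Pads each value to its column width (64, 20, 20) and truncates
    // anything longer; truncation is an assumption of this sketch.
    static String format(String email, String first, String last) {
        return String.format("%-64.64s%-20.20s%-20.20s", email, first, last);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "id|fname|lname|email",
            "1|Ada|Lovelace|ada@example.com",
            "2|Alan|Turing|alan@example.org");
        run(input).forEach(System.out::println);
    }
}
```

The library version expresses each of these steps as a reader wrapped around the previous one, so steps can be added or removed without touching the rest of the chain.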
Build Pipelines Declaratively
Configure pipelines and components in XML or JSON so you can change them quickly without redeploying.
    <data-mapping-pipeline multithreaded="true">
        <source-entity name="incomingCalls">
            <fields>
                <field name="event_type" required="true" type="STRING" allowBlank="false" maximumLength="25" />
                <field name="id" required="true" type="LONG" allowBlank="false" />
                <field name="agent_id" required="true" type="LONG" allowBlank="false" />
                <field name="phone_number" required="true" type="STRING" allowBlank="false" minimumLength="9" />
                <field name="start_time" required="true" type="DATETIME" pattern="yyyy-MM-dd HH:mm" allowBlank="false" />
                <field name="end_time" required="false" type="DATETIME" pattern="yyyy-MM-dd HH:mm" allowBlank="false" />
                <field name="disposition" required="false" type="STRING" allowBlank="false" />
            </fields>
        </source-entity>
        <data-mapping>
            <field-mappings>
                <field-mapping fieldName="Event" sourceExpression="source.event_type" />
                <field-mapping fieldName="Call ID" sourceExpression="source.id" />
                <field-mapping fieldName="Agent ID" sourceExpression="source.agent_id" />
                <field-mapping fieldName="Caller Number" sourceExpression="source.phone_number" />
                <field-mapping fieldName="Call Start Time" sourceExpression="source.start_time" />
                <field-mapping fieldName="Call End Time" sourceExpression="source.end_time" />
                <field-mapping fieldName="Disposition" sourceExpression="source.disposition" />
            </field-mappings>
        </data-mapping>
    </data-mapping-pipeline>
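As a rough illustration of what the declarative mapping above describes, the plain-Java sketch below checks the fields marked required="true" and then copies each source field to its target name. It does not use or reproduce the Data Pipeline engine: MappingSketch and its members are hypothetical, the source expressions are simplified to direct field lookups, and the type, length, and date-pattern checks from the XML are left out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MappingSketch {

    // Source fields declared with required="true" in the XML above.
    static final List<String> REQUIRED = Arrays.asList(
        "event_type", "id", "agent_id", "phone_number", "start_time");

    // Target field name -> source field name, one entry per <field-mapping>.
    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put("Event", "event_type");
        MAPPINGS.put("Call ID", "id");
        MAPPINGS.put("Agent ID", "agent_id");
        MAPPINGS.put("Caller Number", "phone_number");
        MAPPINGS.put("Call Start Time", "start_time");
        MAPPINGS.put("Call End Time", "end_time");
        MAPPINGS.put("Disposition", "disposition");
    }

    // Hypothetical helper: validates required fields, then builds the
    // target record in the order the mappings are declared.
    static Map<String, Object> map(Map<String, Object> source) {
        for (String field : REQUIRED) {
            if (source.get(field) == null) {
                throw new IllegalArgumentException("Missing required field: " + field);
            }
        }
        Map<String, Object> target = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : MAPPINGS.entrySet()) {
            // Optional fields that are absent map to null.
            target.put(mapping.getKey(), source.get(mapping.getValue()));
        }
        return target;
    }
}
```

Keeping the mapping as data rather than code is what lets a declarative pipeline change without a redeployment: only the configuration is edited, not the logic that applies it.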
Why Use Data Pipeline
Build ETL in Java
Code your extract, transform, load pipelines using a high-performance language that fits your team's skills, has a mature toolset, and is easy to understand and maintain.
Run Pipelines Locally
Develop and test pipelines locally on your desktop using your existing development and debugging tools.
Manage Change
Track changes in Git or other source control systems, code review ETL logic with your team, and plug pipeline development into your CI/CD process.
Customize
Enhance the engine to fit your unique needs. Plug in your own logic or modify existing behavior to your specific requirements.
Embedded or Standalone
Integrate pipelines into your web, mobile, desktop, and batch applications or run them as separate, standalone jobs.
Stream Real-Time or Batch
Set your pipelines to run on a schedule, when data is available, when an event or manual trigger occurs, or continuously to gain insight in real time.
Trusted by Many
