Tika

Since Camel Quarkus 1.0.0-CR3 | JVM: supported | Native: supported

Parse documents and extract metadata and text using Apache Tika.

What’s inside

Please refer to the above link for usage and configuration details.

Maven coordinates

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-tika</artifactId>
</dependency>

Check the User guide for more information about writing Camel Quarkus applications.

Camel Quarkus limitations

The tikaConfig and tikaConfigUri parameters are not available in the Camel Quarkus Tika extension. Configuration can be changed only via application.properties.

While you can use any of the available Tika parsers in JVM mode, only some of them are supported in native mode (see the Quarkus Tika guide).

Using the Tika parser without any configuration initializes all available parsers. Because some of them do not work in native mode, the whole execution fails.

To make the Tika parser work in native mode, select the parsers to initialize with the following properties:

  • quarkus.tika.parsers: Comma-separated list of parser abbreviations. Two abbreviations are predefined: pdf and odf.

  • quarkus.tika.parser.*: Defines a new parser abbreviation that can be used in quarkus.tika.parsers. The value is the fully qualified class name of the parser.

Example of application.properties:

quarkus.tika.parsers = pdf,odf,office
quarkus.tika.parser.office = org.apache.tika.parser.microsoft.OfficeParser
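
To illustrate how the two properties relate, the following is a minimal Java sketch of how a list of abbreviations could be resolved to parser class names. It is not the code used by the Quarkus Tika extension. The class TikaParserSelection and its resolve method are hypothetical, and the class names mapped to the predefined pdf and odf abbreviations are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: resolves parser abbreviations to class names. */
final class TikaParserSelection {

    private static final String PARSERS_KEY = "quarkus.tika.parsers";
    private static final String PARSER_PREFIX = "quarkus.tika.parser.";

    // Assumption: class names behind the two predefined abbreviations.
    private static final Map<String, String> PREDEFINED = Map.of(
            "pdf", "org.apache.tika.parser.pdf.PDFParser",
            "odf", "org.apache.tika.parser.odf.OpenDocumentParser");

    /** Returns the parser class names selected by quarkus.tika.parsers, in order. */
    static List<String> resolve(Properties props) {
        // Start from the predefined abbreviations, then add user-defined ones.
        Map<String, String> known = new HashMap<>(PREDEFINED);
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PARSER_PREFIX)) {
                String abbreviation = key.substring(PARSER_PREFIX.length());
                known.put(abbreviation, props.getProperty(key).trim());
            }
        }

        List<String> classNames = new ArrayList<>();
        for (String entry : props.getProperty(PARSERS_KEY, "").split(",")) {
            String abbreviation = entry.trim();
            if (abbreviation.isEmpty()) {
                continue; // tolerate empty values and stray commas
            }
            String className = known.get(abbreviation);
            if (className == null) {
                throw new IllegalArgumentException(
                        "Unknown parser abbreviation: " + abbreviation);
            }
            classNames.add(className);
        }
        return classNames;
    }

    public static void main(String[] args) {
        // Same values as the application.properties example.
        Properties props = new Properties();
        props.setProperty("quarkus.tika.parsers", "pdf,odf,office");
        props.setProperty("quarkus.tika.parser.office",
                "org.apache.tika.parser.microsoft.OfficeParser");
        resolve(props).forEach(System.out::println);
    }
}
```

In this sketch, an abbreviation that is neither predefined nor declared through a quarkus.tika.parser.* entry is rejected rather than silently ignored, so a typo in the list surfaces early. How the real extension handles that case is not covered here.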

For more information about selecting parsers, see the Quarkus Tika guide.