Annotation
Warning
This is an experimental feature and is likely to change or break quickly.
After extracting patterns, names and mentions from an entity's text fields, ftm-analyze can store annotations for the extracted tokens in the indexText field. The annotations follow the markdown-like syntax of the Elasticsearch annotated text plugin.
Example bodyText of an entity:
During analysis, the email address will be detected and extracted as a pattern. The resulting indexText of this entity will then contain the annotation for the emailMentioned property.
To mark this indexText as annotated, an __annotated__ prefix is added.
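A consumer can check for the marker before attempting to parse annotations. The following is a minimal sketch, not part of ftm-analyze: the helper name `split_annotated` is hypothetical, and it assumes the prefix sits at the very start of the indexText value, optionally followed by whitespace.

```python
ANNOTATED_PREFIX = "__annotated__"


def split_annotated(index_text: str) -> tuple[bool, str]:
    """Return (is_annotated, text with the marker removed).

    Assumption: the marker is the first thing in the string and may be
    followed by whitespace before the actual text starts.
    """
    if index_text.startswith(ANNOTATED_PREFIX):
        return True, index_text[len(ANNOTATED_PREFIX):].lstrip()
    return False, index_text
```

Text without the marker is returned unchanged, so the helper can be applied to any indexText value.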
Parsing
Applications can parse the annotated text knowing these conventions:
- Schema annotation: s_<schema> (parent schemata are included)
  - Example: [Jane Doe](s_Person&s_LegalEntity)
- Fingerprints annotation (via rigour.names): f_<value>
  - Example: [Jane Doe](f_doe+jane)
- Pattern annotation (available properties): p_<prop>
  - Example: [Jane Doe](p_namesMentioned)
If extracted as a mentioned Person, Mrs. Jane Doe would actually look like this:
[Mrs. Jane Doe](f_doe+jane&f_mrs+jane+doe&s_Person&s_LegalEntity&p_namesMentioned&p_peopleMentioned)
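The conventions above can be read back with a small regular expression. This is a sketch under stated assumptions rather than an official parser: the function name `parse_annotations` and the output keys are hypothetical, the prefixes s_, f_ and p_ are taken from the examples, and values are kept exactly as written (no URL-decoding of characters such as `+`).

```python
import re

# Matches [token](value&value&...) in annotated text.
ANNOTATION_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

# Prefix letters as used in the examples; the key names are our own choice.
KINDS = {"s": "schemata", "f": "fingerprints", "p": "properties"}


def parse_annotations(text: str) -> list[dict]:
    """Collect every annotated token with its values grouped by kind."""
    results = []
    for match in ANNOTATION_RE.finditer(text):
        token, raw_values = match.groups()
        item = {"text": token}
        for value in raw_values.split("&"):
            # Split on the first underscore only: "f_doe+jane" -> ("f", "doe+jane")
            prefix, _, name = value.partition("_")
            item.setdefault(KINDS.get(prefix, "other"), []).append(name)
        results.append(item)
    return results
```

Values with an unrecognised prefix are collected under "other" instead of being dropped, which keeps the sketch tolerant of changes to this experimental format.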
Disable
Annotating into indexText is the default behaviour.
To disable this feature, set the environment variable FTM_ANALYZE_ANNOTATE=0 or use the command-line flag (see reference).
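When ftm-analyze is started from a Python script, the variable can be set on the child process environment. A minimal sketch: the helper name `env_without_annotation` is hypothetical, and only the variable name and value come from the text above.

```python
import os


def env_without_annotation(base=None):
    """Copy an environment mapping and switch annotation off.

    Uses the current process environment when no mapping is given.
    """
    env = dict(os.environ if base is None else base)
    env["FTM_ANALYZE_ANNOTATE"] = "0"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` together with whatever ftm-analyze command is being invoked.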