Confluence is team content collaboration software. Onna supports Confluence Cloud and Confluence Server version 5.7 and later. Onna connects directly to the Confluence API to collect all information in native format. The integration collects all data and metadata from an entire Confluence site or from individual spaces.
All files are synced, including, but not limited to:
HTML content of the page
Comments on pages
Attachments for the page
Labels for attachments and pages
Ancestors for the page/attachments
Historical information and related metadata, including:
Author of the page
Last updated by/on
Previous Version created by/on
Types of Sync Available
We currently support three sync modes: one-time, archive, and auto.
One-time is a one-way sync that collects information only once.
Archive means that Onna performs a full sync first and then continuously adds any new files generated at the data source. This sync type does not remove files from Onna when they are deleted from the data source.
Auto-sync means that Onna performs a full sync first and then keeps Onna mirrored with the data source. Anything deleted from the data source is also deleted in Onna.*
The synchronization scope currently encompasses entire Confluence sites, specific Confluence spaces, and specific Confluence pages.
*Auto-sync is only supported for Confluence pages and files attached to a space.
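The difference between the three modes can be modeled with plain sets of file IDs. This is only an illustrative sketch of the behavior described above, not Onna's implementation; the function name `sync_pass` and the file IDs are hypothetical.

```python
def sync_pass(mode, held, source, first_run=False):
    """Return the set of file IDs held in Onna after one sync pass.

    Hypothetical model of the documented behavior, not Onna's code.
    """
    if mode not in ("one-time", "archive", "auto"):
        raise ValueError(f"unknown sync mode: {mode!r}")
    held, source = set(held), set(source)
    if first_run:
        # Every mode starts with a full sync of the data source.
        return source
    if mode == "one-time":
        # Collected once; later changes at the source are ignored.
        return held
    if mode == "archive":
        # New files are added; files deleted at the source are kept.
        return held | source
    # "auto": mirror the source, so deletions propagate.
    return source


# After the first full sync, file "a" is deleted at the source
# and file "c" is created there.
initial = sync_pass("archive", set(), {"a", "b"}, first_run=True)
changed_source = {"b", "c"}
after_one_time = sync_pass("one-time", initial, changed_source)
after_archive = sync_pass("archive", initial, changed_source)
after_auto = sync_pass("auto", initial, changed_source)
```

In this model, only archive ends up holding a file that no longer exists at the source, and only auto drops it.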
All files and metadata can be exported in an eDiscovery-ready format. Load files are available as a DAT, CSV, or custom text file.
The following metadata fields are exported:
Space ID (text key field to identify space in Confluence)
Confluence Space Type
Ancestors for a file
List of Labels
All date-related metadata
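As a rough illustration of working with a CSV load file, the sketch below counts exported items per space. The column headers ("Space ID", "Confluence Space Type", "Labels") and the sample rows are assumptions made for this example; the headers in an actual export may differ, so check them against a sample load file first.

```python
import csv
import io
from collections import Counter

# Hypothetical load file content; real exports may use different headers.
SAMPLE = """Space ID,Confluence Space Type,Labels
ENG,global,"design;api"
ENG,global,
HR,global,policy
"""


def count_by_space(handle, space_column="Space ID"):
    """Count load-file rows per space key."""
    reader = csv.DictReader(handle)
    return Counter(row[space_column] for row in reader)


counts = count_by_space(io.StringIO(SAMPLE))
```

To run this against a real export, pass an open file handle instead, for example `open(path, newline="", encoding="utf-8")`.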
What does the export look like?
We've compiled sample load files for our different integrations. Click on the link below to download a sample Confluence export.
How to Guide
Click on "Add source" and select Confluence.
This opens the source setup page.
First, name your source. This is the source's title on the platform. If you're naming it for eDiscovery purposes, a common convention is to name it after the company.
Enter the Confluence site's URL as the host. You will also need to enter your credentials: the full email address of your user account and an API token. To generate an Atlassian API token, follow the instructions here.
If the site is not hosted on Atlassian, select the option to log in with a username and password.
Note: Confluence sources in Onna do not store usernames or passwords; instead, they use JSESSIONID cookies. These credentials need to be refreshed when the cookie expires. To avoid frequent prompts to renew credentials, we suggest extending the length of time the cookie is valid. You can follow the instructions here.
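For a site hosted on Atlassian, one way to confirm that an email and API token pair works before entering it in Onna is to call the Confluence Cloud REST API directly with HTTP Basic authentication. The sketch below only builds the request and does not send it. The host, email, and token are placeholders, the function name is hypothetical, and it assumes the REST API is served under `/wiki/rest/api` on your site. It does not describe how Onna itself connects.

```python
import base64
import urllib.request


def build_current_user_request(host, email, api_token):
    """Build (but do not send) a request for the authenticated user.

    Assumes a Confluence Cloud site whose REST API lives under /wiki.
    """
    url = host.rstrip("/") + "/wiki/rest/api/user/current"
    # Basic auth: base64 of "email:api_token".
    raw = f"{email}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {encoded}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder values for illustration only.
req = build_current_user_request(
    "https://example.atlassian.net/", "user@example.com", "token123"
)
```

Passing `req` to `urllib.request.urlopen` sends it. A successful response suggests the credentials are valid, while an HTTP 401 error suggests the email or token is wrong.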
You can either click Confirm, which will auto-sync all of the spaces, or click Configure to adjust the synchronization settings.
If you choose to configure your collection, you can filter on the spaces as indicated below. Select the space(s) you would like to sync. You can use the letters to browse spaces alphabetically, or use the text input box to search for a space by typing its name. To sync all spaces, click 'Select all' in the top right-hand corner.
The final step is to select the sync mode that you'd like for the source. We describe the different sync types above.
Note: If you'd like to use the date range feature, select One-time Sync.
Once you have clicked 'Confirm,' you will see this integration on the 'My Sources' page. Onna will begin calling Confluence's API and syncing files. Files are processed and indexed so that all content is searchable. The source indicates that it is syncing during this process.
When you click on the Confluence data source, you will see results begin to populate.
From this screen, you are able to filter results by date range, categories, and/or extensions using the menu on the left.
Confluence pages in Onna
Onna displays each Confluence page as a PDF rendering of the page.
Accessing audit logs
Clicking the information icon in the top right takes you to the source details, where you can see how many files the source contains and its total size.
Click on Audits to see logs from collection and processing.
You can learn more about accessing and navigating the source's collection audit logs in this article.
For Confluence on-premise collections, is it necessary to install anything on a server?
Yes. You need to install an application on a Windows machine that is always on and has constant connectivity to both the Confluence server and the Internet.
Check out our guide to collecting from Confluence on-premise.
What type of login is needed - database or user?
A user account to Confluence with full access to the space(s) that need to be collected.
Can Onna sync Confluence "sub-spaces"?
Confluence sub-spaces do not actually exist. What people refer to as sub-spaces are usually the top-level pages within a Confluence space. A team might think of them internally as sub-spaces, but the API just sees them as pages. As such, Onna is unable to sync just a sub-space.
Here is an article for reference: https://confluence.atlassian.com/confeval/confluence-evaluator-resources/confluence-can-i-create-subspaces
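To illustrate why a "sub-space" is just a page, the sketch below rebuilds a page's location from the `ancestors` list that the Confluence REST API can return for a piece of content. The sample record is hypothetical, and the code assumes ancestors are ordered from the top-level page down to the direct parent.

```python
def page_path(page):
    """Join ancestor titles and the page title into a readable path."""
    titles = [ancestor["title"] for ancestor in page.get("ancestors", [])]
    titles.append(page["title"])
    return " / ".join(titles)


# Hypothetical content record; "Platform Team" is what a team might
# call a sub-space, but it is an ordinary page of type "page".
sample_page = {
    "type": "page",
    "title": "Runbooks",
    "ancestors": [
        {"type": "page", "title": "Engineering Home"},
        {"type": "page", "title": "Platform Team"},
    ],
}

path = page_path(sample_page)
```

Every entry in the path, including the so-called sub-space, has the same content type, so there is no separate space object to select for syncing.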
Can Onna collect Jira data from embedded Jira links within Confluence pages?
Even if the credentials are the same for both the Confluence and Jira accounts, Onna will not sync the data within the embedded Jira links. To sync Jira data, a separate source needs to be created. More information on setting up a Jira source can be found here.
Why is Admin access required for Onna to pull data from Confluence?
We request admin access to ensure the collection is complete. A regular user may not have access to every space that needs to be collected, or to every page within a space. By authenticating with an admin user, we can be sure that all of the available spaces and pages are returned.
Is it possible to configure a sync to only collect a single Confluence page?
Yes. To sync a single Confluence page, paste the link to that page in the host name field. When configuring the source, you will receive a message asking whether you want to sync a single page.
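As a rough illustration of the difference between a site URL and a single-page link, the sketch below classifies a pasted URL. It assumes the common Confluence Cloud page link shape `/wiki/spaces/<space key>/pages/<page id>/...`; Server installations and other link formats are not handled. This is not Onna's detection logic, and the function name and URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed Confluence Cloud page link shape; other formats exist.
PAGE_PATH = re.compile(r"^/wiki/spaces/(?P<space>[^/]+)/pages/(?P<page_id>\d+)")


def classify_host(url):
    """Return ("page", space_key, page_id) or ("site", None, None)."""
    match = PAGE_PATH.match(urlparse(url).path)
    if match:
        return ("page", match.group("space"), match.group("page_id"))
    return ("site", None, None)


page_result = classify_host(
    "https://example.atlassian.net/wiki/spaces/ENG/pages/123456/Runbooks"
)
site_result = classify_host("https://example.atlassian.net")
```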
Can the version history of Confluence pages be collected?
Only the current version of the Confluence page will be synced into Onna.
If I choose to collect a single Confluence page, are parent and child pages also retrieved?
When you choose to sync a single Confluence page, only the page for the link provided will be synced. Any parent or child pages will not be synced.
Is it possible to collect archived spaces?
At this time, it is not possible to sync archived spaces due to an API limitation. We suggest changing archived spaces to current in order to perform the required collection. Once the collection has completed successfully, the spaces can be archived again.