Onna's web crawler was created to index web pages.
The web crawler does not currently support password-protected or CAPTCHA-protected websites. If you need to crawl a password-protected site, please contact us and we will put you in touch with one of our partners.
What is collected?
The web indexer follows links to one level, meaning that it syncs the page(s) at the given URL(s) and all links on those pages. It crawls HTML pages for:
- Text links
What are your sync modes?
We currently support one sync mode: one-time.
- One-time is a one-way sync that collects information only once.
The synchronization scope currently encompasses the URL to be synced and all the links on the initial page. Only one depth level is supported for HTTP links.
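To make the one-level scope concrete, here is a minimal sketch in Python. It is not Onna's implementation; the names `LinkCollector` and `depth_one_scope` are hypothetical, and the sketch assumes that only `<a href>` links with an http or https scheme count. Given the HTML of the initial page, it lists the initial URL plus every link found on that page, and it does not follow links found on the linked pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a single page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def depth_one_scope(seed_url, seed_html):
    """Return the seed URL plus every http(s) link found on the seed page.

    Links that appear on the linked pages are not followed (depth is one).
    """
    collector = LinkCollector()
    collector.feed(seed_html)
    scope = [seed_url]
    for href in collector.hrefs:
        # Resolve relative links against the seed URL and drop #fragments.
        absolute, _fragment = urldefrag(urljoin(seed_url, href))
        # Assumption: only http(s) links are in scope; mailto: etc. are skipped.
        if absolute.startswith(("http://", "https://")) and absolute not in scope:
            scope.append(absolute)
    return scope


page = (
    '<a href="/docs">Docs</a> '
    '<a href="https://example.org/a#top">A</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
print(depth_one_scope("https://example.com/", page))
```

The example page contains three links; the relative link is resolved against the initial URL, the fragment is dropped, and the mailto link is left out of scope.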
Can you export the data?
Yes, you can export data and metadata in an eDiscovery-ready format. Load files are available as DAT, CSV, or custom text files.
How-to Guide
Click on "Add new source" and select Web Crawler.
Name your data source and enter the URL. Separate the URLs with a comma if entering more than one.
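If you prepare the URL list ahead of time, a quick check can catch typos before you paste it in. The sketch below is a hypothetical helper, not part of Onna; the name `parse_url_field` and the rule that each entry must be an http or https URL are assumptions. It splits a comma-separated string the way the URL field expects and rejects entries that do not look like web addresses.

```python
from urllib.parse import urlparse


def parse_url_field(raw):
    """Split a comma-separated URL string and validate each entry."""
    urls = []
    for part in raw.split(","):
        candidate = part.strip()
        if not candidate:
            continue  # ignore empty entries such as a trailing comma
        parsed = urlparse(candidate)
        # Assumption: only absolute http(s) URLs are acceptable.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {candidate!r}")
        urls.append(candidate)
    return urls


print(parse_url_field("https://example.com, https://example.org/blog"))
```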
Once you have clicked "Done", you will see this integration on the My Sources page. Indexing begins immediately, and the status shows as "Uploading". Once the data source is fully synced, you will see a green cloud with the last sync date.
When you click on the Web Crawler data source, you will see results as they are populated.
From this screen, you are able to search for keywords or filter results by metadata fields such as file types and file creation date. For source details, click on the "information" icon.
For information on audit logs, see our article on audit logs.
Can I pull files in their native format from the links on a web page during collection?
No. Links collected by the web crawler are embedded, so the crawler cannot pull the linked files in their native formats.