Connect custom website to Rovo
This connector crawls and indexes your website so that its content appears in Rovo Search results and can be used in Rovo Chat and Agents.
What is indexed?
The custom website connector indexes these objects:
Web pages (MIME type: text/html)
Text files (MIME type: text/plain)
For each object, it indexes these attributes:
Name
URL
Created date
Last updated date
Description
Page content
Before you begin
This connector can only crawl secure (https) websites that you own.
To verify that you own the domain or subdomain, you need to be able to edit the robots.txt file on the website you’d like to crawl.
We encourage you to review which pages are available on your website. All Rovo users will have access to all content available to the crawler (including content behind any configured authentication).
Editing your robots.txt
You need to be able to edit the robots.txt file on your website. If you’re not sure what a robots.txt file is, see How to write a robots.txt file, or talk to your website administrator.
At a minimum, you’ll need to add the following to the existing robots.txt file on your website:
User-agent: atlassian-bot
Note that a User-agent: * rule alone will not permit crawling; the atlassian-bot line must be included.
If the website you’d like to crawl is a subdomain (for example, https://support.vitafleet.com/), the robots.txt file must be available at that subdomain (https://support.vitafleet.com/robots.txt), not at the root domain (editing https://www.vitafleet.com/robots.txt will not work).
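To confirm the file is being served from the correct place, you can fetch it directly. The following is a minimal sketch using Python’s standard library, reusing the example subdomain above; substitute your own URL:
# Sketch: confirm robots.txt is reachable at the subdomain you plan to crawl
# and that it contains the atlassian-bot rule. Replace the URL with your own.
import urllib.request

url = "https://support.vitafleet.com/robots.txt"

with urllib.request.urlopen(url) as response:
    body = response.read().decode("utf-8", errors="replace")
    print("Status:", response.status)
    print("atlassian-bot rule present:", "atlassian-bot" in body)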
Be aware that your robots.txt file, including this atlassian-bot edit, is always visible to the public (unless your site requires authentication).
You can also add specific allow or disallow rules to your robots.txt, and the connector will follow them. For example:
User-agent: atlassian-bot
Disallow: /not-useful/
This rule would allow Rovo to index every public page on your site except the content under /not-useful/.
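If you want to check how your rules will be interpreted before connecting, one rough way is Python’s built-in robots.txt parser, which applies standard allow/disallow semantics. This is a sketch only (it is not the Atlassian crawler itself), and the page URLs below are illustrative:
# Sketch: test which paths the atlassian-bot user agent may fetch under your rules.
# Uses standard robots.txt semantics; the actual crawler may differ in edge cases.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://support.vitafleet.com/robots.txt")
parser.read()  # download and parse the live robots.txt

for page in ("https://support.vitafleet.com/guides/setup",
             "https://support.vitafleet.com/not-useful/archive"):
    allowed = parser.can_fetch("atlassian-bot", page)
    print(page, "->", "allowed" if allowed else "disallowed")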
Authentication options
The connector currently supports crawling sites with:
No authentication (public sites)
Basic authentication (username/password, without a login page)
Basic authentication
Crawling a site with basic authentication means that Rovo will index all content accessible with the username and password provided in the setup form.
This is suitable when your organization has sites that aren’t public but also don’t require individual permissions (for example, some intranets or internal knowledge bases).
You will still need to edit the robots.txt file on your authenticated site.
Broad content access
Connecting a site with basic authentication means that every Rovo user on your site will have access to all content available to the provided username and password.
Rovo will not respect individual permissions for this site.
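Before entering credentials in the setup form, you may want to confirm they work for a plain HTTP Basic request. The sketch below uses Python’s standard library; the URL and credentials are hypothetical placeholders for your own internal site:
# Sketch: verify that a username and password work for HTTP Basic authentication.
# The URL and credentials below are placeholders; replace them with your own.
import base64
import urllib.error
import urllib.request

url = "https://intranet.example.com/"
token = base64.b64encode(b"crawler-user:s3cret").decode("ascii")
request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})

try:
    with urllib.request.urlopen(request) as response:
        print("Status:", response.status)  # 200 means the credentials work
except urllib.error.HTTPError as error:
    print("Status:", error.code)  # 401 or 403 means the credentials were rejected
A successful response suggests that the same credentials supplied in the connector setup form will let the crawler reach that page.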
Connecting and crawling your website
To get to the setup screen for your custom website in Atlassian Administration:
Go to Atlassian Administration. Select your organization if you have more than one.
Select Settings > Rovo.
Under Sites, next to the site you want to connect, select Add connector.
Select Custom website, then select Next.
To set up your crawl:
Enter a website name for the site you’d like to crawl.
Add the full URL of the domain. Include the protocol (https://).
Choose how often Rovo should index your site.
Choose your authentication method and fill in any applicable fields.
Review and agree to the data usage information.
Select Connect.
Next steps
After you’ve finished setting up the crawl:
Crawling and indexing of your site will start immediately.
Pages will start to appear in Search incrementally for you and your team over the next few hours.
Depending on the number of pages on your website, it may take some time for all your website’s content to be indexed and appear in Search.