Connect custom website to Rovo
This connector crawls and indexes your website so that its content appears in Rovo Search results and can be used in Rovo Chat and Agents.
What is indexed?
The custom website connector indexes these objects:
Web pages (MIME type: text/html)
Text files (MIME type: text/plain)
For each object, it indexes these attributes:
Name
URL
Created date
Last updated date
Description
Page content
Before you begin
This connector can only crawl secure (https) websites that you own.
To verify that you own the domain or subdomain, you need to be able to edit the robots.txt file on the website you’d like to crawl.
We encourage you to review which pages are available on your website. All Rovo users will have access to all content available to the crawler (including content behind any configured authentication).
Editing your robots.txt
You need to be able to edit the robots.txt file on your website. If you’re not sure what a robots.txt file is, see How to write a robots.txt file, or talk to your website administrator.
At a minimum, you’ll need to add the following to the existing robots.txt file on your website:
User-agent: atlassian-bot
Note that a User-agent: * rule alone will not permit crawling; the atlassian-bot line must be included.
If the website you’d like to crawl is a subdomain (for example, https://support.vitafleet.com/), the robots.txt file must be available at that subdomain (https://support.vitafleet.com/robots.txt), not at the root domain (editing https://www.vitafleet.com/robots.txt will not work).
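To confirm the file is being served from the correct place, you can fetch it directly. The following is a minimal sketch using Python’s standard library, reusing the example subdomain above; substitute your own URL:
# Sketch: confirm robots.txt is reachable at the subdomain you plan to crawl
# and that it contains the atlassian-bot rule. Replace the URL with your own.
import urllib.request

url = "https://support.vitafleet.com/robots.txt"

with urllib.request.urlopen(url) as response:
    body = response.read().decode("utf-8", errors="replace")
    print("Status:", response.status)
    print("atlassian-bot rule present:", "atlassian-bot" in body)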
Be aware that your robots.txt file, including this atlassian-bot edit, is always visible to the public (unless your site requires authentication).
You can also add specific allow or disallow rules to your robots.txt, and the connector will follow them. For example:
User-agent: atlassian-bot
Disallow: /not-useful/
This rule would allow Rovo to index every public page on your site except the content under /not-useful/.
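If you want to check how your rules will be interpreted before connecting, one rough way is Python’s built-in robots.txt parser, which applies standard allow/disallow semantics. This is a sketch only (it is not the Atlassian crawler itself), and the page URLs below are illustrative:
# Sketch: test which paths the atlassian-bot user agent may fetch under your rules.
# Uses standard robots.txt semantics; the actual crawler may differ in edge cases.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://support.vitafleet.com/robots.txt")
parser.read()  # download and parse the live robots.txt

for page in ("https://support.vitafleet.com/guides/setup",
             "https://support.vitafleet.com/not-useful/archive"):
    allowed = parser.can_fetch("atlassian-bot", page)
    print(page, "->", "allowed" if allowed else "disallowed")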
Authentication options
The connector currently supports crawling sites with:
No authentication (public sites)
Basic authentication (username/password, without a login page)
Basic authentication
Crawling a site with basic authentication means that Rovo will index all content accessible with the username and password provided in the setup form.
This is suitable when your organization has sites that aren’t public but also don’t require individual permissions (for example, some intranets or internal knowledge bases).
You will still need to edit the robots.txt file on your authenticated site.
Broad content access
Connecting a site with basic authentication means that every Rovo user on your site will have access to all content available to the provided username and password.
Rovo will not respect individual permissions for this site.
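Before entering credentials in the setup form, you may want to confirm they work for a plain HTTP Basic request. The sketch below uses Python’s standard library; the URL and credentials are hypothetical placeholders for your own internal site:
# Sketch: verify that a username and password work for HTTP Basic authentication.
# The URL and credentials below are placeholders; replace them with your own.
import base64
import urllib.error
import urllib.request

url = "https://intranet.example.com/"
token = base64.b64encode(b"crawler-user:s3cret").decode("ascii")
request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})

try:
    with urllib.request.urlopen(request) as response:
        print("Status:", response.status)  # 200 means the credentials work
except urllib.error.HTTPError as error:
    print("Status:", error.code)  # 401 or 403 means the credentials were rejected
A successful response suggests that the same credentials supplied in the connector setup form will let the crawler reach that page.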
Connecting and crawling your website
To get to the setup screen for your custom website in Atlassian Administration:
Go to Atlassian Administration. Select your organization if you have more than one.
Select Settings > Rovo.
Under Sites, next to the site you want to connect, select Add connector.
Select Custom website, then select Next.
To set up your crawl:
Enter a website name for the site you’d like to crawl.
Add the full URL of the domain. Include the protocol (https://).
Choose how often Rovo should index your site.
Choose your authentication method and fill in any applicable fields.
Review and agree to the data usage information.
Select Connect.
Next steps
After you’ve finished setting up the crawl:
Crawling and indexing of your site will start immediately.
Pages will start to appear in Search incrementally for you and your team over the next few hours.
Depending on the number of pages on your website, it may take some time for all your website’s content to be indexed and appear in Search.