Mass Upload Websites

Summary: Websites can be mass imported into the platform in bulk, along with each website’s group assignments and crawl and scan settings.

It is recommended that you use our CSV mass website import template and do not modify the header row or add or remove columns.

When to use

When you wish to import new websites in bulk into the platform.

Permissions/Role Requirements

Only users with the “Manage Websites” permission who have also been given the “Bulk operations” permission, and who have access to the “Root Group” in the account, can mass upload websites. Users without these permissions, or whose access is limited to a subgroup, cannot use this feature.

How to Mass Import Websites

  • Step 1: Activate the Websites Button
  • Step 2: Activate the “Mass Import” Button located at the top of the Websites Page
  • Step 3: Activate the Choose File Button
  • Step 4: Add the CSV file with your Bulk Upload Websites
  • Step 5 (optional): Select the “Add homepage to websites” checkbox if you want the homepage of the website automatically added.
  • Step 6: Activate the Save Button

The mass import of the websites will be initiated, and within a few minutes the websites will be added to the account with their associated groups and crawl and scan settings.

Website bulk crawl and other actions

From the website view, you can now crawl or scan multiple websites. For example, to crawl multiple websites, select the checkbox next to each website you want to crawl and then activate the Crawl button.

The header row of the websites listing has a select-all checkbox. Selecting it selects all of the websites on the current page; once they are selected, an option appears to select all of the websites in the entire filter.

CSV File Format for Bulk Website Upload

The mass upload only accepts a CSV file, and the file must contain all 16 field columns. It is recommended that the mass website import template is used and that the top row is left unmodified.

Each subsequent row should represent a new website that the user wishes to import.

Below is the text of a simple version of what the CSV file could contain:

"Base URL","Group","Parent Group (If group doesn’t exist)","Name","Website Notes","Crawl: Max Pages","Crawl: Max Depth","Crawl: Include Subdomains? (Y to enable)","Crawl: Rate Limit (low, medium, high)","Crawl: Manual Rate Limit (Pages per minute)","Crawl: Whitelist Filters","Crawl: Blacklist Filters","Scan: Viewport Width","Scan: Evaluation Delay","Scan: Rate Limit (low, medium, high)","Scan: Manual Rate Limit (Pages per minute)"
blog.pope.tech,,,,,,,,,,,,,,,
pope.tech,,,,,,,,,,,,,,,
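If you are generating the import file programmatically, a minimal Python sketch like the following produces the same two-row example shown above. The header names come from the template; the helper name `build_import_csv` is ours, not part of the platform.

```python
import csv
import io

# The 16 header columns from the mass website import template.
HEADERS = [
    "Base URL",
    "Group",
    "Parent Group (If group doesn't exist)",
    "Name",
    "Website Notes",
    "Crawl: Max Pages",
    "Crawl: Max Depth",
    "Crawl: Include Subdomains? (Y to enable)",
    "Crawl: Rate Limit (low, medium, high)",
    "Crawl: Manual Rate Limit (Pages per minute)",
    "Crawl: Whitelist Filters",
    "Crawl: Blacklist Filters",
    "Scan: Viewport Width",
    "Scan: Evaluation Delay",
    "Scan: Rate Limit (low, medium, high)",
    "Scan: Manual Rate Limit (Pages per minute)",
]

def build_import_csv(base_urls):
    """Build a minimal import CSV: one row per base URL, the other 15 fields empty."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADERS)
    for url in base_urls:
        writer.writerow([url] + [""] * (len(HEADERS) - 1))
    return buf.getvalue()

print(build_import_csv(["blog.pope.tech", "pope.tech"]))
```

The `csv` module quotes the header fields that contain commas, matching the quoted header row of the template, and writes each website row with the 15 trailing commas that keep all 16 fields represented.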

Columns/Options

Required Fields

The base URL is the only required field in this file. The other fields must still be represented by their respective commas, but they may be empty. In total, all 16 fields must be represented for each website row.

All of the fields map onto identical form fields in the Edit Website view for an individual website in the platform.

Base URL*

The base URL is the only required field. All crawls & scans will be based on this entered URL.

Correct base URL: pope.tech

IMPORTANT: The platform automatically adds the http:// or https:// to the base URL. DO NOT add http:// or https:// to the URL or the upload will fail to process.
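If your source data already contains full URLs, a small sketch like this can strip the scheme before the CSV is written (the helper name is ours, not part of the platform):

```python
import re

def normalize_base_url(url):
    # The platform adds http:// or https:// itself; including a scheme
    # makes the upload fail, so strip it before writing the CSV row.
    return re.sub(r"^https?://", "", url.strip())

print(normalize_base_url("https://pope.tech"))   # pope.tech
print(normalize_base_url("blog.pope.tech"))      # blog.pope.tech
```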

Group

The group field lists which group the website will immediately be nested in. If the group already exists in the account, the website will be added to that group. If the group does not already exist that group will be created and the website will be added to it.

If the group field is left empty, the website will be added to the root group of the account.

Parent Group (If group doesn’t exist)

This identifies which parent group a group is added to in the account. This option should only be used if a group option for a website doesn’t already exist in the account. If the group already does exist in the account, the parent group will be ignored.

Example of Group and Parent Group

If a user wanted to add a website to a group called “Department” and wanted the “Department” group to be nested under “School,” they would list “Department” in the Group field, and “School” in the “Parent Group” field. When the mass import happened, the website would be uploaded into the group “Department” if that group already existed. If the group “Department” didn’t exist already in the account, it would be created. Similarly, the group Department would be nested as a subgroup under the Parent Group “School” if it already existed. If Group “School” didn’t exist it would be created.
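As a CSV row, the Department/School example above would put the group in the second field and the parent group in the third, with the remaining 13 fields left empty. A quick sketch:

```python
import csv
import io

# One row for the example above: "pope.tech" goes into the group
# "Department", which (if it does not exist yet) is created under "School".
row = ["pope.tech", "Department", "School"] + [""] * 13  # 16 fields total

buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue().strip())
```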

Name

If you desire the website to have a name other than its URL, enter your chosen name in this field.

Website Notes

Website notes is a text field that allows users to add custom comments about the website. This can be any type of note the users have about the website.

Crawl: Max Pages

Crawl: Max Pages is the maximum number of pages that the crawler will find and add to a website when a crawl is deployed. Numbers are the only valid option for this field.

Crawl: Max Depth

Crawl: Max depth is used to specify how many subdirectories or folders deep the crawler will go on a website. We have rarely seen sites that need to go beyond 10. Numbers are the only valid option for this field.

Crawl: Include Subdomains? (Y to enable)

There are two options for this field: “Y” (quotes omitted) or left blank. The letter “Y” enables this option. This option doesn’t take effect until the website is crawled: when the mass import happens, only the Root URL is added. Once that URL is crawled, however, if this setting is activated and the crawler comes across a subdomain of the site, it will add the subdomain to the account with the same website settings as its Root URL.

Crawl: Rate Limit (low, medium, high)

Crawl: Rate Limit is used to throttle the speed of the crawler. There are only 4 valid options that can be entered for this field: empty, “low”, “medium”, or “high”. If this field is left empty, it will check to see if a manual crawl rate was entered (next field). If there is no crawl rate specified in either field, the organization’s default crawl rate will be applied.

The following are the crawler limits per minute based on each speed:

  • Low = limit 10 pages a minute crawled
  • Medium = limit 60 pages a minute crawled
  • High = limit 120 pages a minute crawled
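The limits above can be used to roughly estimate how long a crawl will take when the rate limit is the bottleneck. A small sketch (the mapping mirrors the list above; the function name is ours):

```python
# Pages-per-minute limits for each named crawl rate (from the list above).
RATE_LIMITS = {"low": 10, "medium": 60, "high": 120}

def estimated_crawl_minutes(total_pages, rate):
    # Lower bound on crawl time, assuming the rate limit is the bottleneck.
    return total_pages / RATE_LIMITS[rate]

print(estimated_crawl_minutes(600, "medium"))  # 10.0
```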

Crawl: Manual Rate Limit (Pages per minute)

If the user wishes to specify their own crawl rate, they can do so using the Crawl: Manual Rate Limit. Numeric values are the only valid options in this field. The number entered will indicate the maximum number of pages per minute that the crawl will find for that website. For example, if a 30 was entered, that would mean the max crawl rate limit for that website would be 30 pages a minute.

If a user wishes to enter a manual crawl rate limit, they should leave the previous field “Crawl: Rate Limit (low, medium, high)” empty.

Crawl: Whitelist Filters

A whitelist set in the crawl settings restricts the crawler to only bringing in URIs that contain the designated path or character string. A whitelist of “/blog” (drop the quotes for entry in the field) will only include pages that have /blog in their URL path. If multiple whitelists are wanted, leave a space between each one, but keep them all within the same CSV field (between the same pair of commas). For example, a field of “,/blog /calendar /directory,” will include pages from all three of these paths. Whitelist and blacklist filters can both be used at the same time. If a whitelist or blacklist filter starts with a ?, it should first be escaped with a backslash (“\?” without the quotation marks).
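In code, combining multiple filters into one cell and escaping a leading ? looks like this (a sketch; the helper name is ours):

```python
# Multiple filters share one CSV cell, separated by spaces.
whitelist_cell = " ".join(["/blog", "/calendar", "/directory"])

def escape_filter(f):
    # Filters beginning with "?" must be escaped with a backslash.
    return "\\" + f if f.startswith("?") else f

print(whitelist_cell)            # /blog /calendar /directory
print(escape_filter("?page="))   # \?page=
```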

Crawl: Blacklist Filters

Crawl: Blacklist Filters tells the crawler to exclude any URI that contains the designated path or character string. A blacklist of “/blog” (drop the quotes for entry in the field) will exclude pages that have /blog in their URL path. If multiple blacklists are wanted, leave a space between each one, but keep them all within the same CSV field. For example, a field of “,/blog /calendar /directory,” will exclude pages from all three of these paths. Whitelist and blacklist filters can both be used at the same time. If a whitelist or blacklist filter starts with a ?, it should first be escaped with a backslash (“\?” without the quotation marks).

Scan: Viewport Width

The Scan: Viewport Width specifies the viewport width of the headless Chromium browser on the server that loads the website’s web pages for scans. The default width for scans is 1200. Users can modify this width to replicate how websites load on devices of different widths. Depending on how the website is designed, different accessibility issues may arise at different viewport widths.

Only numeric values are valid options in the Scan: Viewport Width field. If this field is left blank, scans will be completed at the organization’s default viewport width.

Scan: Evaluation Delay

The Scan: Evaluation Delay is the delay the scanner waits after the page sends the page-loaded signal before deploying the scan. The evaluation delay is measured in milliseconds. The platform default for the evaluation delay is 1000 (1 second).

Only numeric values can be entered in this field, ranging from 0 to 5000 (5 seconds). If this field is left empty, the organization’s default Scan: Evaluation Delay value is used.
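A quick validation sketch for this range, useful when generating rows programmatically (the function name is ours, not part of the platform):

```python
def check_evaluation_delay(ms):
    # Valid values are whole milliseconds from 0 to 5000 (0-5 seconds);
    # 1000 ms (1 second) is the platform default.
    if not 0 <= ms <= 5000:
        raise ValueError("Scan: Evaluation Delay must be between 0 and 5000 ms")
    return ms

print(check_evaluation_delay(1000))  # 1000
```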

Scan: Rate Limit (low, medium, high)

Scan: Rate Limit is used to throttle the speed of the accessibility scanner. There are only 4 valid options that can be entered for this field: empty, “low”, “medium”, “high”. If this field is left empty, it will check to see if a manual scan rate was entered (next field). If there is no Scan Rate Limit or Manual Scan Rate Limit specified, the organization’s default scan rate will be applied.

The following are the scan rate limits per minute based on each speed:

  • Low = limit of 10 pages a minute scanned
  • Medium = limit of 60 pages a minute scanned
  • High = limit of 120 pages a minute scanned

Scan: Manual Rate Limit (Pages per minute)

If the user wishes to specify their own scan rate, they can do so using the Scan: Manual Rate Limit (Pages per minute) field. Numeric values are the only valid options in this field. The number entered will indicate the maximum number of pages per minute that the scanner will scan that website. For example, if a 30 was entered, that would mean only 30 pages on that site could be scanned in a minute.

If a user wishes to enter a manual scan rate limit, they should leave the previous field “Scan: Rate Limit (low, medium, high)” empty.

Q&A

Do all the websites have to be in the same group?

No, you can specify which group or subgroup a website goes into. If the group or subgroup is not in the platform, it will be created by the bulk upload.

Are all the columns required?

The CSV file upload must represent all 16 settings columns for each website being added, whether they contain content or are left empty. That being said, only the “Base URL” is required; the rest can remain empty.

What happens if I don’t provide a group for a website?

It is added to the root group for the account.

What happens if I don’t add crawler or scanning settings?

The default organization account settings are applied.