Seed Scoping

Seed scoping is the process of setting parameters that determine how much of a site the Archive-It crawler reaches. It ensures that you collect all of the pages you want and avoid those you do not want in the collection. Archive-It provides a variety of tools for seed scoping, which is perhaps the most important element of web archiving. Proper scoping lets you be selective about the web content you add to your archive, and it helps you reduce duplication and preserve your data budget.

Seed Type

Properly setting the seed type is the first step in seed scoping. There are a variety of seed types available.

Scoping Rules

There are a variety of scoping rules available that allow you to either expand or contract the scope of your crawl. Below you will find a brief description of each. There is also thorough documentation about crawl scope on the Archive-It support pages.

Collection Level Scoping Rules

Collection level scoping rules are set from the Crawl Scope tab on the landing page of any collection, and they apply to every seed in that collection. They are best used for content that is consistent across the seeds in your collection and that you are sure you want to capture. For example, a collection level rule can ensure that YouTube videos are captured for each seed in the collection.

Seed Level Scoping Rules

More often than not, you will be setting seed level scoping rules rather than collection level scoping rules. These rules are set by opening the settings pane for any seed in your collection and clicking Seed Scope. The same scope contraction and expansion rules are available as at the collection level. However, rather than being applied to every seed within your collection, they are applied to individual seeds.
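
The relationship between the two levels can be pictured with a small conceptual sketch. This is not Archive-It's implementation or API: the ScopeRules class, the in_scope function, the substring matching, the "block wins over expand" ordering, and the default "same host and path as the seed" test are all assumptions made for illustration. The sketch only shows that collection level rules are consulted for every seed, while seed level rules are consulted for one seed alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ScopeRules:
    """Hypothetical rule set: plain substrings that widen or narrow a crawl."""
    expand: list[str] = field(default_factory=list)  # capture even if off the seed
    block: list[str] = field(default_factory=list)   # never capture


def in_scope(url: str, seed: str,
             collection_rules: ScopeRules, seed_rules: ScopeRules) -> bool:
    """Decide whether url would be captured while crawling seed.

    Assumed ordering: block rules are checked first, then expand rules,
    then a default test (same host as the seed, path under the seed path).
    """
    # Collection level rules apply to every seed; seed level rules
    # apply only to the seed they were set on.
    block = collection_rules.block + seed_rules.block
    expand = collection_rules.expand + seed_rules.expand

    if any(pattern in url for pattern in block):
        return False
    if any(pattern in url for pattern in expand):
        return True

    seed_parts, url_parts = urlparse(seed), urlparse(url)
    return (url_parts.netloc == seed_parts.netloc
            and url_parts.path.startswith(seed_parts.path))


# One collection level rule shared by all seeds (hypothetical pattern).
collection = ScopeRules(expand=["youtube.com/watch"])

# A seed level rule that narrows only the first seed (hypothetical pattern).
news_seed = "https://example.org/news/"
news_rules = ScopeRules(block=["/news/calendar/"])

# A second seed with no seed level rules of its own.
other_seed = "https://example.com/"
other_rules = ScopeRules()
```

With these example values, a YouTube watch URL is in scope for both seeds because of the collection level rule, while the calendar block affects only the seed it was set on.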