Generating a Dataset Using Corpus-DB - Author Open Alex ID Search

IMPORTANT: Follow the steps below if you are going to generate a dataset based on Open Alex Author IDs. If you do not have any Open Alex Author IDs in mind, follow the steps in “Generating a Dataset Using Corpus-DB - Important Terms Search” instead.

Follow these steps to generate a dataset using Corpus-DB:

  1. Go to https://corpus-db.sdsc.edu.
  2. Enter the username and password you use for https://suave-net.sdsc.edu.
  3. To generate a dataset, click “Search” and then click “Submit”.
    • If you receive an error after doing this, open a new tab and navigate back to https://corpus-db.sdsc.edu. Make sure you enter your correct username and password.
  4. Make sure the email address shown at the top of the page is correct.
  5. Name the project under “Project Name” with a fitting title.
  6. Next, select your search type as “Author Open Alex ID”.
  7. (This step is optional): You can exclude authors with too many coauthors. You choose the threshold for what counts as too many; the default is 25.
  8. Enter the IDs you would like to include, separated by commas.
    • (This step is optional): You can include or exclude external authors in general or external authors within the authors’ institutions. Choose these based on your preferences.
  9. (This step is optional): Enter keywords to tag authors by. An author is tagged with a keyword if it appears in a title or abstract.
    • (This step is optional): You can also exclude publications that contain none of the keywords.
  10. (This step is optional): Under “Starting Year” and “Ending Year”, enter the year range to limit the search results.
  11. (This step is optional): Under “Institutions”, enter the institutions you would like to include in the search, separated by commas. Include every name variant of each institution so that relevant results are not excluded (e.g., UC San Diego, UCSD, University of California San Diego).
  12. (This step is optional): Under “Collaborating Institutions”, enter the collaborating institutions you would like to include in the search, separated by commas. As with “Institutions”, include every name variant of each institution so that relevant results are not excluded (e.g., UC San Diego, UCSD, University of California San Diego).
  13. Click “Submit”.
  14. An email will be sent to the address confirmed in step 4. Depending on the number of results, it may arrive within seconds, minutes, or, in the worst case, hours.
  15. Follow the link sent to your email (https://corpus-db.sdsc.edu/collect) and enter the code included in the email.
  16. Your dataset will automatically download to your downloads folder.
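
If you keep your author IDs in a spreadsheet or script, the comma-separated list for step 8 can be prepared ahead of time. The sketch below is only an illustration: it assumes IDs follow the OpenAlex pattern of the letter A followed by digits (optionally written as a full https://openalex.org/ URL), and it assumes the form accepts the short form. The function name `build_id_field` and the sample IDs are placeholders, not part of Corpus-DB.

```python
import re

# Assumption: an OpenAlex author ID is "A" followed by digits, optionally
# written as a full URL such as https://openalex.org/A5023888391.
_AUTHOR_ID = re.compile(r"^(?:https?://openalex\.org/)?(A\d+)$", re.IGNORECASE)


def build_id_field(raw_ids):
    """Return a comma-separated string of unique, normalized author IDs."""
    seen = []
    for raw in raw_ids:
        match = _AUTHOR_ID.match(raw.strip())
        if match is None:
            raise ValueError(f"Not an OpenAlex author ID: {raw!r}")
        short_id = match.group(1).upper()
        if short_id not in seen:  # drop duplicates, keep first-seen order
            seen.append(short_id)
    return ",".join(seen)


# Placeholder IDs for illustration only.
ids = ["A5023888391", " https://openalex.org/A5023888391 ", "a5000000001"]
print(build_id_field(ids))  # prints A5023888391,A5000000001
```

Paste the printed string into the ID field. If the form rejects the short form, the full-URL form may be what it expects; check against an ID you know works.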

Checkpoint

  • A dataset of the network files
  • A Netvis .json file
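
To confirm that the downloaded .json file is intact, you can load it and look at its top-level structure. This is a minimal sketch that makes no assumption about the Netvis schema; it only reports what it finds. The helper name `summarize_json` and the file name in the usage lines are hypothetical, so substitute the actual name of your download.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)  # raises an error if the file is not valid JSON
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return {"<top level>": f"list of {len(data)} items"}
    return {"<top level>": type(data).__name__}


if __name__ == "__main__":
    # Hypothetical file name; use the name of the file you downloaded.
    downloaded = Path.home() / "Downloads" / "my_project_netvis.json"
    if downloaded.exists():
        print(summarize_json(downloaded))
```

If the file loads without an error and the summary is not empty, the download completed and the file is valid JSON.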