How to Vet GitHub Profiles Using JavaScript and the GitHub API

Learn how to validate and clean user-submitted data by vetting GitHub profiles using JavaScript and the GitHub API.

Introduction

Recently, I needed to clean and validate data submitted through a public form. The form was meant to source developers, so its fields included name, email, and GitHub profile, among others. Because the form was public, it drew thousands of responses, many of them from non-developers, and the database needed a cleanup. To do that, I vetted each response by its GitHub profile to confirm that the information provided was accurate and meaningful. Using JavaScript and the GitHub API, I developed a solution that:

  1. Verified if a GitHub profile exists.

  2. Checked if the profile has more than three repositories.

  3. Checked if the profile has made more than five commits across all repositories.

  4. Exported the results to a CSV file for further processing.

This tutorial walks you through the process of building this solution.


Prerequisites

Before diving into the code, make sure you have the following:

  1. Node.js installed on your machine.

  2. A GitHub Personal Access Token (optional but recommended to avoid hitting API rate limits). You can create one by following GitHub's guide.

  3. Basic knowledge of JavaScript and Node.js.


Set Up the Project

Initialize the Project

Open your terminal and create a new project:

mkdir github-profile-vetting
cd github-profile-vetting
npm init -y
npm install axios

Create the Script File

Create a file named vetProfiles.js and paste the following code. Below is a breakdown of each function in the script:

Import the Required Libraries

const axios = require('axios');
const fs = require('fs');
const path = require('path');

axios is used to make HTTP requests to GitHub's API endpoints, allowing the script to fetch data.

fs enables file creation and writing, which is crucial for exporting the processed data (in this case, to a CSV file) that can be shared or analyzed further.

path builds file paths that work across operating systems, so the output location resolves correctly on Windows, macOS, and Linux.

Set Up the Axios Instance

// GitHub API base URL
const GITHUB_API_BASE_URL = 'https://api.github.com';

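// Read the token from an environment variable (optional but recommended for higher rate limits).
// GITHUB_TOKEN is the variable name chosen here; leave it unset to send unauthenticated requests.
// A hardcoded placeholder string would be sent as a real token and rejected with 401 Unauthorized.
const GITHUB_PERSONAL_ACCESS_TOKEN = process.env.GITHUB_TOKEN || '';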

const axiosInstance = axios.create({
  baseURL: GITHUB_API_BASE_URL,
  headers: GITHUB_PERSONAL_ACCESS_TOKEN
    ? { Authorization: `Bearer ${GITHUB_PERSONAL_ACCESS_TOKEN}` }
    : {},
});

This initializes an Axios instance with the base URL for GitHub's API. If a personal access token is provided, it includes an Authorization header for authenticated requests, which raises the rate limit from 60 to 5,000 requests per hour.
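Because the script makes one request per repository, a long list of profiles can use up the quota quickly. GitHub reports the remaining quota in the x-ratelimit-remaining and x-ratelimit-reset response headers, so one option is to inspect them after a call. The sketch below is not part of the original script: readRateLimit is a hypothetical helper, and the header values in the example are made up for illustration.

```javascript
// Hypothetical helper: summarize GitHub's rate-limit headers from an Axios response.
// Axios exposes response header names in lower case.
function readRateLimit(headers) {
  const remaining = parseInt(headers['x-ratelimit-remaining'], 10);
  const resetEpochSeconds = parseInt(headers['x-ratelimit-reset'], 10);
  return {
    remaining,
    // x-ratelimit-reset is a UTC epoch timestamp in seconds
    resetsAt: new Date(resetEpochSeconds * 1000),
    exhausted: remaining === 0,
  };
}

// Example with hand-written header values (not real API output)
const sampleLimit = readRateLimit({
  'x-ratelimit-remaining': '0',
  'x-ratelimit-reset': '1700000000',
});
console.log(sampleLimit.exhausted, sampleLimit.resetsAt.toISOString());
```

In the script, this could be called as readRateLimit(userResponse.headers) to decide whether to pause until resetsAt before continuing.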

Function: checkGitHubProfile

async function checkGitHubProfile(username) {

  try {
    const userResponse = await axiosInstance.get(`/users/${username}`);
    const publicReposCount = userResponse.data.public_repos;

    if (publicReposCount <= 3) {
      return { username, exists: true, meetsCriteria: false, isProspect: true, reason: '3 or fewer repositories' };
    }

    let totalCommits = 0;
    let page = 1;

    while (true) {
      const reposResponse = await axiosInstance.get(`/users/${username}/repos`, {
        params: { per_page: 100, page },
      });

      const repos = reposResponse.data;
      if (repos.length === 0) break;

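      for (const repo of repos) {
        try {
          const commitsResponse = await axiosInstance.get(`/repos/${repo.owner.login}/${repo.name}/commits`, {
            params: { author: username, per_page: 1 },
          });
          // GitHub sends no total-count header. With per_page=1, the page number
          // of the rel="last" link in the Link header equals the commit count.
          const last = /<([^>]+)>;\s*rel="last"/.exec(commitsResponse.headers.link || '');
          totalCommits += last
            ? parseInt(new URL(last[1]).searchParams.get('page'), 10)
            : commitsResponse.data.length; // no Link header: a single page of 0 or 1 commits
        } catch (error) {
          // An empty repository answers 409 Conflict; treat it as zero commits.
          if (!error.response || error.response.status !== 409) throw error;
        }
      }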
      page++;
    }

    if (totalCommits <= 5) {
      return { username, exists: true, meetsCriteria: false, isProspect: true, reason: '5 or fewer commits' };
    }
    return { username, exists: true, meetsCriteria: true, isProspect: false, reason: 'Meets all criteria' };
  } catch (error) {
    if (error.response && error.response.status === 404) {
      return { username, exists: false, meetsCriteria: false, isProspect: false, reason: 'Profile does not exist' };
    }
    return { username, exists: false, meetsCriteria: false, isProspect: false, reason: 'Error occurred' };
  }
}

This function checks if a GitHub profile exists and evaluates it based on repository count and commit history.

  1. Verify that the profile exists using the /users/:username endpoint.

  2. Check the number of public repositories.

  3. Fetch commit data from each repository and calculate the total commits.

  4. Return the result, including whether the profile meets the criteria.
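Calculating the total commits depends on counting them without downloading every one. A common approach, sketched here under the assumption that the request uses per_page=1, is to read the page number of the rel="last" link in the Link response header, which then equals the number of commits. countFromLinkHeader is a hypothetical helper name, and the sample header is hand-written rather than captured from the API.

```javascript
// Hypothetical helper: derive a total item count from a GitHub Link header.
// Only valid when the request was made with per_page=1.
function countFromLinkHeader(linkHeader, itemsOnThisPage) {
  // No Link header means everything fit on one page (0 or 1 items).
  if (!linkHeader) return itemsOnThisPage;
  const last = /<([^>]+)>;\s*rel="last"/.exec(linkHeader);
  if (!last) return itemsOnThisPage;
  // With one item per page, the last page number is the item count.
  return parseInt(new URL(last[1]).searchParams.get('page'), 10);
}

// Hand-written sample header for illustration
const sampleLink =
  '<https://api.github.com/repositories/1/commits?author=octocat&per_page=1&page=2>; rel="next", ' +
  '<https://api.github.com/repositories/1/commits?author=octocat&per_page=1&page=34>; rel="last"';
console.log(countFromLinkHeader(sampleLink, 1));
```

Parsing the URL with searchParams rather than matching page= directly avoids confusing the page parameter with per_page.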

Function: checkProfiles

async function checkProfiles(profiles) {

  const results = [];

  for (const username of profiles) {
    const result = await checkGitHubProfile(username);
    results.push(result);
  }

  const csvContent = [
    'Username,Exists,MeetsCriteria,IsProspect,Reason',
    ...results.map(r => `${r.username},${r.exists},${r.meetsCriteria},${r.isProspect},"${r.reason}"`),
  ].join('\n');

  const outputPath = path.join(__dirname, 'github_profiles_check.csv');
  fs.writeFileSync(outputPath, csvContent);

  console.log(`Results exported to ${outputPath}`);
}

This function processes multiple GitHub profiles and generates a CSV file with the results. It iterates through the list of usernames and calls checkGitHubProfile for each one in turn. The results are then formatted into a CSV string and written to a file named github_profiles_check.csv.
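The template string in checkProfiles quotes only the Reason column, which works as long as no value contains a comma or a double quote. If the columns ever carry free text, a small escaping helper along these lines may be safer. It is a sketch of the usual CSV quoting convention (wrap the field in double quotes and double any embedded quotes); csvField and toCsvRow are hypothetical names.

```javascript
// Hypothetical helpers: quote a CSV field only when it needs quoting.
function csvField(value) {
  const text = String(value);
  // Fields containing a quote, comma, or line break must be wrapped in quotes,
  // with embedded quotes doubled.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsvRow(values) {
  return values.map(csvField).join(',');
}

console.log(toCsvRow(['octocat', true, false, true, 'Repos: 3, commits: 0']));
```

Inside checkProfiles, each result could then be written as toCsvRow([r.username, r.exists, r.meetsCriteria, r.isProspect, r.reason]).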

Example Usage

const githubProfiles = ['octocat', 'torvalds', 'nonexistentprofile'];
checkProfiles(githubProfiles);

Call checkProfiles with an array of username strings. The three usernames above are sample values; replace them with the usernames collected from your form.
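Form responses often contain a full profile URL rather than a bare username, while checkGitHubProfile expects the username alone. Assuming that is the case for your data, a normalizing step like the one below could run first. extractUsername is a hypothetical helper, and the username pattern reflects my understanding of GitHub's rules (letters, digits, and single hyphens, up to 39 characters), so treat it as an approximation.

```javascript
// Hypothetical helper: turn a submitted value into a bare GitHub username,
// or null when it does not look like one.
function extractUsername(input) {
  const trimmed = String(input).trim();
  // Accept "https://github.com/name", "github.com/name/repo", and similar forms.
  const match = /^(?:https?:\/\/)?(?:www\.)?github\.com\/([^\/?#]+)/i.exec(trimmed);
  // Otherwise treat the value as a username, allowing a leading "@".
  const candidate = (match ? match[1] : trimmed).replace(/^@/, '');
  // Assumed username rule: alphanumerics and single hyphens, no leading or
  // trailing hyphen, at most 39 characters.
  const valid = /^[a-z\d](?:[a-z\d]|-(?=[a-z\d])){0,38}$/i.test(candidate);
  return valid ? candidate : null;
}

const rawResponses = ['https://github.com/octocat', ' @torvalds ', 'not a username!'];
console.log(rawResponses.map(extractUsername).filter(Boolean));
```

Filtering out the null values before calling checkProfiles also saves API requests on entries that could never resolve to a profile.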


Run the Script

node vetProfiles.js

The script will verify each GitHub profile and check for repositories and commits. The results will be exported to a CSV file named github_profiles_check.csv.


Export and Analyze the Data

  • Open the CSV file in Google Sheets or Excel to review the results.

  • Use Conditional Formatting in Google Sheets to highlight cells where MeetsCriteria is TRUE.

  • Use the COUNTIF function to count how many profiles meet the criteria. MeetsCriteria is the third column in the file, so the range is column C:

      =COUNTIF(C:C, "TRUE")
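If you would rather stay in Node.js than open a spreadsheet, the same count can be computed from the CSV text. This is a minimal sketch that assumes the exact column layout written by checkProfiles and that no username contains a comma; countMeetingCriteria is a hypothetical name and the sample rows are illustrative.

```javascript
// Hypothetical helper: count rows whose MeetsCriteria column is "true".
// Assumes the simple layout produced by checkProfiles, where only the last
// column (Reason) is quoted, so splitting on commas is safe for earlier columns.
function countMeetingCriteria(csvText) {
  const [header, ...rows] = csvText.trim().split('\n');
  const column = header.split(',').indexOf('MeetsCriteria');
  return rows.filter((row) => row.split(',')[column] === 'true').length;
}

// Illustrative sample, not real results
const sampleCsv = [
  'Username,Exists,MeetsCriteria,IsProspect,Reason',
  'octocat,true,true,false,"Meets all criteria"',
  'someuser,true,false,true,"3 or fewer repositories"',
].join('\n');
console.log(countMeetingCriteria(sampleCsv));
```

To run it against the real file, read github_profiles_check.csv with fs.readFileSync(outputPath, 'utf8') and pass the resulting string in.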
    

Lessons Learned

This process helped clean up and validate user-submitted data efficiently. The GitHub API allowed me to automate a significant part of the work, reducing manual effort and errors. The experience reinforced the importance of leveraging APIs to streamline workflows.


Conclusion

By combining JavaScript, the GitHub API, and some simple scripting, I was able to build a powerful tool to vet GitHub profiles and export the results for analysis. This tutorial can be adapted to other use cases requiring data validation and processing.

Feel free to modify the script to suit your needs. Happy coding!