Table of contents

  1. Creating a GitHub Repository, Cloning and Folder Structure
  2. Creating a Virtual Environment
  3. Creating a calculator.py program
  4. Adding and Pushing Your Project Code to GitHub
  5. Creating tests using Pytest and Unittests
  6. Implementing GitHub Actions

GitHub link

Step 1: Creating a GitHub Repository, Cloning and Folder Structure

1.1 Setting up local working directory

  • It is recommended to set up a common local working directory or folder for all the labs.
  1. Create a folder called mlops_labs and open it with Visual Studio Code (VSC).
  2. Create a terminal inside VSC by clicking the Terminal menu -> New Terminal.

Now, let’s create a GitHub repository for this lab and establish a structured folder layout. This layout keeps your project’s code, data, and tests organized.

We will first create a GitHub repository in our GitHub account and clone it into our local working directory.

1.2 Creating a GitHub Repository

  1. Open a web browser and log in to your GitHub account.
  2. In the upper right corner, click the “+” button and select “New repository.”
  3. Choose a name for your repository (e.g., “github_lab1”).
  4. Choose the visibility of your repository: either public (visible to everyone) or private (accessible only to selected collaborators).
  5. Check the “Initialize this repository with a README” box. This will create an initial README file that you can edit to provide project documentation.
  6. Click the “Create repository” button.

1.3 Cloning the Repository

  1. In your GitHub repository, click the green Code button, then copy the URL under the HTTPS tab.
  2. Run the following command in the VSC terminal to clone your GitHub repository into the current directory:
     git clone <repository_url>
    
  3. Replace <repository_url> with the URL of the GitHub repository you copied. After running the command, the repository will be cloned, and you'll have a local copy of your GitHub project in your chosen directory.

Project structure:

    mlops_labs/
    └── github_lab1/
        └── README.md

1.4 Establishing Folder Structure

Once you have cloned your repository, you can establish a structured folder layout within it. This layout helps organize your project into key directories for code, data, and tests.

  1. Move into the repository folder, as you will be creating the rest of the files inside this repository.
     cd github_lab1
    
  2. Create the following subfolders within your local repository:
    • data: This folder is used for storing project data files or datasets.
    • src: This folder is where you’ll store your project’s source code files.
    • test: This folder is dedicated to unit tests and test scripts for your code.
  3. Create a file named .gitignore.
    • This is useful for excluding unnecessary files from version control, such as virtual environments, __pycache__ folders, etc.
  4. Create a requirements.txt file.
    • It is important to add all project dependencies to this file (i.e., all the required Python packages or libraries).
    • Add pytest to it, as it is required later in this lab. For now, the file only needs to contain the single line pytest.

Project structure:

    mlops_labs/
    └── github_lab1/
        ├── README.md
        ├── data/
        ├── src/
        ├── test/
        ├── .gitignore
        └── requirements.txt
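
The same layout can also be created with a short script instead of by hand. The following is a minimal sketch, not part of the lab materials: the function name create_lab_layout is hypothetical, and it assumes the starter requirements.txt should contain only pytest.

```python
from pathlib import Path


def create_lab_layout(repo_root):
    """Create the lab's subfolders and starter files under repo_root."""
    repo_root = Path(repo_root)
    created = []

    # Subfolders named in the steps above.
    for folder in ("data", "src", "test"):
        path = repo_root / folder
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    # Starter files; a file that already exists is left untouched.
    starter_files = {
        ".gitignore": "",
        "requirements.txt": "pytest\n",  # assumption: pytest is the only dependency so far
    }
    for name, content in starter_files.items():
        path = repo_root / name
        if not path.exists():
            path.write_text(content)
        created.append(path)

    return created


# Example call, run from inside github_lab1:
#     create_lab_layout(Path.cwd())
```

Because existing files are skipped and mkdir uses exist_ok=True, running the function a second time does not overwrite anything.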

Step 2: Creating a Virtual Environment

In software development, it’s crucial to manage project dependencies and isolate your project’s environment from the global Python environment. This isolation ensures that your project remains consistent, stable, and free from conflicts with other Python packages or projects. To achieve this, we create a virtual environment dedicated to our project.

  1. To create a virtual environment, choose a name for it (e.g., “github_lab1_env”) and run the following command:
     python -m venv github_lab1_env
    
  2. Activate the virtual environment by running the command for your operating system (i.e., Windows, Linux or macOS):
     # windows users
     github_lab1_env\Scripts\activate 
    
     # linux and mac users
     source github_lab1_env/bin/activate
    
  3. Add the virtual environment folder name to your .gitignore file so that it is not tracked by Git.

After activation, you will see the virtual environment’s name in your command prompt or terminal, indicating that you are working within the virtual environment.
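
If the prompt does not make it obvious, the interpreter itself can report whether it is running inside a virtual environment. This is a small sketch that relies on the standard library's sys.prefix and sys.base_prefix; the helper name in_virtual_environment is hypothetical, and the check covers environments created with python -m venv as in this step.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter prefix:", sys.prefix)
```

With github_lab1_env activated, the printed prefix should be the path of the github_lab1_env folder.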

After Step 2, the project structure should look similar to this:

    mlops_labs/
    └── github_lab1/
        ├── README.md
        ├── data/
        ├── src/
        ├── test/
        ├── .gitignore
        ├── requirements.txt
        └── github_lab1_env/

Step 3: Creating calculator.py in src Folder

In this step, we create a Python script named calculator.py within the src folder of your project. This script contains a set of mathematical functions designed to perform basic arithmetic operations.

  • fun1(x, y) adds two input numbers, x and y.
  • fun2(x, y) subtracts y from x.
  • fun3(x, y) multiplies x and y.
  • fun4(x, y) combines the results of the above functions and returns their sum.
  1. Create empty __init__.py files in the data, src and test folders. These make each folder a Python package, which helps when importing modules from one folder into files in another.
  2. Copy the calculator.py file located under the src folder in this link into your local src folder.
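
For orientation, here is a minimal sketch of what a file matching the four descriptions above could look like. It is not the linked calculator.py; in particular, it assumes fun4 calls the other three functions with the same x and y.

```python
def fun1(x, y):
    """Add x and y."""
    return x + y


def fun2(x, y):
    """Subtract y from x."""
    return x - y


def fun3(x, y):
    """Multiply x and y."""
    return x * y


def fun4(x, y):
    """Return the sum of the results of fun1, fun2 and fun3.

    Assumption: all three functions receive the same x and y.
    """
    return fun1(x, y) + fun2(x, y) + fun3(x, y)


print(fun4(2, 3))  # (2 + 3) + (2 - 3) + (2 * 3) = 10
```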

Note: Whenever you want to push files to your GitHub repository, follow Step 4: Adding and Pushing Your Project Code to GitHub.

Project structure:

    mlops_labs/
    └── github_lab1/
        ├── README.md
        ├── data/
        │   └── __init__.py
        ├── src/
        │   ├── __init__.py
        │   └── calculator.py
        ├── test/
        │   └── __init__.py
        ├── .gitignore
        ├── requirements.txt
        └── github_lab1_env/

Step 4: Adding and Pushing Your Project Code to GitHub

Now that we have our virtual environment set up, the GitHub repository created, and the folder structure organized, let’s add our project’s code and push it to GitHub.

Adding Your Project Code

  • Once your project files are ready, it’s time to add them to Git’s staging area. In your project directory, run the following command:
      git add .
    

    This command stages all the changes and new files in your project directory for the next commit.

Committing Your Changes

  • After staging your changes, commit them with a meaningful commit message that describes the changes you made. Replace <your_commit_message> with a descriptive message:
      git commit -m "<your_commit_message>"
    

Pushing to GitHub

  • To push your committed changes to your GitHub repository, use the following command:
      git push origin main
    

Step 5: Creating tests using Pytest and Unittests

In this step, we’ll set up unit tests for the functions in our calculator.py script using two popular testing frameworks: pytest and unittest. Unit testing checks that individual components of your code work as expected, helping you catch and fix bugs early in the development process.

Using Pytest

  • Make sure your virtual environment is activated.
  • Since we have already added pytest to the requirements.txt file, we can install it by running:
      pip install -r requirements.txt
    
  • In general, if we want to install only pytest, we can run:
      pip install pytest
    

Note:

  1. It is good practice to add all the dependencies to requirements.txt. This way, anyone using the project can install all of them at once with one command.
  2. It is quite common not to know all project dependencies at the start of a project, but as you install each of them, it is good practice to keep adding them to requirements.txt.
  3. If you forgot to add the dependencies but would like to add them later, you can run the following command and copy-paste the output into requirements.txt.
     pip freeze
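
To see roughly what that output contains, the installed packages can also be listed from Python with the standard library's importlib.metadata. This is only an approximation of pip freeze (it does not replicate pip's special handling of editable installs or of packages pip leaves out by default), and the function name installed_requirements is hypothetical.

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' strings for the active environment."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata lacks a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in installed_requirements():
    print(line)
```

Run inside the activated virtual environment, this lists only the packages installed there, which is what belongs in requirements.txt.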
    

Writing Pytest Tests

  • Pytest makes it easy to write tests for your Python code. Tests are written as regular Python functions, and test file names typically start with test_ or end with _test.py.
  • To run your Pytest tests, you can use the pytest command followed by the name of the test file or directory containing your tests:
      pytest test_<name>.py
    
  • Pytest automatically discovers tests based on naming conventions. By default it collects files named test_*.py or *_test.py and, inside them, functions whose names start with test (as well as test-prefixed methods of classes whose names start with Test). It can discover tests in subdirectories as well, which makes it easy to organize your tests.
  • Pytest supports parameterized tests, which allow you to run the same test function with multiple sets of inputs and expected outputs. This is particularly useful for testing functions with different input scenarios. Please refer to the commented-out code in the test_pytest.py file for an example.
  • Copy this test_pytest.py file into your test folder.
  • This file contains a series of test functions, each aimed at verifying the behavior of specific functions within calculator.py.
  • We’ve prepared four test functions (test_fun1, test_fun2, test_fun3, and test_fun4) to test the functions within calculator.py. Each test function uses the assert statement to validate the expected outcomes.
  • By running these pytest tests, you can verify that your calculator functions are working correctly.
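
As an illustration of the style described above, the sketch below shows four pytest-style test functions. It is not the provided test_pytest.py: the calculator functions are re-declared as stand-ins so the example runs by itself (in the lab they come from src/calculator.py), and the input values are arbitrary.

```python
# Stand-ins for the functions in src/calculator.py (assumed behaviour).
def fun1(x, y):
    return x + y


def fun2(x, y):
    return x - y


def fun3(x, y):
    return x * y


def fun4(x, y):
    return fun1(x, y) + fun2(x, y) + fun3(x, y)


# Pytest collects these because each function name starts with "test".
def test_fun1():
    assert fun1(2, 3) == 5
    assert fun1(-1, 1) == 0


def test_fun2():
    assert fun2(5, 3) == 2
    assert fun2(3, 5) == -2


def test_fun3():
    assert fun3(2, 3) == 6
    assert fun3(0, 7) == 0


def test_fun4():
    # (2 + 3) + (2 - 3) + (2 * 3) = 10
    assert fun4(2, 3) == 10
```

Saved in a file whose name starts with test_ (for example, a hypothetical test_example.py), these functions would be collected by pytest without any extra registration; a failing assert is reported as a failed test.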

To run the tests,

  1. Move into the test folder using the terminal
     cd test
    
  2. Run the test file
     pytest test_pytest.py
    
  3. The output should be as follows
     collected 4 items
    
     test_pytest.py ....                                                 [100%]
    
     ============================== 4 passed in 0.01s =========================
    
  4. Running pytest creates __pycache__ folders, which do not need to be tracked. Add __pycache__/ to your .gitignore file.

Writing Tests with UnitTest

  • Unittest allows you to write tests as classes that inherit from the unittest.TestCase class.
  • Test methods are identified by their names, which must start with “test” to be recognized as test cases (the examples in this lab use the “test_” prefix).
  • To run Unittest tests, you typically execute your test script, which should include a call to unittest.main() at the end. Here’s how you can run the tests:
      python test_<name>.py
    
  • Like Pytest, Unittest finds test methods by naming convention, so a method whose name lacks the test prefix is silently skipped.
  • Unittest provides a variety of assertion methods, such as assertEqual, assertTrue, assertFalse, and others, to check conditions in your tests. You can choose the assertion method that best suits your testing needs.
  • Copy this test_unittest.py file into your local test folder.
  • This file contains a series of test methods, each aimed at verifying the behavior of a specific function within calculator.py.
  • We’ve prepared four test methods (test_fun1, test_fun2, test_fun3, and test_fun4) to test the functions within calculator.py. Each test method uses self.assertEqual to validate the expected outcomes.
  • By running these unittest tests, you can verify that your calculator functions are working correctly.
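
The class-based style can be sketched as follows. This is not the provided test_unittest.py: the calculator functions are stand-ins declared inline (in the lab they come from src/calculator.py), and run_tests is a hypothetical helper used here in place of the usual unittest.main() call so that the result object can be inspected.

```python
import unittest


# Stand-ins for the functions in src/calculator.py (assumed behaviour).
def fun1(x, y):
    return x + y


def fun2(x, y):
    return x - y


def fun3(x, y):
    return x * y


def fun4(x, y):
    return fun1(x, y) + fun2(x, y) + fun3(x, y)


class TestCalculator(unittest.TestCase):
    """Each method whose name starts with 'test' is run as one test case."""

    def test_fun1(self):
        self.assertEqual(fun1(2, 3), 5)

    def test_fun2(self):
        self.assertEqual(fun2(5, 3), 2)

    def test_fun3(self):
        self.assertEqual(fun3(2, 3), 6)

    def test_fun4(self):
        self.assertEqual(fun4(2, 3), 10)


def run_tests():
    """Load and run TestCalculator, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCalculator)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

A script meant to be run as python test_<name>.py would more commonly end with unittest.main() inside the same guard, as described above.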

To run the tests,

  1. Make sure you are inside the test folder. If not, move into it using
     cd test
    
  2. Run the test file
     python test_unittest.py
    
  3. The output should be as follows
     ....
     ----------------------------------------------------------------------
     Ran 4 tests in 0.000s
    
     OK
    

Project Structure:

    mlops_labs/
    └── github_lab1/
        ├── README.md
        ├── data/
        │   └── __init__.py
        ├── src/
        │   ├── __init__.py
        │   └── calculator.py
        ├── test/
        │   ├── __init__.py
        │   ├── test_pytest.py
        │   └── test_unittest.py
        ├── .gitignore
        ├── requirements.txt
        └── github_lab1_env/

Step 6: Implementing GitHub Actions

GitHub Actions is a powerful automation and CI/CD (Continuous Integration/Continuous Deployment) platform provided by GitHub. It enables you to automate various workflows and tasks directly within your GitHub repository. GitHub Actions can be used for a wide range of purposes, such as running tests, deploying applications, and automating release processes.

GitHub Actions work based on events, actions, and triggers:

  • Events: These are specific activities that occur within your GitHub repository, such as code pushes, pull requests, or issue comments. GitHub Actions can respond to these events.
  • Actions: Actions are reusable units of work, such as actions/checkout, that a step in a workflow file can call. Steps can also run shell commands directly, so a workflow can do anything from building your code to running tests or deploying your application.
  • Triggers: Triggers are conditions that cause a workflow to run. They can be based on events (e.g., a new pull request) or scheduled to run at specific times.

The Purpose of GitHub Actions:

  • Automation: It automates repetitive tasks, reducing manual effort and ensuring consistency in your development process.
  • Continuous Integration (CI): It allows you to set up CI pipelines to automatically build, test, and validate your code changes whenever new code is pushed to the repository.
  • Continuous Deployment (CD): It enables automatic deployment of your application when changes are merged into a specific branch, ensuring a smooth and reliable release process.

Using Pytest and Unittest with GitHub Actions:

  • Integrating Pytest and Unittest with GitHub Actions can significantly improve the quality and reliability of your codebase. Here’s how:
  • Pytest with GitHub Actions: You can create a GitHub Actions workflow (e.g., pytest_action.yml) that specifies the steps for running your Pytest tests. When events like code pushes or pull requests occur, GitHub Actions will automatically trigger the workflow, running your Pytest tests and reporting the results back to you. This helps you catch bugs and ensure that your code meets quality standards early in the development process.
  • Unittest with GitHub Actions: Similarly, you can create a separate GitHub Actions workflow (e.g., unittest_action.yml) to run your Unittest tests. This ensures that both your Pytest and Unittest suites are executed automatically whenever code changes are made or pull requests are submitted. It provides a robust validation mechanism for your codebase.
  • When collaborating in teams, automated testing can ensure that all test cases pass before a pull request is merged into the main branch. This requires the workflow to also run on pull requests and to be configured as a required status check for that branch.

Creating GitHub Actions workflow Files:

To implement Pytest and Unittest with GitHub Actions, you’ll create two workflow files under the .github/workflows directory in your repository:

  1. pytest_action.yml
  2. unittest_action.yml
  • These workflow files define the specific steps and triggers for running your tests.
  • Create a folder called .github and another folder called workflows inside it.
  • Copy the pytest_action.yml and unittest_action.yml files into the .github/workflows folder.

Project Structure:

    mlops_labs/
    └── github_lab1/
        ├── README.md
        ├── data/
        │   └── __init__.py
        ├── src/
        │   ├── __init__.py
        │   └── calculator.py
        ├── test/
        │   ├── __init__.py
        │   ├── test_pytest.py
        │   └── test_unittest.py
        ├── .gitignore
        ├── requirements.txt
        ├── github_lab1_env/
        └── .github/
            └── workflows/
                ├── pytest_action.yml
                └── unittest_action.yml

1. pytest_action.yml

  • name (Workflow Name): The workflow is named “Testing with Pytest.”
  • on (Event Trigger): It specifies the event that triggers the workflow. In this case, it triggers when code is pushed to the main branch.
  • jobs: The workflow contains a single job named build which runs on the ubuntu-latest virtual machine environment.
  • Steps:
    1. Checkout code: This step checks out the code from the repository using actions/checkout@v2.
    2. Set up Python: It sets up the Python environment using actions/setup-python@v2 and specifies Python version 3.8.
    3. Install Dependencies: This step installs the project dependencies by running pip install -r requirements.txt.
    4. Run Tests and Generate XML Report:
      • The core testing step runs Pytest with the --junitxml flag to generate an XML report named pytest-report.xml.
      • The continue-on-error: false setting ensures that the workflow will be marked as failed if any test fails.
    5. Upload Test Results:
      • In this step, the generated XML report is uploaded as an artifact using actions/upload-artifact@v2.
      • The name of the artifact is test-results, and the path to the report is specified as pytest-report.xml.
    6. Notify on Success and Failure: These two steps use conditional logic to notify based on the outcome of the tests.
      • if: success() checks if the tests passed successfully and outputs the “Tests passed successfully” message.
      • if: failure() checks if the tests failed and outputs the “Tests failed” message.
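
The XML report produced in step 4 can also be inspected locally after running pytest with --junitxml. The sketch below reads the counters from the testsuite element with the standard library's xml.etree.ElementTree. SAMPLE_REPORT is a hand-written, trimmed example in the general shape of such a report, not output captured from this lab, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a JUnit-style pytest report.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="pytest" errors="0" failures="0" skipped="0" tests="4">
    <testcase classname="test.test_pytest" name="test_fun1"/>
    <testcase classname="test.test_pytest" name="test_fun2"/>
    <testcase classname="test.test_pytest" name="test_fun3"/>
    <testcase classname="test.test_pytest" name="test_fun4"/>
  </testsuite>
</testsuites>
"""


def summarize_report(xml_text):
    """Add up the test counters of every <testsuite> element in the report."""
    root = ET.fromstring(xml_text.strip())
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() also yields the root itself, so this works whether the root
    # element is <testsuites> or a single <testsuite>.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


def outcome_message(totals):
    """Mirror the two notification messages described for the workflow."""
    if totals["failures"] == 0 and totals["errors"] == 0:
        return "Tests passed successfully"
    return "Tests failed"


summary = summarize_report(SAMPLE_REPORT)
print(summary)
print(outcome_message(summary))
```

To read a real report, the file contents can be passed in instead of the sample, for example the text of pytest-report.xml after a local pytest run with the same flag.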

2. unittest_action.yml

  • name (Workflow Name): This GitHub Actions workflow is named “Python Unittests.”
  • on (Event Trigger): The workflow is triggered by the push event, specifically when changes are pushed to the main branch.
  • jobs: The workflow defines a single job named build that runs on the ubuntu-latest virtual machine environment.
  • Steps:
    1. Checkout code: This step uses the actions/checkout@v2 action to check out the code from the repository. It ensures that the workflow has access to the latest code.
    2. Set up Python: The “Set up Python” step uses the actions/setup-python@v2 action to configure the Python environment. It specifies that Python version 3.8 should be used.
    3. Install Dependencies: This step runs the command pip install -r requirements.txt to install the project’s Python dependencies. It assumes that the project’s dependencies are listed in the requirements.txt file.
    4. Run unittests:
      • In this step, the unittest tests are executed using the command python -m unittest test.test_unittest.
      • It runs the unittest test suite defined in the test.test_unittest module.
    5. Notify on success: This step uses conditional logic with if: success() to check if all the unittest tests passed successfully. If they did, it outputs the message “Unit tests passed successfully”.
    6. Notify on failure: Similarly, this step uses conditional logic with if: failure() to check if any of the unittest tests failed. If any test failed, it outputs the message “Unit tests failed”.

(Q.) Why don’t we add the virtual environment to our GitHub repository or version control?

  1. Virtual environments are platform-specific (i.e., Windows, Linux, macOS, etc.).
  2. Virtual environments can be quite large, and including them in your repository would make it unnecessarily large, leading to slower clone and download times for users.
  3. Dependencies (e.g., scikit-learn, pandas, numpy, etc.) should be explicitly defined in a requirements.txt file, allowing users to recreate the virtual environment on their own platform.