
Using GitHub to Manage Experiment Versioning

4/14/2020

I've been playing with GitHub recently, and I thought it could be useful for handling experiment manipulations. I made a master branch for an experiment, then created a separate branch for each experimental manipulation. So far it seems pretty effective. It's a web experiment, so when I want to collect data in a particular manipulation, I run 'git checkout manipulation' and all of the files are updated accordingly.

Every branch sends its data to the same database. Each branch has its own branch ID, which is stored alongside the data, so all manipulations share the same tables. I also edit the readme on each branch to record what that particular manipulation is.
I'm kinda proud of this as a functional way to manage experiment variations.

I've been keeping notes on best practices in the repo's readme on GitHub. Unfortunately, I can't make the repo public, because I accidentally included login info in a PHP file (I didn't use the .gitignore file correctly, smh). For now, I'll just post my notes here. Maybe at some point I'll copy the repo without the sensitive info as a demonstration.

## General Operations

The master branch serves as the main feature hub. The other branches hold the experimental variations. Experimental variations include: distributions

Each branch must include 1) a branch ID, which is recorded in the branch readme and in index.html so that it is saved with the user data, and 2) a thorough description of the experimental condition variations in that branch.
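Because the branch ID lives in two places, the two copies can drift apart after an edit. A minimal Python sketch of a consistency check, assuming the readme contains a line like 'Branch ID: b01' (the template format) and that index.html sets the ID in a JavaScript variable named 'branchID'. That variable name and the helper functions are hypothetical, not taken from the actual experiment code.

```python
import re
from typing import Optional

# Matches a readme line such as "Branch ID: b01" (layout from the parameter template).
README_ID = re.compile(r"^Branch ID:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

# Hypothetical convention: index.html sets the ID like  var branchID = "b01";
HTML_ID = re.compile(r"""branchID\s*=\s*["']([^"']+)["']""")


def find_id(pattern: "re.Pattern", text: str) -> Optional[str]:
    """Return the first captured branch ID in text, or None if it is missing."""
    match = pattern.search(text)
    return match.group(1) if match else None


def check_branch_id(readme_text: str, html_text: str) -> str:
    """Return the shared branch ID, or raise ValueError if the two files disagree."""
    readme_id = find_id(README_ID, readme_text)
    html_id = find_id(HTML_ID, html_text)
    if readme_id is None or html_id is None:
        raise ValueError(f"branch ID missing (readme={readme_id}, html={html_id})")
    if readme_id != html_id:
        raise ValueError(f"branch ID mismatch: readme={readme_id}, html={html_id}")
    return readme_id
```

Running it right after a checkout, with the contents of the readme and index.html passed in as strings, catches a branch whose data would otherwise be saved under the wrong ID.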

The readme for each branch lists parameters specific to that branch only. Changes to the shared, central parts of the readme are made only on the master branch.

## Experiment branch parameters

Branch ID:

Number of Cities:

Distribution:

Training Repetitions:

Number of Blocks:

Sentence recognition questions per block:

Sentences per block:

Statement Sampling:

Other modifications:
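One way to keep this template identical across branches is to treat the parameters as data and generate the readme section from them. A minimal Python sketch, assuming a dataclass whose field names (hypothetical) map one-to-one onto the template labels above; leaving out any parameter raises a TypeError, so no branch can silently skip one.

```python
from dataclasses import dataclass, fields

# Template labels in readme order; keys match the (hypothetical) field names below.
LABELS = {
    "branch_id": "Branch ID",
    "number_of_cities": "Number of Cities",
    "distribution": "Distribution",
    "training_repetitions": "Training Repetitions",
    "number_of_blocks": "Number of Blocks",
    "recognition_questions_per_block": "Sentence recognition questions per block",
    "sentences_per_block": "Sentences per block",
    "statement_sampling": "Statement Sampling",
    "other_modifications": "Other modifications",
}


@dataclass
class BranchParameters:
    """Hypothetical container for one branch's parameters."""
    branch_id: str
    number_of_cities: int
    distribution: str
    training_repetitions: int
    number_of_blocks: int
    recognition_questions_per_block: int
    sentences_per_block: int
    statement_sampling: str
    other_modifications: str = "None"

    def to_readme(self) -> str:
        """Render a readme section in the same 'Label: value' layout as the template."""
        lines = ["## Experiment branch parameters", ""]
        for f in fields(self):
            lines += [f"{LABELS[f.name]}: {getattr(self, f.name)}", ""]
        return "\n".join(lines)
```

For example, 'BranchParameters("b01", 4, "uniform", 2, 3, 5, 10, "random").to_readme()' produces the filled-in section; the values here are placeholders, not the experiment's real settings.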

## General notes on using GitHub for experiment development

### Version management of experiments with GitHub
- Local versions on my computer
- Hosted on a remote server
- Need to manage multiple variations of the experiment

### Solution:
Use git branching, with one branch for each version of the experiment.

### Must:
- Include documentation in each branch indicating its version
- Have a root (master) branch where core functionality is maintained
- Treat versions as variations of master

### Cannot:
- Merge a branch into master; the point is to keep each manipulation of the experiment well defined as its own branch

### Steps:
- Created the root experiment git repo (locally, with GitHub Desktop)
- When creating a git repo that requires server access, use .gitignore to prevent copying sensitive info (passwords) to GitHub
- Kept identical file structures locally and on the server (this shared structure exists outside the individual repo)
- Updated the support repos (I had a JS plugin repo sitting around; now it's an active part of the system)
- Copied the repos to the server
  - 'git clone https...'
- Had to change the git user so that commits were credited to me instead of Rob
  - 'git config --list' shows the configuration for the git repo
  - 'git config user.name "my_username"' updates the git username associated with changes to the repo
  - 'git config user.email "[email protected]"' updates the git email address; this is necessary to link commits to your account
- Temporarily saving GitHub credentials in the cache
  - 'git config credential.helper cache': on the next pull/push, your username and password are saved in the cache for the default 900 seconds
  - 'git credential-cache exit' manually clears the cache; probably good practice before logging out
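To confirm that commits from the server will be credited to the right person, you can check the identity keys before committing. A minimal Python sketch, assuming you only need to parse the 'key=value' text that 'git config --list' prints. The helper names are hypothetical, and 'read_git_config' assumes git is installed and on the PATH.

```python
import subprocess
from typing import Dict, List


def parse_git_config(output: str) -> Dict[str, str]:
    """Turn the 'key=value' lines printed by 'git config --list' into a dict.

    Later lines overwrite earlier ones, mirroring git's last-one-wins rule,
    so a repo-level user.name overrides a global one listed before it.
    """
    config: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line that is not a key=value pair
            config[key] = value
    return config


def missing_identity(config: Dict[str, str]) -> List[str]:
    """Return the identity keys that are unset or empty."""
    return [key for key in ("user.name", "user.email") if not config.get(key)]


def read_git_config() -> Dict[str, str]:
    """Run 'git config --list' in the current directory (assumes git is on the PATH)."""
    result = subprocess.run(
        ["git", "config", "--list"], capture_output=True, text=True, check=True
    )
    return parse_git_config(result.stdout)
```

Inside the server's clone, 'missing_identity(read_git_config())' should return an empty list once both 'git config user.name' and 'git config user.email' have been run.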

### Useful Git commands
- Create a branch
  - I use GitHub Desktop to manage this step
- Switch between branches
  - 'git checkout [branch]'
- View all branches
  - 'git branch'
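Before collecting data, it helps to confirm which manipulation is actually checked out. 'git branch' marks the current branch with a leading asterisk, so a minimal Python sketch can read it from that output. The helper names are hypothetical, and 'read_git_branches' assumes git is installed and is run inside the repo.

```python
import subprocess
from typing import List, Optional, Tuple


def parse_git_branch(output: str) -> Tuple[List[str], Optional[str]]:
    """Split 'git branch' output into (all branch names, current branch).

    git prefixes the checked-out branch with '* ' and other branches with
    two spaces. A detached HEAD appears as '* (HEAD detached at ...)', which
    is not a branch, so current is reported as None in that case.
    """
    branches: List[str] = []
    current: Optional[str] = None
    for line in output.splitlines():
        name = line[2:].strip()
        if not name or name.startswith("("):  # blank line or detached-HEAD marker
            continue
        branches.append(name)
        if line.startswith("*"):
            current = name
    return branches, current


def read_git_branches() -> Tuple[List[str], Optional[str]]:
    """Run 'git branch' in the current directory (assumes git is on the PATH)."""
    result = subprocess.run(["git", "branch"], capture_output=True, text=True, check=True)
    return parse_git_branch(result.stdout)
```

Comparing the returned current branch against the manipulation you intend to run is a cheap guard against collecting a session on the wrong branch.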

### Best practices for using git to manage experimental variations

- Use the '.gitignore' file
  - Do not upload files with usernames and passwords to GitHub
  - List any files or folders containing server access information in the .gitignore file so they are never accidentally exposed
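The experiment itself uses PHP, so the following Python version only illustrates the pattern: keep login info in a separate local file that is listed in .gitignore, and have the code read it at runtime. The file name 'db_credentials.json' (which would go on its own line in .gitignore) and the key names are assumptions.

```python
import json
import os
from typing import Dict

# Hypothetical file name; list it in .gitignore so it never reaches GitHub.
CREDENTIALS_FILE = "db_credentials.json"
REQUIRED_KEYS = ("host", "user", "password", "database")


def load_db_credentials(path: str = CREDENTIALS_FILE) -> Dict[str, str]:
    """Read database login info from a local, untracked JSON file."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; create it by hand on each machine (it is git-ignored)"
        )
    with open(path, encoding="utf-8") as fh:
        creds = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return creds
```

Because the file is created by hand on the laptop and on the server, the tracked code can be pushed to a public repo without carrying any passwords with it.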

- Use the master branch for the main code. For experimental variations, create a new branch.
  - Each branch must work independently, and any data logged must be tagged with its variation.
- In this experiment, I used multiple tables to track data. I shouldn't have. Instead, it would probably be better to track all data from an experiment in the same SQL table.
  - This includes every variation of the experiment. Not every entry will fill every column, but it will lead to more concise organization in the long run.
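A minimal sketch of the single-table layout, using Python's built-in sqlite3 as a stand-in for the experiment's actual SQL server. The table and column names are hypothetical. Every branch writes to the same table, the branch ID tags each row, and columns that a given trial or variation does not use are left NULL.

```python
import sqlite3
from typing import Dict, Optional

# One table for every variation; branch_id identifies the manipulation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS responses (
    id                  INTEGER PRIMARY KEY,
    branch_id           TEXT NOT NULL,
    participant_id      TEXT NOT NULL,
    block               INTEGER,
    response            TEXT,
    recognition_correct INTEGER  -- NULL when the trial had no recognition question
)
"""


def log_trial(conn: sqlite3.Connection, branch_id: str, participant_id: str,
              block: int, response: str,
              recognition_correct: Optional[int] = None) -> None:
    """Insert one trial; any column this trial does not use stays NULL."""
    conn.execute(
        "INSERT INTO responses (branch_id, participant_id, block, response, recognition_correct) "
        "VALUES (?, ?, ?, ?, ?)",
        (branch_id, participant_id, block, response, recognition_correct),
    )


def trials_per_branch(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count logged trials for each manipulation."""
    rows = conn.execute(
        "SELECT branch_id, COUNT(*) FROM responses GROUP BY branch_id ORDER BY branch_id"
    )
    return dict(rows.fetchall())
```

With this layout, comparing manipulations becomes a single GROUP BY branch_id query rather than a query spread across several tables.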
