Auto downloading from GitHub
from tf.applib.repo import checkoutRepo

checkoutRepo(
    org='annotation',
    repo='tutorials',
    folder='text-fabric/examples/banks/tf',
    version='',
    checkout='',
    source=None,
    dest=None,
    withPaths=True,
    keep=True,
    silent=False,
    label='data',
)
Maintain a local copy of a subfolder (folder) of a GitHub repository (repo) that belongs to an organization (org). The copy may be taken from any point in the commit history of the online repo.
If you call this function, it will check whether the requested data is already on your computer in the expected location. If not, it may check whether the data is online and if so, download it to the expected location.
The result of a call to checkoutRepo() is a tuple:
(commitOffline, releaseOffline, kindLocal, localBase, localDir)
Here is the meaning:
- commitOffline is the commit hash of the data you have offline afterwards
- releaseOffline is the release tag of the data you have offline afterwards
- kindLocal indicates whether an online check has been performed: it is None if there has been an online check. Otherwise it is clone if the data is in your ~/github directory, else it is …
- localBase is the directory under which the data resides: ~/text-fabric-data, or whatever you have passed as source and dest, see below.
- localDir is the relative path from localBase to your data. If your data has versions, localDir points to the directory that contains the versions, not to a specific version.
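As a rough illustration of how a result of this shape might be consumed, the sketch below unpacks a five-part tuple and summarizes it. It does not call checkoutRepo(); the helper name describe_result and all example values are made up for the illustration, and any kindLocal value other than None or 'clone' is reported generically.

```python
def describe_result(result):
    """Summarize a (commitOffline, releaseOffline, kindLocal, localBase, localDir) tuple."""
    commit, release, kind, base, rel_dir = result
    if kind is None:
        origin = "online check performed"
    elif kind == "clone":
        origin = "no online check, data is in the ~/github clone"
    else:
        # Any other value: no online check, data is not in the clone directory.
        origin = f"no online check, kindLocal={kind!r}"
    return f"commit={commit} release={release} ({origin}) at {base}/{rel_dir}"


# Hypothetical values, shaped like a checkoutRepo() result.
example = (
    "abc123",
    "v1.0",
    None,
    "~/text-fabric-data",
    "annotation/tutorials/text-fabric/examples/banks/tf",
)
print(describe_result(example))
```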
Your local copy can be found under your ~/text-fabric-data directory, at the relative path org/repo/folder if there is no version, else org/repo/folder/version.
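The way the location is composed can be sketched as a small path helper. This is only an illustration under stated assumptions: local_data_path is a hypothetical name, not part of Text-Fabric, and the version string "0.2" is an example value. It appends the version only when one is given.

```python
from pathlib import PurePosixPath


def local_data_path(local_base, local_dir, version=""):
    """Join localBase, localDir and an optional version into one path."""
    path = PurePosixPath(local_base) / local_dir
    # Only append the version when the data set is versioned.
    return path / version if version else path


# Hypothetical example values.
print(local_data_path("~/text-fabric-data", "org/repo/folder"))
print(local_data_path("~/text-fabric-data", "org/repo/folder", "0.2"))
```

PurePosixPath is used so that the sketch does not touch the file system; expanding ~ to a real home directory is left to the caller.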
checkout, source and dest
The checkout parameter determines from which point in the history the copy will be taken and where it will be placed. That will be either your ~/github or your ~/text-fabric-data directory. You can override the hard-coded ~/github and ~/text-fabric-data directories by passing source and dest respectively.
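The override behaviour can be pictured with a minimal sketch. It is not the library's own code: resolve_base_dirs is a hypothetical helper, and it assumes that source stands in for ~/github and dest for ~/text-fabric-data, with None meaning "use the default".

```python
import os


def resolve_base_dirs(source=None, dest=None):
    """Return (source, dest) with the default directories filled in and ~ expanded."""
    source = source if source is not None else "~/github"
    dest = dest if dest is not None else "~/text-fabric-data"
    return os.path.expanduser(source), os.path.expanduser(dest)


# Defaults for both, then a custom clone directory (hypothetical path).
print(resolve_base_dirs())
print(resolve_base_dirs(source="/tmp/clones"))
```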
See the repo notebook for an exhaustive demo of all the checkout options.
withPaths=False will lose the directory structure of the files that are downloaded.
keep=False will destroy the destination directory before a download takes place.
silent=True will suppress non-error messages.
label='something' will change the word "data" in log messages to what you choose. We use label='TF-app' when we use this function to check out the code of a TF-app.
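To show how these four switches combine in one call, here is a small wrapper sketch. The wrapper name fetch_quietly is hypothetical, and the download function is passed in as an argument so that the example does not depend on Text-Fabric being installed or on network access.

```python
def fetch_quietly(checkout_repo, org, repo, folder, label="data"):
    """Call a checkoutRepo-like function with quiet, non-destructive settings."""
    return checkout_repo(
        org=org,
        repo=repo,
        folder=folder,
        withPaths=True,  # keep the directory structure of downloaded files
        keep=True,       # do not destroy the destination before downloading
        silent=True,     # suppress non-error messages
        label=label,     # word used in log messages instead of "data"
    )
```

With Text-Fabric installed, the first argument would presumably be checkoutRepo as imported from tf.applib.repo; in tests, any callable that accepts the same keyword arguments will do.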
The checkoutRepo() function uses the GitHub API. GitHub has a rate limiting policy for its API: unauthenticated use is limited to at most 60 calls per hour.
If you use this function in an application of yours that uses it very often, you can increase the limits by making yourself known.
- Read more about rate limiting on GitHub
- Register your app with GitHub
- Obtain your client-id and client-secret and put them in environment variables (GHSECRET for the client-secret) on the system where your app runs.
- If checkoutRepo() finds these variables, it will add the credentials to every GitHub API call it makes, and that will increase the rate limit.
- Never pass your personal credentials on to your clients!
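Reading such credentials from the environment could look roughly like the sketch below. It is an illustration, not the library's implementation: github_credentials is a hypothetical helper, the name of the client-id variable is left as a parameter, and MY_GH_CLIENT_ID is an invented name used only in the example. How the pair is then attached to API calls is not shown.

```python
import os


def github_credentials(id_var, secret_var="GHSECRET", env=None):
    """Return (client_id, client_secret) from the environment, or None if incomplete."""
    env = os.environ if env is None else env
    client_id = env.get(id_var)
    client_secret = env.get(secret_var)
    if client_id and client_secret:
        return client_id, client_secret
    # Missing or empty values: fall back to unauthenticated calls.
    return None


# MY_GH_CLIENT_ID is a hypothetical variable name chosen for this example.
fake_env = {"MY_GH_CLIENT_ID": "id-123", "GHSECRET": "s3cret"}
print(github_credentials("MY_GH_CLIENT_ID", env=fake_env))
```

Keeping the lookup in one place makes it easier to ensure the values stay on the server side and never end up in anything shipped to clients.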