Details
-
Type: New Feature
-
Status: Closed
-
Priority: Major
-
Resolution: Done
-
Affects Version/s: None
-
Fix Version/s: None
-
Labels: None
-
Story Points: 1
-
Epic Link:
-
Sprint:Fall 7 Dec 14 - Dec 23, Winter 1 Dec 28 - Jan 8, Winter 2 Jan 11 - Jan 22, Winter 3 Jan 25 - Feb 5, Winter 4 Feb 8 - Feb 19, Winter 5 Feb 22 - Mar 5, Winter 6 Mar 8 - Mar 19, Spring 1 2021 Mar 22 - Apr 2, Spring 2 2021 Apr 5 - Apr 16
Description
The CGI script "geneIdLookup.py" in the cgi-bin directory of the bioviz code base connects to a small DynamoDB table while the bar.html code is running.
Add provisioning and configuration of the DynamoDB resource to the bioviz playbooks.
Attachments
Activity
A table named "Araport11" needs to be created and populated. FYI: It will likely never change and it is very small.
Code used to create the table: https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/
Data used to populate the table: http://igbquickload.org/quickload/A_thaliana_Jun_2009/Araport11.bed.gz
Data file is also version-controlled in a subversion repo: https://svn.bioviz.org/viewvc/genomes/quickload/
To test whether the dynamoDB access is working, just hit this URL, substituting your bioviz hostname:
[~aloraine] This is the command to create a json of all the content of dynamo db
aws dynamodb scan --table-name TABLE_NAME --region us-east-1 > export.json
The above command has to be run from an EC2 server with a role that permits reading the contents of the DynamoDB table.
Please share that file with me.
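For reference, the same export can be sketched in boto3, which is what the playbook scripts use elsewhere. This is a hedged sketch, not the project's actual code: scan results are paginated, so the helper follows LastEvaluatedKey; the table name and export.json file name mirror the CLI example above.

```python
# Sketch equivalent of: aws dynamodb scan --table-name Araport11 > export.json
# Assumes an EC2 role (or credentials) with read access, as noted above.
import json


def scan_all(table_name="Araport11", region="us-east-1"):
    """Return every item in the table, following scan pagination."""
    import boto3  # deferred import so the JSON helper below has no AWS dependency
    client = boto3.client("dynamodb", region_name=region)
    items, kwargs = [], {"TableName": table_name}
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def export_to_json(items, path="export.json"):
    """Write scanned items to a JSON file, matching the CLI redirect."""
    with open(path, "w") as handle:
        json.dump(items, handle, indent=2)
```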
Data are available (temporarily) at the following URL:
The AWS CLI doesn't allow more than 25 items to be loaded into the table per batch-write request.
I have started using the .py scripts mentioned earlier and am trying to integrate them with the playbooks.
Thanks for the update!
Process
1. Grant the EC2 instance role DynamoDB admin privileges to provision the infrastructure
2. Create the DynamoDB table if it is not present
3. Copy the data and Python script to the EC2 instance
4. Load the data using the Python script if the data is not already present
5. Remove the admin privileges from the EC2 role and grant read-only access instead
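Step 2 ("create the table if not present") can be sketched in boto3 as below. This is an illustrative sketch only: the key schema and capacity units are assumptions based on the fields discussed in this ticket (Chromosome, Start, End, GeneId), not the project's actual createTable.py.

```python
# Hedged sketch: create the Araport11 table only if it does not already exist.
# Key schema is an assumption (hash key on GeneId).

def table_schema(table_name="Araport11"):
    """Build create_table arguments; the schema here is an assumption."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "GeneId", "KeyType": "HASH"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": "GeneId", "AttributeType": "S"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    }


def ensure_table(table_name="Araport11", region="us-east-1"):
    """Create the table if absent, then wait until it is ready."""
    import boto3  # deferred import keeps table_schema() dependency-free
    client = boto3.client("dynamodb", region_name=region)
    if table_name not in client.list_tables()["TableNames"]:
        client.create_table(**table_schema(table_name))
        client.get_waiter("table_exists").wait(TableName=table_name)
```

Making the creation idempotent (check, then create) is what lets the playbook re-run safely, per step 2.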
Please review: https://bitbucket.org/chesterdias/appstore-playbooks/branch/IGBF-2686#diff
Chester Dias - Please confirm that the playbook "setup.yml" can run from start to finish in a non-lorainelab account.
Please also check that the site works – the main thing to verify is that the bar.js and bar.html functionality is working properly, since these are the parts of the site that use the DynamoDB table.
To test, please team up with another developer for a zoom session. Working together, walk through the key parts of the site to ensure they are working correctly.
To know how to test bar.html and bar.js, read the relevant sections of this paper, which describes the functionality: https://pubmed.ncbi.nlm.nih.gov/31350781/
To test with the efp-browser mentioned in the paper, copy the URLs for "view in IGB" shown in the efp-browser. Enter these links into a web browser, replacing the hostname with the address of your bioviz mirror site. You may also need to modify your local /etc/hosts file to "trick" your browser into thinking that your bioviz mirror site's hostname resolves to its public IP address. (This may be necessary if your bioviz mirror's hostname is not currently registered in DNS.)
[~aloraine] I have noticed that the data load takes more than 12 minutes for the full data set using the .py script. Would you like me to see if we can improve that script? It gives the impression that the Ansible session has gotten stuck.
Sure, I would say that counts as a bug that ought to be addressed. The table itself is tiny, so it does not make sense that it takes so long.
I was able to modify the configuration to increase the write speed. [~aloraine] Could you please let me know how often we would be writing to this table? There appears to be a monthly charge associated with the write capacity units. If we are not using them, we can dial down that value after populating all the data.
Read/write speed is controlled by the following settings:
Provisioned read capacity units: 5 (Auto Scaling disabled)
Provisioned write capacity units: 50 (Auto Scaling disabled)
Per the AWS specification, a single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests.
Reference: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
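The 25-request cap quoted above is why the CLI could not load the whole table at once. A sketch of staying within that limit, assuming the table name and item shape used elsewhere in this ticket; the chunking helper is pure Python and testable without AWS access:

```python
# Hedged sketch of batch loading within the BatchWriteItem limits.
# chunks() is pure; batch_load() uses boto3's high-level batch_writer,
# which itself handles the 25-item cap and retries unprocessed items.

def chunks(items, size=25):
    """Split a list into batches no larger than `size`
    (25 is the BatchWriteItem maximum)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_load(items, table_name="Araport11", region="us-east-1"):
    """Write items to the table in batches."""
    import boto3  # deferred import; only needed when actually writing
    table = boto3.resource("dynamodb", region_name=region).Table(table_name)
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
```

Using batch_writer rather than hand-rolled BatchWriteItem calls is a design choice: it hides the 25-item batching and unprocessed-item retries shown in the referenced API docs.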
[~aloraine] Could you please review the write capacity units of https://console.aws.amazon.com/dynamodb/home?region=us-east-1#tables:selected=Araport11;tab=capacity
It is set to 20 (a $10/month charge) at the moment, but if we are not using writes, we can dial this down to 5, which is the default.
Thank you. I have set the write capacity to 1. Is that OK?
Setting it to 5 should be fine, as long as we are not writing. There could be some internal meta writes.
Chirag Chandrahas Shetty - Please try running this from your system as a fresh end-to-end run, as Dr. Loraine suggested earlier before the holiday break. That will give me an idea of what more needs to be done to simplify and improve this, based on your execution difficulties. I will remain available during the next week, so message me any time about any issues you come across.
Requests:
- Please modify the loading strategy to use the original BED file rather than a JSON file to load the data; see the scripts written by Charan V. (https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/)
- Use the Ansible svn module to obtain the BED file by checking out (read-only) the single directory where the BED file resides. Do NOT check out the entire repository, as it is huge.
- To support the above, add new default configurations to (a) specify the svn directory to be checked out and (b) specify the checked-out BED file to be used (Araport11.bed.gz). Use the current convention of specifying defaults by adding these to the role's "defaults" directory (for example, see https://bitbucket.org/lorainelab/bioviz-playbooks/src/master/roles/clone/defaults/main.yml)
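Loading directly from the BED file, as requested above, could look roughly like the sketch below. This is an illustration, not the IGBF-1495 script: the column mapping (column 4 = GeneId) is an assumption about this particular BED file, whose first four tab-separated columns are chrom, start, end, name.

```python
# Hedged sketch: read Araport11.bed.gz directly instead of a JSON export.
import gzip


def parse_bed_line(line):
    """Turn one BED line into a DynamoDB-style item dict, or None for
    blank lines and track/browser/comment lines."""
    line = line.strip()
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split("\t")
    return {
        "Chromosome": fields[0],
        "Start": int(fields[1]),
        "End": int(fields[2]),
        "GeneId": fields[3],  # assumed mapping for this file
    }


def read_bed(path):
    """Yield items from a gzipped BED file such as Araport11.bed.gz."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            item = parse_bed_line(line)
            if item is not None:
                yield item
```

Reading the gzipped file in text mode and streaming items keeps memory use flat even if the file were larger.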
Looks like the new code is referring to the appstore playbooks. The appstore actually does not need any DynamoDB code. Please get the latest copy of the master branch of the bioviz-playbooks repository and transfer the new code over there.
I'm in the middle of restructuring the bioviz-playbooks and will push my new code later today or Monday. Wait for that to happen before creating your new bioviz-playbooks branch.
question about:
if not request:
+ request.append({
+ 'Chromosome': Chromosome,
+ 'Start': Start,
+ 'End': End,
+ 'GeneId': GeneId,
+ })
That came from https://bitbucket.org/lorainelab/genomesource/src/master/IGBF-1495/createTable.py
if not data:
data.append({
'Chromosome': Chromosome,
'Start': Start,
'End': End,
'GeneId': GeneId,
})
My understanding is that it was possibly done to prevent adding duplicate entries with lines 28 to 53 in my code, since the for loop cannot execute when the request list is empty.
Please investigate: What happens if "data" (or "request") is "None"?
data is declared as an empty array on line 9. Since there is no first element in `data` on the first execution, running the for loop would throw an error. The `if` branch of the logic only handles the first element; all the remaining executions go through the `else` branch, which runs the for loop.
OK I understand. Please modify the "if" statement to literally test the size of the array instead of relying on a quirk of the language in which an empty array evaluates to False in "if" statements.
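The requested change can be sketched as below. It only shows the rewritten `if` branch (an explicit size test instead of relying on an empty list being falsy); the duplicate-checking `else` branch of the real createTable.py is omitted, and the helper name is hypothetical.

```python
# Hedged sketch: test the list's length literally, per the review request.
# Names mirror the snippet quoted above from createTable.py.

def add_entry(data, Chromosome, Start, End, GeneId):
    """Append the first entry only when the list is literally empty."""
    if len(data) == 0:  # explicit size test instead of `if not data:`
        data.append({
            'Chromosome': Chromosome,
            'Start': Start,
            'End': End,
            'GeneId': GeneId,
        })
    return data
```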
Merged.
Making some changes / updates to the code:
- Define dynamo_db_table_name to be Araport11
- Add role to aws.yml
The loading script appears to search through the entire "results" list, from start to finish, each time it reads a line.
Fixing this.
Also, I will enable AWS and other variables to be passed as command-line options. This will enable the script to be run with no editing required.
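The command-line options mentioned above could be wired up with argparse, roughly as follows. The specific option names (--table-name, --region, --bed-file) and defaults are assumptions, not the final interface.

```python
# Hedged sketch of a no-editing-required command line for the load script.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Load BED data into a DynamoDB table.")
    parser.add_argument("--table-name", default="Araport11",
                        help="DynamoDB table to load")
    parser.add_argument("--region", default="us-east-1",
                        help="AWS region")
    parser.add_argument("--bed-file", default="Araport11.bed.gz",
                        help="gzipped BED file to load")
    return parser
```

With sensible defaults, a plain invocation works out of the box, and the playbook can override any value, e.g. `python load.py --region us-west-2`.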
How to get a single file out of a subversion repository without checking out an entire directory:
svn export https://svn.bioviz.org/repos/genomes/quickload/A_thaliana_Jun_2009/Araport11.bed.gz --username=guest --password=guest
A role "Bioviz" needs to be created that has the AmazonDynamoDBReadOnlyAccess policy.