Details
- Type: New Feature
- Status: Closed
- Priority: Minor
- Resolution: Done
- Affects Version/s: None
- Fix Version/s: 10.0.0 Major Release
- Labels:
- Story Points: 3
- Epic Link:
- Sprint: Summer 3 2023 June 12, Summer 4 2023 June 26, Summer 5 2023 July 10
Description
Situation: Under the Advanced Search tab, the search can be set to Residues. This is extremely useful because a user can search for primer locations or motifs. Although the Residues search allows wildcards (.[]*), it does not appear to understand nucleotide ambiguity symbols such as R (G or A) or Y (C or T). As a result, a motif taken from a paper, such as CACRTS, does not work correctly in the Advanced Search for Residues in IGB.
Task: Expand the logic for the IGB Advanced Search for Residues so that IGB can understand the following nucleotide symbols:
Symbol | Matches |
---|---|
R | A or G |
Y | C or T |
S | G or C |
W | A or T |
K | G or T |
M | A or C |
B | C or G or T |
D | A or G or T |
H | A or C or T |
V | A or C or G |
N | any base |
For example, if a user currently uses the Advanced Search for Residues in IGB to look for the motif RYSNATCG, IGB cannot find it, because IGB does not understand what R, Y, S, and N refer to. New logic needs to be added to IGB so that, when searching, IGB understands that R can match either A or G, Y can match either C or T, and so on.
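One way to picture the requested behavior is to expand each ambiguity symbol in the query into a regex character class before running the search. The sketch below is only an illustration under stated assumptions: the class name `IupacMotif` and the method `toRegex` are hypothetical and are not taken from the IGB code base, and "any base" for N is assumed to mean A, C, G, or T. It also does not handle escaped text such as \Q...\E.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not IGB's actual implementation.
class IupacMotif {
    // Ambiguity symbols from the symbol list above, mapped to regex character classes.
    private static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('R', "[AG]");
        CODES.put('Y', "[CT]");
        CODES.put('S', "[CG]");
        CODES.put('W', "[AT]");
        CODES.put('K', "[GT]");
        CODES.put('M', "[AC]");
        CODES.put('B', "[CGT]");
        CODES.put('D', "[AGT]");
        CODES.put('H', "[ACT]");
        CODES.put('V', "[ACG]");
        CODES.put('N', "[ACGT]"); // assumption: "any base" means A, C, G, or T
    }

    // Replace every ambiguity symbol with its character class; copy all other characters unchanged.
    static String toRegex(String motif) {
        StringBuilder regex = new StringBuilder();
        for (char c : motif.toCharArray()) {
            String expansion = CODES.get(Character.toUpperCase(c));
            regex.append(expansion != null ? expansion : String.valueOf(c));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("CACRTS");
        System.out.println(regex); // CAC[AG]T[CG]

        // Case-insensitive search over a made-up sequence.
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher("ttcacgtgaa");
        while (m.find()) {
            System.out.println(m.start() + ": " + m.group()); // 2: cacgtg
        }
    }
}
```

With this mapping, the motif RYSNATCG from the example becomes `[AG][CT][CG][ACGT]ATCG`, which an ordinary regex search can match.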
Activity
Field | Original Value | New Value |
---|---|---|
Epic Link | IGBF-1765 [ 17855 ] |
Priority | Major [ 3 ] | Minor [ 4 ] |
Sprint | Spring 2 2023 Jan 16 [ 162 ] | |
Labels | beginner |
Sprint | Summer 2 2023 May 29 [ 171 ] |
Assignee | Kaushik Gopu [ kgopu ] |
Sprint | Summer 2 2023 May 29 [ 171 ] | Summer 4 2023 June 26 [ 173 ] |
Sprint | Summer 4 2023 June 26 [ 173 ] | Summer 3 2023 June 12 [ 172 ] |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Attachment | result.png [ 17904 ] |
Sprint | Summer 3 2023 June 12 [ 172 ] | Summer 3 2023 June 12, Summer 4 2023 June 26 [ 172, 173 ] |
Rank | Ranked higher |
Attachment | pattern_error.png [ 17906 ] |
Attachment | escape_working.png [ 17907 ] |
Attachment | without_escape_character.png [ 17909 ] | |
Attachment | with_escape_character.png [ 17910 ] |
Attachment | without_escape_character.png [ 17911 ] | |
Attachment | with_escape_character.png [ 17912 ] |
Comment |
[ *How I handled the "N" case:*
I created one regex: ((?<![\\])(?<![Q]))[Nn]((?<![\\])(?<![E]))
*Explanation of the above regex:* If the search sequence contains the character "N", the regex checks whether it is surrounded by the escape markers \Q and \E. If it is, no substitution is made; otherwise, N is replaced with the bases it stands for.
*Breakdown of the regex:*
(?<![\\]): not \
(?<![Q]): not Q
[Nn]: N or n (since the search is case-insensitive)
(?<![\\]): not \
(?<![E]): not E
Overall: do nothing if N is surrounded by \Q and \E; otherwise, replace it. Any regex can be tested with the online tool [https://regex101.com/] (switch the flavor to Java 8 before testing). It works fine so far, but I want to spend some more time testing it before pushing the changes to the remote. Please see the attached images for the results: !without_escape_character.png|thumbnail! !with_escape_character.png|thumbnail! ] |
Attachment | regex.png [ 17913 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Assignee | Kaushik Gopu [ kgopu ] | Nowlan Freese [ nfreese ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Needs 1st Level Review [ 10005 ] |
Comment | [ {code}((?<![\\])(?<![Q]))[Nn]((?<![\\])(?<![E])){code} ] |
Assignee | Nowlan Freese [ nfreese ] | Kaushik Gopu [ kgopu ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Assignee | Kaushik Gopu [ kgopu ] | Nowlan Freese [ nfreese ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Assignee | Nowlan Freese [ nfreese ] | Kaushik Gopu [ kgopu ] |
Status | First Level Review in Progress [ 10301 ] | To-Do [ 10305 ] |
Sprint | Summer 3 2023 June 12, Summer 4 2023 June 26 [ 172, 173 ] | Summer 3 2023 June 12, Summer 4 2023 June 26, Summer 5 2023 July 10 [ 172, 173, 174 ] |
Rank | Ranked higher |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Assignee | Kaushik Gopu [ kgopu ] | Nowlan Freese [ nfreese ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Assignee | Nowlan Freese [ nfreese ] | Kaushik Gopu [ kgopu ] |
Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
Assignee | Kaushik Gopu [ kgopu ] |
Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
Assignee | Ann Loraine [ aloraine ] |
Status | Reviewing Pull Request [ 10303 ] | To-Do [ 10305 ] |
Assignee | Ann Loraine [ aloraine ] | Kaushik Gopu [ kgopu ] |
Status | To-Do [ 10305 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs 1st Level Review [ 10005 ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Needs 1st Level Review [ 10005 ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Needs 1st Level Review [ 10005 ] |
Status | Needs 1st Level Review [ 10005 ] | First Level Review in Progress [ 10301 ] |
Status | First Level Review in Progress [ 10301 ] | Ready for Pull Request [ 10304 ] |
Status | Ready for Pull Request [ 10304 ] | Pull Request Submitted [ 10101 ] |
Assignee | Kaushik Gopu [ kgopu ] |
Status | Pull Request Submitted [ 10101 ] | Reviewing Pull Request [ 10303 ] |
Assignee | Ann Loraine [ aloraine ] |
Status | Reviewing Pull Request [ 10303 ] | Merged Needs Testing [ 10002 ] |
Assignee | Ann Loraine [ aloraine ] |
Status | Merged Needs Testing [ 10002 ] | Post-merge Testing In Progress [ 10003 ] |
Assignee | Nowlan Freese [ nfreese ] |
Assignee | Nowlan Freese [ nfreese ] | Kaushik Gopu [ kgopu ] |
Resolution | Done [ 10000 ] | |
Status | Post-merge Testing In Progress [ 10003 ] | Closed [ 6 ] |
Fix Version/s | 9.1.12 Major Release [ 10800 ] |
Fix Version/s | 10.0.0 [ 10900 ] | |
Fix Version/s | 9.1.12 Major Release [ 10800 ] |
The CACRTS example is from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8318262/
The paper examines the Arabidopsis thaliana genome.
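The regex recorded in the comment in the activity log can be tried directly in Java. The sketch below is a hedged illustration, not the merged IGB code: the class name `NEscapeDemo` and the method `expandN` are hypothetical, and the replacement `[ACGT]` assumes that N stands for any of the four bases. It shows what the pattern does exactly as written in the comment; the final implementation may differ.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of IGB.
class NEscapeDemo {
    // The pattern from the comment, written as a Java string literal
    // (each regex backslash is doubled once more for the Java source).
    static final Pattern N_PATTERN =
        Pattern.compile("((?<![\\\\])(?<![Q]))[Nn]((?<![\\\\])(?<![E]))");

    // Replace every N/n matched by the pattern with a class of all four bases.
    static String expandN(String query) {
        return N_PATTERN.matcher(query).replaceAll("[ACGT]");
    }

    public static void main(String[] args) {
        // Plain N is expanded.
        System.out.println(expandN("ANT"));         // A[ACGT]T

        // N directly after \Q is left alone.
        System.out.println(expandN("\\QN\\E"));     // \QN\E

        // The lookbehinds only inspect the single character before N,
        // so an N deeper inside a quoted region is still expanded.
        System.out.println(expandN("\\QANT\\E"));   // \QA[ACGT]T\E
    }
}
```

Because both trailing groups are also lookbehinds, they test the N that was just matched rather than the character after it, so in this form the pattern behaves like "N or n not immediately preceded by \ or Q". A lookahead such as `(?!\\E)` would be one way to test the following characters instead, if that is the intent.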