  IGB / IGBF-4062

Investigate BED and GFF parsing issues with ProtAnnot

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None

      Description

      Situation: A BED file converted from GFF, prodigal_Lambda_phage_sequences.bed, triggers the error "WARNING!!! Lengths disagree: residues = 8043, seq = 8041.0" when it is uploaded and run through ProtAnnot.

      Task: Investigate why the GFF-to-BED conversion is causing the error in ProtAnnot.
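      For reference while investigating, here is a minimal sketch of the coordinate mapping a GFF-to-BED converter has to get right. GFF3 coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open, so each start shifts down by 1 and each end stays the same; thickStart/thickEnd should come from the CDS features, not from the transcript bounds. This is a hypothetical illustration, not IGB's converter: the function name and its (start, end) tuple inputs are assumptions.

```python
# Hypothetical sketch (not IGB's converter): build one BED12 line from the
# exon and CDS features of a single GFF3 transcript.

def gff_transcript_to_bed12(chrom, name, strand, exons, cds):
    """exons, cds: lists of (start, end) in GFF3 coordinates (1-based, inclusive)."""
    # GFF3 -> BED: subtract 1 from each start, keep each end (0-based, half-open).
    blocks = sorted((s - 1, e) for s, e in exons)
    chrom_start = blocks[0][0]
    chrom_end = blocks[-1][1]
    if cds:
        # Thick region spans the CDS features, not the whole transcript.
        thick_start = min(s for s, _ in cds) - 1
        thick_end = max(e for _, e in cds)
    else:
        # Non-coding: no thick part, so thickStart == thickEnd (set to chromStart).
        thick_start = thick_end = chrom_start
    sizes = ",".join(str(e - s) for s, e in blocks) + ","
    starts = ",".join(str(s - chrom_start) for s, _ in blocks) + ","
    fields = [chrom, chrom_start, chrom_end, name, 0, strand,
              thick_start, thick_end, 0, len(blocks), sizes, starts]
    return "\t".join(str(f) for f in fields)
```

      Feeding this sketch the exons and CDS implied by the IGB record for NM_001352593.2 quoted in the comments reproduces that record field for field; a converter that skipped the start shift, or took thickStart/thickEnd from the transcript bounds, would not.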


          Activity

          uchinta Udaya Chinta (Inactive) added a comment (edited)

          The genomic sequence was not loading because the expected chromosome name format is "chr1", but the file uses RefSeq accessions such as "NC_000001.11". After renaming the chromosomes to the "chr1" format, the genomic sequence loaded. However, ProtAnnot then threw the error:
          "WARNING!!! Lengths disagree: residues = 8043, seq = 8041.0"
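          The chromosome renaming described in this comment could be scripted along these lines. This is a hypothetical sketch, not an IGB feature: `rename_bed_chroms` and `REFSEQ_TO_UCSC` are illustrative names, and the table holds only the one mapping mentioned here (NC_000001.11 to chr1). Other sequences would need their own entries, for example taken from the NCBI assembly report for the genome build.

```python
# Hypothetical helper: rewrite column 1 of BED lines from RefSeq accessions
# to UCSC-style names so they match the genome version loaded in IGB.
# Only the mapping from this ticket is included; extend it as needed.
REFSEQ_TO_UCSC = {
    "NC_000001.11": "chr1",
}


def rename_bed_chroms(lines, mapping=REFSEQ_TO_UCSC):
    out = []
    for line in lines:
        # Leave blank lines, comments, and track/browser headers untouched.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Names missing from the mapping pass through unchanged.
        fields[0] = mapping.get(fields[0], fields[0])
        out.append("\t".join(fields) + "\n")
    return out
```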
          I compared the Homo sapiens chromosome 21 records for transcript NM_001352593.2 from IGB and from NCBI:
          IGB: chr21 20998408 21486285 NM_001352593.2 0 + 20998563 21484561 0 16 210,75,207,144,138,118,161,146,151,188,97,174,120,122,181,1831, 0,282169,285785,287860,293695,325974,337096,339980,375454,411865,420064,433699,468197,470253,478882,486046,

          NCBI: chr21 20998408 21486285 NM_001352593.2 . + 20998408 21486285 0 16 210,75,207,144,138,118,161,146,151,188,97,174,120,122,181,1831, 0,282169,285785,287860,293695,325974,337096,339980,375454,411865,420064,433699,468197,470253,478882,486046,

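          The thickStart (7th column) and thickEnd (8th column) values differ. In the IGB record they mark the coding region (20998563 to 21484561), while in the NCBI record they are identical to chromStart and chromEnd (20998408 to 21486285), so the entire transcript, UTRs included, is treated as coding. This difference is causing the error mentioned above.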
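          To make the thickStart/thickEnd difference concrete, the sketch below sums the part of each exon block (blockSizes/blockStarts, columns 11 and 12) that lies between thickStart and thickEnd. It is an illustrative check written for this ticket, not how ProtAnnot computes its lengths; `coding_length` is a hypothetical name.

```python
def coding_length(bed12_line):
    """Total bases of the exon blocks that fall inside [thickStart, thickEnd)."""
    f = bed12_line.split()
    chrom_start = int(f[1])
    thick_start, thick_end = int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]  # relative to chromStart
    total = 0
    for size, rel in zip(sizes, starts):
        block_start = chrom_start + rel
        block_end = block_start + size
        # Overlap of this block with the thick region (0 if they don't overlap).
        total += max(0, min(block_end, thick_end) - max(block_start, thick_start))
    return total
```

          Applied to the two records above, the IGB record gives 2184 bp (728 codons), while the NCBI record gives 4063 bp, the full spliced transcript, which is not a multiple of 3.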
          To confirm this, I changed the thickStart and thickEnd values in the NCBI file and ran ProtAnnot again. This time no error occurred, but the result differs from the result for the IGB annotation.

          NCBI annotation result:
          https://www.ebi.ac.uk/Tools/services/rest/iprscan5/result/iprscan5-R20250123-025907-0912-81242886-p1m/xml

          IGB annotation result:
          https://www.ebi.ac.uk/Tools/services/rest/iprscan5/result/iprscan5-R20250123-030209-0277-88344788-p1m/xml

          uchinta Udaya Chinta (Inactive) added a comment

          Upon investigation, we found that IGB's GFF-to-BED conversion does not work correctly, so BED files produced from GFF files cannot be run successfully through ProtAnnot.

          For now, moving the ticket to the backlog.


            People

            • Assignee:
              uchinta Udaya Chinta (Inactive)
              Reporter:
              nfreese Nowlan Freese
            • Votes:
              0
              Watchers:
              2
