Tuesday, August 23, 2016

samtools view bam from tophat

Instead of:

samtools view -h $f "chr15:51500254-51630795"

use (no "chr" prefix on the chromosome name):

samtools view -h $f "15:51500254-51630795"
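Which form works depends on how the reference sequences are named in the BAM header (the SN: fields of the @SQ lines), which in turn depends on the index the reads were aligned against. Below is a small sketch that adjusts a region string to match a header. It assumes bash and, for the usage line, samtools on the PATH. The function names has_contig and fix_region are made up for this note, and only the plain "chr" prefix is handled (not cases such as chrM vs MT).

```shell
# has_contig NAME < header
# Succeeds if the header on stdin has an @SQ line with SN:NAME.
has_contig() {
    awk -F'\t' -v want="SN:$1" '
        $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }'
}

# fix_region REGION < header
# Prints REGION with the "chr" prefix added or removed so that the contig
# name matches the @SQ names in the header read from stdin.
fix_region() {
    local region="$1" header name rest alt
    header="$(cat)"
    name="${region%%:*}"        # contig part, e.g. chr15
    rest="${region#"$name"}"    # the rest, e.g. :51500254-51630795 (may be empty)

    # Contig is already named as in the header: keep the region unchanged.
    if printf '%s\n' "$header" | has_contig "$name"; then
        printf '%s\n' "$region"
        return 0
    fi

    # Otherwise try the other naming convention.
    case "$name" in
        chr*) alt="${name#chr}" ;;
        *)    alt="chr${name}" ;;
    esac
    if printf '%s\n' "$header" | has_contig "$alt"; then
        printf '%s%s\n' "$alt" "$rest"
        return 0
    fi

    echo "contig $name not found in header" >&2
    return 1
}

# Usage (needs samtools; -H prints the header only):
#   region="$(samtools view -H "$f" | fix_region "chr15:51500254-51630795")"
#   samtools view -h "$f" "$region"
```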





Full parameter description for every track:


Monday, August 22, 2016

bam error

[bam_header_read] EOF marker is absent. The input is probably truncated.

[bam_header_read] invalid BAM binary header (this is not a BAM file).

Cufflinks prints these warnings but still generates the isoform results.
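The first warning refers to the fixed 28-byte empty BGZF block that a complete BAM file ends with; the second means the file does not start like a BAM at all. Here is a rough way to check a file by hand, assuming a shell with head, tail, od, and tr available. The function names are invented for this sketch. Recent samtools versions also provide "samtools quickcheck" for the same kind of check.

```shell
# Hex form of the 28-byte BGZF end-of-file block defined in the SAM/BAM specification.
BGZF_EOF="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Reads bytes on stdin and prints them as one lowercase hex string.
hex_of() {
    od -An -tx1 -v | tr -d '[:space:]'
}

# Succeeds if the file starts with the BGZF/gzip header bytes 1f 8b 08 04.
bam_has_magic() {
    [ "$(head -c 4 "$1" | hex_of)" = "1f8b0804" ]
}

# Succeeds if the file ends with the BGZF end-of-file block.
bam_has_eof() {
    [ "$(tail -c 28 "$1" | hex_of)" = "$BGZF_EOF" ]
}

# Usage:
#   bam_has_magic "$f" || echo "$f: does not look like a BAM/BGZF file"
#   bam_has_eof "$f"   || echo "$f: EOF marker missing, file may be truncated"
```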

Genomic Data Commons -- mapping between file id and cases sample submitter id (TCGA)

Frankly, the GDC is now designed for people with solid coding experience.

After I downloaded the RNA-seq BAM files and did my analysis, I wanted to map the file ids to sample ids (which normally start with TCGA).

I spent several days on this step and eventually figured out that the information has to be retrieved from the command line:

curl 'https://gdc-api.nci.nih.gov/files/8e793c37-de14-40ef-abaf-a73b833d2a68?pretty=true&fields=cases.samples.submitter_id,file_id'

{
  "data": {
    "cases": [
      {
        "samples": [
          {
            "submitter_id": "TCGA-DB-A64P-01A"
          }
        ]
      }
    ],
    "file_id": "8e793c37-de14-40ef-abaf-a73b833d2a68"
  },
  "warnings": {}
}

Unfortunately, the above command is not listed on the GDC help page: https://gdc-docs.nci.nih.gov/API/Users_Guide/Search_and_Retrieval/

See this page as well if you need to map other fields: https://gdc-docs.nci.nih.gov/API/Users_Guide/Appendix_A_Available_Fields/
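For more than a handful of files, the same query can be wrapped in a loop. This is a sketch only. It assumes bash, curl, and sed, one file UUID per line in a hypothetical file_ids.txt, and a pretty-printed response shaped like the one above (each "key": "value" pair on its own line). The function names parse_mapping and map_file_ids are invented here.

```shell
# Endpoint used in the curl command above.
GDC_FILES_URL="https://gdc-api.nci.nih.gov/files"

# parse_mapping < response
# Reads one pretty-printed response on stdin and prints
# "file_id<TAB>submitter_id" for every sample submitter_id in it.
parse_mapping() {
    local json fid sid
    json="$(cat)"
    fid="$(printf '%s\n' "$json" |
        sed -n 's/.*"file_id": *"\([^"]*\)".*/\1/p' | head -n 1)"
    printf '%s\n' "$json" |
        sed -n 's/.*"submitter_id": *"\([^"]*\)".*/\1/p' |
        while IFS= read -r sid; do
            printf '%s\t%s\n' "$fid" "$sid"
        done
}

# map_file_ids LIST
# Queries the API once per UUID in the file LIST (one UUID per line).
map_file_ids() {
    local id
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip empty lines
        curl -s "${GDC_FILES_URL}/${id}?pretty=true&fields=cases.samples.submitter_id,file_id" \
            < /dev/null | parse_mapping
    done < "$1"
}

# Usage:
#   map_file_ids file_ids.txt > file_to_sample.tsv
```

The sed patterns rely on the pretty=true layout; for anything more complicated than this one field, a real JSON parser is the safer choice.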
