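FASTA file format: a FASTA file pairs each ID with one sequence, which can be a transcript sequence or a protein sequence. Each record starts with a header line beginning with ">", followed by the sequence, which may wrap across several lines.

    >V350033524L1C001R001000203231
    AAGCTGTCCCATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTGAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG
    >V350033524L1C001R001000694911
    AAGCTGTCCAATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTGAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG

Read the FASTA file into a dictionary, with each ID as the key and its sequence as the value:

    # Adjust the path to point at your own FASTA file
    with open(r'E:\python练习文件\test.fa', 'r') as fa:
        fadict = {}
        for line in fa:
            # Remove the trailing newline
            line = line.replace('\n', '')
            if line.startswith('>'):
                # Drop the leading ">" and start a new record
                seqname = line[1:]
                fadict[seqname] = ''
            else:
                # Append this line so that wrapped sequences are joined
                fadict[seqname] += line

Check the result. The line-wrapped sequences have been joined into complete sequences and stored as a dictionary (the output is wrapped here, one entry per line, for readability).

    print(fadict)

    {'V350033524L1C001R001000203231': 'AAGCTGTCCCATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTGAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG',
     'V350033524L1C001R001000694911': 'AAGCTGTCCAATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTGAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG',
     'V350033524L1C001R001000739131': 'AAGCTGTCCTATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTCAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG',
     'V350033524L1C001R001001493151': 'AAGCTGTCCTATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTCAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG',
     'V350033524L1C001R001002160481': 'AAGCTGTCCAATCAATAGCTGCCGCTGAAGGGTGGGGCTGGATGGCGTAAGCTACAGCTCAAGGAAGAACGTGAGCACGAGGCACTGAGGTGATTGGCTG'}

Compute the G and C content and the length of each sequence:

    # Iterate over the dictionary
    for name, seq in fadict.items():
        # Print the sequence name
        print(name)
        # Count the G and C bases
        Gbase = seq.count('G')
        Cbase = seq.count('C')
        # Convert the counts to percentages of the sequence length
        Gper = Gbase / len(seq) * 100
        Cper = Cbase / len(seq) * 100
        # Sequence length
        length = len(seq)
        # Round to two decimal places
        Gper = round(Gper, 2)
        Cper = round(Cper, 2)
        # Print the results
        print('G percent is: ' + str(Gper))
        print('C percent is: ' + str(Cper))
        print('length: ' + str(length))

Output:

    V350033524L1C001R001000203231
    G percent is: 37.0
    C percent is: 20.0
    length: 100
    V350033524L1C001R001000694911
    G percent is: 37.0
    C percent is: 19.0
    length: 100
    V350033524L1C001R001000739131
    G percent is: 36.0
    C percent is: 20.0
    length: 100
    V350033524L1C001R001001493151
    G percent is: 36.0
    C percent is: 20.0
    length: 100
    V350033524L1C001R001002160481
    G percent is: 36.0
    C percent is: 20.0
    length: 100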
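The parsing and counting steps above can also be wrapped in small reusable functions. The following is only a minimal sketch, not the method used in this post: the names `parse_fasta` and `base_stats`, the `read_1`/`read_2` records, and the combined GC percentage (G plus C over the sequence length) are assumptions added for illustration. It works on any iterable of lines, so an open file handle could be passed in the same way as the small in-memory list used here.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into a dict of {ID: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: drop ">" and start a new record
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence line: append so wrapped sequences are joined
            records[name] += line
    return records


def base_stats(seq: str) -> Dict[str, float]:
    """Return the length plus G, C and combined GC percentages."""
    length = len(seq)
    if length == 0:
        # Avoid division by zero for empty sequences
        return {"length": 0, "G": 0.0, "C": 0.0, "GC": 0.0}
    upper = seq.upper()  # count lowercase bases as well
    g = upper.count("G")
    c = upper.count("C")
    return {
        "length": length,
        "G": round(g * 100 / length, 2),
        "C": round(c * 100 / length, 2),
        "GC": round((g + c) * 100 / length, 2),
    }


# Hypothetical in-memory example; read_1 is wrapped over two lines
example = [
    ">read_1\n",
    "AAGCTGTCCC\n",
    "ATCAATAGCT\n",
    ">read_2\n",
    "GGCC\n",
]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, base_stats(seq))
```

Multiplying before dividing (`g * 100 / length`) keeps the percentages free of small floating-point artifacts in common cases, and the empty-sequence guard means a header with no sequence lines does not raise `ZeroDivisionError`.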