cuffmerge
Merge RNA-seq assemblies
Syntax
Description
merges assembled transcriptome from two or more GTF files[1]. Merging GTF files is a required step to perform the downstream differential analysis withmergedGTF
= cuffmerge(gtfFiles
)cuffdiff
.
cuffmerge
requires Python®2 installed in your system.
cuffmerge
requires theCufflinks Support Package for the Bioinformatics Toolbox™. If the support package is not installed, then the function provides a downloadlink. For details, seeBioinformatics Toolbox Software Support Packages.
uses additional options specified by one or more name-value pair arguments. For example,mergedGTF
= cuffmerge(gtfFiles
,Name,Value
)cuffmerge(["Myco_1_1.transcripts.gtf","Myco_1_2.transcripts.gtf"],'NumThreads',5)
specifies to use five parallel threads.
Examples
Assemble Transcriptome and Perform Differential Expression Testing
Create aCufflinksOptions
object to define cufflinks options, such as the number of parallel threads and the output directory to store the results.
cflOpt = CufflinksOptions; cflOpt.NumThreads = 8; cflOpt.OutputDirectory ="./cufflinksOut";
The SAM files provided for this example contain aligned reads forMycoplasma pneumoniaefrom two samples with three replicates each. The reads are simulated 100bp-reads for two genes (gyrA
andgyrB
)位于相邻的基因组。所有the reads are sorted by reference position, as required bycufflinks
.
sams = ["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam",..."Myco_2_1.sam","Myco_2_2.sam","Myco_2_3.sam"];
Assemble the transcriptome from the aligned reads.
[gtfs,isofpkm,genes,skipped] = cufflinks(sams,cflOpt);
gtfs
is a list of GTF files that contain assembled isoforms.
Compare the assembled isoforms usingcuffcompare
.
stats = cuffcompare(gtfs);
Merge the assembled transcripts usingcuffmerge
.
mergedGTF = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput');
mergedGTF
reports only one transcript. This is because the two genes of interest are located next to each other, andcuffmerge
cannot distinguish two distinct genes. To guidecuffmerge
, use a reference GTF (gyrAB.gtf
) containing information about these two genes. If the file is not located in the same directory that you runcuffmerge
from, you must also specify the file path.
gyrAB = which('gyrAB.gtf'); mergedGTF2 = cuffmerge(gtfs,'OutputDirectory','./cuffMergeOutput2',...'ReferenceGTF',gyrAB);
Calculate abundances (expression levels) from aligned reads for each sample.
abundances1 = cuffquant(mergedGTF2,["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...'OutputDirectory','./cuffquantOutput1'); abundances2 = cuffquant(mergedGTF2,["Myco_2_1.sam","Myco_2_2.sam","Myco_2_3.sam"],...'OutputDirectory','./cuffquantOutput2');
Assess the significance of changes in expression for genes and transcripts between conditions by performing the differential testing usingcuffdiff
. Thecuffdiff
function operates in two distinct steps: the function first estimates abundances from aligned reads, and then performs the statistical analysis. In some cases (for example, distributing computing load across multiple workers), performing the two steps separately is desirable. After performing the first step withcuffquant
, you can then use the binary CXB output file as an input tocuffdiff
to perform statistical analysis. Becausecuffdiff
returns several files, specify the output directory is recommended.
isoformDiff = cuffdiff(mergedGTF2,[abundances1,abundances2],...'OutputDirectory','./cuffdiffOutput');
Display a table containing the differential expression test results for the two genesgyrB
andgyrA
.
readtable(isoformDiff,'FileType','text')
ans = 2×14 table test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2_fold_change_ test_stat p_value q_value significant ________________ _____________ ______ _______________________ ________ ________ ______ __________ __________ _________________ _________ _______ _______ ___________ 'TCONS_00000001' 'XLOC_000001' 'gyrB' 'NC_000912.1:2868-7340' 'q1' 'q2' 'OK' 1.0913e+05 4.2228e+05 1.9522 7.8886 5e-05 5e-05 'yes' 'TCONS_00000002' 'XLOC_000001' 'gyrA' 'NC_000912.1:2868-7340' 'q1' 'q2' 'OK' 3.5158e+05 1.1546e+05 -1.6064 -7.3811 5e-05 5e-05 'yes'
You can usecuffnorm
to generate normalized expression tables for further analyses.cuffnorm
当你有很多样品,结果是有用you want to cluster them or plot expression levels for genes that are important in your study. Note that you cannot perform differential expression analysis usingcuffnorm
.
Specify a cell array, where each element is a string vector containing file names for a single sample with replicates.
alignmentFiles = {["Myco_1_1.sam","Myco_1_2.sam","Myco_1_3.sam"],...["Myco_2_1.sam","Myco_2_2.sam","Myco_2_3.sam"]} isoformNorm = cuffnorm(mergedGTF2, alignmentFiles,...'OutputDirectory','./cuffnormOutput');
Display a table containing the normalized expression levels for each transcript.
readtable(isoformNorm,'FileType','text')
ans = 2×7 table tracking_id q1_0 q1_2 q1_1 q2_1 q2_0 q2_2 ________________ __________ __________ __________ __________ __________ __________ 'TCONS_00000001' 1.0913e+05 78628 1.2132e+05 4.3639e+05 4.2228e+05 4.2814e+05 'TCONS_00000002' 3.5158e+05 3.7458e+05 3.4238e+05 1.0483e+05 1.1546e+05 1.1105e+05
Column names starting withqhave the format:conditionX_N, indicating that the column contains values for replicateNofconditionX.
Input Arguments
gtfFiles
—Names of GTF files
string vector|cell array of character vectors
Names of GTF files, specified as a string vector or cell array of character vectors.
Example:["Myco_1_1.transcripts.gtf", "Myco_1_2.transcripts.gtf"]
Data Types:string
|cell
opt
—cuffgffread
options
CuffMergeOptions
object|string|character vector
cuffgffread
options, specified as aCuffMergeOptions
object, string, or character vector. The string or character vector must be in the originalcuffmerge
option syntax (prefixed by one or two dashes)[1].
Name-Value Arguments
Specify optional pairs of arguments asName1=Value1,...,NameN=ValueN
, whereName
is the argument name andValue
is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and encloseName
in quotes.
Example:cuffmerge(["Myco_1_1.transcripts.gtf","Myco_1_2.transcripts.gtf"],'NumThreads',5)
ExtraCommand
—Additional commands
""
(default) |string|character vector
The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB®properties.
Example:'ExtraCommand','--library-type fr-secondstrand'
Data Types:char
|string
IncludeAll
—Flag to apply all available options
false
(default) |true
The original (native) syntax is prefixed by one or two dashes. By default, the function converts only the specified options. If the value istrue
, the software converts all available options, with default values for unspecified options, to the original syntax.
Note
If you setIncludeAll
totrue
, the software translates all available properties, with default values for unspecified properties. The only exception is that when the default value of a property isNaN
,Inf
,[]
,''
, or""
, then the software does not translate the corresponding property.
Example:'IncludeAll',true
Data Types:logical
MinIsoformFraction
—Minimum abundance of isoform to be included in merged assembly
0.5
(default) |scalar between0
and1
Minimum abundance of an isoform to be included in the merged assembly, specified as a scalar between0
and1
. This value is expressed as a percentage of the most abundant (major) isoform.
Example:'MinIsoformFraction',0.4
Data Types:double
NumThreads
—Number of parallel threads to use
1
(default) |positive integer
Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.
Example:'NumThreads',4
Data Types:double
OutputDirectory
—Directory to store analysis results
current directory ("./"
)(default) |string|character vector
Directory to store analysis results, specified as a string or character vector.
Example:'OutputDirectory',"./AnalysisResults/"
Data Types:char
|string
ReferenceGTF
—Name of optional reference annotation GTF file
string|character vector
Name of an optional reference annotation GTF file to be included in the combined assembly, specified as a string or character vector.
Example:'ReferenceGTF',"ref.gtf"
Data Types:char
|string
ReferenceSequence
—Name of directory or FASTA file containing genomic sequences
string|character vector
Name of a directory or FASTA file containing genomic DNA sequences for the reference, specified as a string or character vector.
If you specify a directory, it must contain one FASTA file per contig. In other words, the directory must contain one FASTA file per reference chromosome, and each file must be named after the chromosome and have a
.fa
or.fasta
extension.If you specify a FASTA file, it must contain all the reference sequences.
The function uses the provided sequences to improve transfrag classification and exclude artifacts.
Example:'ReferenceSequence',"allrefs.fasta"
Data Types:char
|string
Output Arguments
mergedGTF
— Name of output GTF file
"./merged_asm/merged.gtf"
Name of the output GTF file containing the merged transcriptome, returned as a string.
The output string also includes the directory information defined byOutputDirectory
. By default, the function
Creates themerged_asm在当前direc子文件夹tory and saves the output file (
merged.gtf
) in that folder.Creates a subfolder namedlogsinsidemerged_asmfolder and saves a log file.
If you setOutputDirectory
to"/local/tmp/"
,mergedGTF
becomes"/local/tmp/merged.gtf"
. The function also creates thelogsfolder inside the specified output directory.
References
[1]Trapnell, Cole, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. “Transcript Assembly and Quantification by RNA-Seq Reveals Unannotated Transcripts and Isoform Switching during Cell Differentiation.”Nature Biotechnology28, no. 5 (May 2010): 511–15.
Version History
See Also
External Websites
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select:.
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina(Español)
- Canada(English)
- United States(English)
Europe
- Belgium(English)
- Denmark(English)
- Deutschland(Deutsch)
- España(Español)
- Finland(English)
- France(Français)
- Ireland(English)
- Italia(Italiano)
- Luxembourg(English)
- Netherlands(English)
- Norway(English)
- Österreich(Deutsch)
- Portugal(English)
- Sweden(English)
- Switzerland
- United Kingdom(English)