Everything you need to know to share transcriptomics data cleanly, securely, and in a format your collaborators can actually use.
RNA sequencing has become one of the most widely used techniques in molecular biology, enabling researchers to profile gene expression across entire transcriptomes. But the datasets it produces are large, complex, and contextually sensitive in ways that make sharing genuinely challenging.
A typical bulk RNA-Seq experiment with 12 samples might generate 50–150 GB of raw FASTQ data. A single-cell RNA-Seq (scRNA-Seq) experiment with tens of thousands of cells can produce terabytes of data across multiple file types. When this data needs to move from a sequencing core to an analysis lab, from one institution to a collaborator, or from a lab to a public repository, the transfer must preserve not just the files but the metadata required to make sense of them.
Beyond scale, RNA-Seq data from human subjects carries privacy implications. Transcriptome profiles can reveal information about disease state and immune status, and the reads themselves contain genetic variants from expressed genes that could, in principle, be used to re-identify a donor. This adds a security dimension to what might otherwise seem like a routine data handoff.
| File Type | Description | Typical Size |
|---|---|---|
| .fastq.gz | Raw reads, compressed. Primary output from sequencer. | 5–30 GB / sample |
| .bam | Aligned reads. Output from STAR, HISAT2, etc. | 3–15 GB / sample |
| .bai | BAM index file. Required alongside .bam. | <10 MB |
| .tsv / .csv | Count matrices. Output from featureCounts, HTSeq, etc. | 1–100 MB |
| .h5ad / .loom | Single-cell expression matrices (AnnData, Loom formats). | 100 MB – 10 GB |
| .rds | R data objects. Seurat objects, DESeq2 results, etc. | 100 MB – 5 GB |
| .gtf / .gff | Genome annotation files used for alignment and quantification. | 50–300 MB |
When transferring data to a collaborator, always consider which of these files they actually need. Raw FASTQ files give maximum flexibility but require significant compute for re-processing. Count matrices or R objects are far smaller and immediately usable if the collaborator trusts your pre-processing pipeline. Communicate clearly which pipeline versions and reference genome builds were used.
Many failed collaborations trace back not to the data itself but to incomplete metadata. A count matrix without a sample sheet is nearly useless. A FASTQ file without information about the library preparation kit, read length, or strandedness may require guesswork that introduces errors in downstream analysis.
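A quick automated check can catch the most common sample-sheet gaps before the data leaves your lab. The sketch below assumes a simple CSV with no quoted fields and a hypothetical header `sample,fastq_1,fastq_2,strandedness`, where `fastq_2` is empty for single-end libraries and FASTQ paths are relative to the bundle root. It flags rows with no recorded strandedness and rows that point at FASTQ files that are not present. Adapt the columns to whatever your sample sheet actually contains.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a sample sheet before sending it.
# Assumes a plain CSV (no quoted fields) with the hypothetical header
# sample,fastq_1,fastq_2,strandedness; paths are relative to the directory in $2.
set -euo pipefail

check_sample_sheet() {
  local sheet="$1" base="${2:-.}" problems=0
  local sample="" fq1="" fq2="" strand="" fq=""
  # tail skips the header; "|| [ -n ... ]" keeps a final line that lacks a newline.
  while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
    if [ -z "$strand" ]; then
      echo "NO STRANDEDNESS: $sample" >&2
      problems=$((problems + 1))
    fi
    for fq in "$fq1" "$fq2"; do
      # An empty fastq_2 is expected for single-end samples, so only check non-empty paths.
      if [ -n "$fq" ] && [ ! -f "$base/$fq" ]; then
        echo "MISSING FILE: $sample -> $fq" >&2
        problems=$((problems + 1))
      fi
    done
  done < <(tail -n +2 "$sheet")
  # Succeed only if no problems were found.
  [ "$problems" -eq 0 ]
}
```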
A well-documented README file included in the transfer bundle can save a collaborator days of back-and-forth. BioTransfer's batch transfer feature preserves folder structure, so you can organise your transfer as a project directory with subdirectories for raw data, processed data, and metadata.
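One way to make the README habitual is to generate a skeleton for every bundle and fill in the blanks by hand. The bash sketch below writes such a template; the field list is a suggestion drawn from the metadata discussed above (library kit, read layout, strandedness, reference build, pipeline versions), the values in angle brackets are placeholders, and `write_readme` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a README.txt skeleton into a bundle directory.
# Values in <angle brackets> are placeholders to be replaced by hand.
set -euo pipefail

write_readme() {
  local out="${1:-.}/README.txt"
  # Unquoted EOF so that $(date ...) expands; everything else is written literally.
  cat > "$out" <<EOF
Project: <project name>
Prepared: $(date -u +%Y-%m-%d)

Library preparation kit: <kit name and version>
Read layout: <single-end | paired-end>, <read length> bp
Strandedness: <unstranded | forward | reverse>
Reference genome: <build, e.g. GRCh38> with annotation <GTF source and release>
Pipeline: <aligner and version>, <quantifier and version>
Key parameters: <non-default options>

Integrity: verify all files with 'md5sum -c checksums.md5'
EOF
}
```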
RNA-Seq analysis pipelines are sensitive to data corruption. A truncated or corrupted FASTQ file can cause STAR or another aligner to abort or, worse, to run to completion on damaged input and produce subtly incorrect output. Silent corruption of this kind is especially dangerous because, unlike an obvious crash, it may only become apparent after weeks of downstream analysis.
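For gzip-compressed FASTQ files, a cheap first line of defence is `gzip -t`, which decompresses each file without writing output and checks the stream's stored CRC and length, so truncated or damaged archives fail loudly. The bash sketch below runs it over every `.fastq.gz` under a directory. It only validates the compression layer, not FASTQ record structure, and `check_gzip_integrity` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run `gzip -t` on every .fastq.gz under a directory and report failures.
# Checks the gzip stream (CRC and length) only; it does not validate FASTQ records.
set -euo pipefail

check_gzip_integrity() {
  local dir="${1:-.}" bad=0 f
  # -print0 / read -d '' keep file names with spaces intact.
  while IFS= read -r -d '' f; do
    if ! gzip -t "$f" 2>/dev/null; then
      echo "CORRUPT OR TRUNCATED: $f" >&2
      bad=$((bad + 1))
    fi
  done < <(find "$dir" -type f -name '*.fastq.gz' -print0)
  # Succeed only if every file passed.
  [ "$bad" -eq 0 ]
}
```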
Always generate and share MD5 checksums for every file in your transfer. The standard workflow:
1. Before sending, generate checksums: `md5sum *.fastq.gz > checksums.md5`
2. Include `checksums.md5` in the transfer bundle.
3. After downloading, the recipient verifies every file: `md5sum -c checksums.md5`

BioTransfer automatically computes an MD5 checksum of each uploaded file and stores it with the transfer record, giving both sender and recipient a verifiable integrity reference without manual checksum generation.
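The `md5sum *.fastq.gz` command covers only FASTQ files in a single folder. For a bundle with subdirectories, a single manifest covering every file, with paths relative to the bundle root, lets the recipient verify everything in one step. The bash sketch below assumes GNU coreutils and findutils (`md5sum`, `sort -z`, `xargs -r`); `make_manifest` and `verify_manifest` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: one MD5 manifest for a whole transfer bundle, with relative paths.
# Assumes GNU coreutils/findutils; function names are hypothetical.
set -euo pipefail

make_manifest() {
  # The subshell keeps the cd local. The manifest itself is excluded because it
  # may already exist, or be created by the redirect, while find is running.
  ( cd "$1" && find . -type f ! -name checksums.md5 -print0 \
      | sort -z | xargs -0 -r md5sum > checksums.md5 )
}

verify_manifest() {
  # --quiet prints only failing files; the exit status is non-zero on any mismatch.
  ( cd "$1" && md5sum -c --quiet checksums.md5 )
}
```

The sender runs `make_manifest bundle/` before upload and the recipient runs `verify_manifest bundle/` after download; because paths are relative, verification works wherever the bundle is unpacked.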
Not all RNA-Seq data requires end-to-end encryption. Here is a practical decision framework:
Many funding agencies (NIH, Wellcome Trust, ERC) and journals now require raw RNA-Seq data to be deposited in a public repository upon publication. The primary repositories are:
BioTransfer is designed for researcher-to-researcher collaboration during the active phase of a project — before public deposition, when data is still being processed and shared with co-investigators. For final public archiving, use the repositories above. For the working transfers that happen throughout a project — sharing raw data with a bioinformatics core, sending processed results to a collaborating PI, distributing a Seurat object to a co-first author — BioTransfer provides the speed, security, and simplicity that institutional FTP and consumer cloud drives cannot.
When sending RNA-Seq data to a collaborator, structure your transfer as a clear project directory. A recommended layout:
- `raw_data/`: FASTQ files, one sub-folder per sample
- `aligned/`: BAM files and `.bai` indices
- `counts/`: count matrices (featureCounts output, etc.)
- `qc/`: MultiQC HTML report, individual FastQC outputs
- `metadata/`: sample sheet (CSV), experimental design notes
- `README.txt`: pipeline versions, reference genome, key parameters, checksums

BioTransfer's folder transfer feature preserves this directory structure end-to-end. The recipient downloads a ZIP that reconstructs the exact folder hierarchy, so no manual reorganisation is required.
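Setting up this layout by hand for every transfer is tedious. The bash sketch below creates the top-level folders and copies FASTQ files into one sub-folder per sample. It assumes Illumina-style file names such as `Ctrl_rep1_S1_L001_R1_001.fastq.gz`, from which the sample name is taken as everything before the final `_S<number>` block; `build_bundle` is a hypothetical helper, and files that do not follow this naming will each land in a folder named after the whole file.

```shell
#!/usr/bin/env bash
# Sketch: create the recommended bundle layout and sort FASTQs into per-sample folders.
# Assumes bash (brace expansion) and Illumina-style names like
# Ctrl_rep1_S1_L001_R1_001.fastq.gz; build_bundle is a hypothetical helper.
set -euo pipefail

build_bundle() {
  local src="$1" dest="$2" f name sample
  mkdir -p "$dest"/{raw_data,aligned,counts,qc,metadata}
  for f in "$src"/*.fastq.gz; do
    [ -e "$f" ] || continue            # the glob matched no files
    name=$(basename "$f")
    # Drop the trailing _S<n>_L<lane>_R<read>_001.fastq.gz to get the sample name.
    sample=${name%_S[0-9]*}
    mkdir -p "$dest/raw_data/$sample"
    cp -- "$f" "$dest/raw_data/$sample/"
  done
}
```

Copying keeps the originals untouched at the cost of doubling disk usage for raw data; if space is tight, `mv` is a drop-in replacement in the `cp` line.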