INSDC Submission

Open sequences will be made publicly available through the INSDC databases (ENA, DDBJ, NCBI). To accomplish this, we upload sequences to ENA on your behalf. After successful submission to ENA (which can take up to 48 hours), we will return the ENA accessions of your sequences.

When submitting to ENA, we use the institution listed on your group page as your center name. ENA uses the center name as an identifier to facilitate the recognition and attribution of your sequences within the INSDC.

We urge users not to upload their sequences to INSDC independently, as this would create duplicate records. However, if Pathoplexus ever ceases to exist and you need to modify your data, you can use the center name to identify your group and request a sequence revision.

ENA Submission

To submit your sequences to ENA, we need to create a Project, a Sample and an Assembly on your behalf; see ENA’s metadata model for more information.

In ENA, Projects contain general information about your group and the organism being sequenced; we create one Project per group and organism. Samples contain the metadata, and Assemblies contain the actual sequences; we create one Sample and one Assembly object per sequence.
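The cardinality rules above (one Project per group and organism, one Sample and one Assembly per sequence) can be illustrated with a short sketch. This is not Pathoplexus's actual submission code; the `Sequence` record, its fields and the example group and organism names are hypothetical and exist only to show how many ENA objects a batch of sequences would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    # Hypothetical minimal record for one submitted sequence.
    seq_id: str
    group: str
    organism: str


def plan_ena_objects(sequences):
    """Count the ENA objects implied by the rules described above.

    One Project per distinct (group, organism) pair;
    one Sample and one Assembly per sequence.
    """
    sequences = list(sequences)
    projects = {(seq.group, seq.organism) for seq in sequences}
    return {
        "projects": len(projects),
        "samples": len(sequences),
        "assemblies": len(sequences),
    }


seqs = [
    Sequence("seq-1", "Lab A", "Mpox"),
    Sequence("seq-2", "Lab A", "Mpox"),
    Sequence("seq-3", "Lab A", "Ebola Sudan"),
    Sequence("seq-4", "Lab B", "Mpox"),
]
print(plan_ena_objects(seqs))
# {'projects': 3, 'samples': 4, 'assemblies': 4}
```

Two sequences from the same group and organism share a Project, so four sequences here yield three Projects but four Samples and four Assemblies.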

Citing your Sequences

If you would like to cite your sequences in a publication, you can use your BioProject accession (starting with PRJ), BioSample accession (starting with SAM) and Genome Assembly accession (starting with GCA).
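As a small illustration of the prefixes listed above, the sketch below labels an accession by its leading characters. It only checks the prefix and does not validate the full accession format; the accession strings in the example are made up.

```python
def accession_kind(accession: str) -> str:
    """Classify an accession by its prefix (PRJ, SAM or GCA).

    Only the prefix is checked; the rest of the string is not validated.
    """
    prefixes = {
        "PRJ": "BioProject",
        "SAM": "BioSample",
        "GCA": "Genome Assembly",
    }
    for prefix, kind in prefixes.items():
        if accession.startswith(prefix):
            return kind
    return "unknown"


# Made-up accession strings, for illustration only.
for acc in ["PRJEB12345", "SAMEA1234567", "GCA_012345678.1", "XY123456"]:
    print(acc, "->", accession_kind(acc))
# PRJEB12345 -> BioProject
# SAMEA1234567 -> BioSample
# GCA_012345678.1 -> Genome Assembly
# XY123456 -> unknown
```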

Mapping of Pathoplexus Metadata Fields to ENA Metadata Fields

To facilitate data standardization, we map our metadata to ENA’s virus pathogen reporting standard checklist, using PHA4GE’s official mapping.

ENA Sample-related Fields	Loculus Fields
subject exposure	exposureEvent
type exposure	exposureEvent
hospitalisation	hostHealthState==Hospital
illness symptoms	signsAndSymptoms
collection date	sampleCollectionDate
geographic location (country and/or sea)	geoLocCountry
geographic location (region and locality)	geoLocAdmin1
sample capture status	purposeOfSampling
host disease outcome	hostHealthOutcome
host common name	hostNameCommon
host age	hostAge
host health state	hostHealthState
host sex	hostGender
host scientific name	hostNameScientific
isolate	specimenCollectorSampleId
collecting institution	sequencedByOrganization, authorAffiliations
receipt date	received date
isolation source host-associated	anatomical material, anatomical part, body product
isolation source non-host-associated	environmental site, environmental material
authors	authors
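The mapping table can be read as a simple field-renaming step plus a few special cases: one source field feeding two ENA fields (exposureEvent), a derived value (hospitalisation from hostHealthState==Hospital) and two source fields combined into one (collecting institution). The sketch below is a partial, hypothetical illustration of that idea, not the code Pathoplexus runs. Assumptions: hospitalisation is encoded as "yes"/"no", the institution fields are joined with "; ", empty values are skipped, and the receipt date and isolation source rows are omitted for brevity.

```python
# Partial, illustrative mapping of Loculus fields to ENA sample attributes.
SIMPLE_MAPPING = {
    "sampleCollectionDate": "collection date",
    "geoLocCountry": "geographic location (country and/or sea)",
    "geoLocAdmin1": "geographic location (region and locality)",
    "purposeOfSampling": "sample capture status",
    "hostHealthOutcome": "host disease outcome",
    "hostNameCommon": "host common name",
    "hostAge": "host age",
    "hostHealthState": "host health state",
    "hostGender": "host sex",
    "hostNameScientific": "host scientific name",
    "specimenCollectorSampleId": "isolate",
    "signsAndSymptoms": "illness symptoms",
    "authors": "authors",
}


def _present(value) -> bool:
    # Treat None and empty strings as missing; keep 0 (e.g. a host age of 0).
    return value not in (None, "")


def to_ena_sample_attributes(metadata: dict) -> dict:
    attributes = {}
    # One-to-one renames.
    for loculus_field, ena_field in SIMPLE_MAPPING.items():
        value = metadata.get(loculus_field)
        if _present(value):
            attributes[ena_field] = value

    # exposureEvent feeds two ENA fields.
    if _present(metadata.get("exposureEvent")):
        attributes["subject exposure"] = metadata["exposureEvent"]
        attributes["type exposure"] = metadata["exposureEvent"]

    # hospitalisation is derived from hostHealthState == "Hospital".
    # Assumption: "yes"/"no" encoding, set only when hostHealthState is given.
    if _present(metadata.get("hostHealthState")):
        is_hospital = metadata["hostHealthState"] == "Hospital"
        attributes["hospitalisation"] = "yes" if is_hospital else "no"

    # collecting institution combines two fields; the "; " separator is assumed.
    sources = ("sequencedByOrganization", "authorAffiliations")
    institutions = [metadata[f] for f in sources if _present(metadata.get(f))]
    if institutions:
        attributes["collecting institution"] = "; ".join(institutions)
    return attributes


example = {
    "sampleCollectionDate": "2024-03-01",
    "geoLocCountry": "Switzerland",
    "hostHealthState": "Hospital",
    "sequencedByOrganization": "Example Lab",
}
print(to_ena_sample_attributes(example))
```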

ENA Assembly-related Fields	Loculus Fields
ASSEMBLY_TYPE	default=ISOLATE
PROGRAM	sequencingInstrument, default=Unknown
PLATFORM	sequencingProtocol, default=Unknown
COVERAGE	depthOfCoverage, default=1
MOLECULETYPE	NaN
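The assembly fields above follow a "use the metadata value if present, otherwise fall back to a default" rule. A minimal sketch of that rule, assuming metadata arrives as a plain dictionary keyed by the Loculus field names; MOLECULETYPE is left out because the table gives no source field for it.

```python
def _value_or_default(metadata: dict, field: str, default):
    # Fall back to the default only when the field is missing or empty.
    value = metadata.get(field)
    return default if value in (None, "") else value


def assembly_fields(metadata: dict) -> dict:
    """Build the ENA assembly fields listed in the table, applying defaults."""
    return {
        "ASSEMBLY_TYPE": "ISOLATE",  # always the default in the table
        "PROGRAM": _value_or_default(metadata, "sequencingInstrument", "Unknown"),
        "PLATFORM": _value_or_default(metadata, "sequencingProtocol", "Unknown"),
        "COVERAGE": _value_or_default(metadata, "depthOfCoverage", 1),
    }


print(assembly_fields({"depthOfCoverage": 250}))
# {'ASSEMBLY_TYPE': 'ISOLATE', 'PROGRAM': 'Unknown', 'PLATFORM': 'Unknown', 'COVERAGE': 250}
```

An explicit missing-or-empty check is used instead of `or` so that a legitimate falsy value is not silently replaced by the default.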
