Open sequences will be made publicly available through the INSDC databases (ENA, DDBJ, NCBI). To accomplish this, we will upload sequences on your behalf to ENA. After successful submission to ENA (this can take up to 48 hours), we will return the ENA accessions of your sequences.
When submitting to ENA, we use the institution on your group page as your center name. ENA uses the center name as an identifier to facilitate the recognition and attribution of your sequences within the INSDC.
We urge users not to upload their sequences independently to INSDC, as this leads to data duplication. However, if Pathoplexus ever ceases to exist and you need to modify your data, you can use the center name to identify your group and request sequence revision.
To submit your sequences to ENA, we need to create a Project, Sample and Assembly on your behalf. See ENA’s metadata model for more information.
In ENA, Projects contain general information on your group and the organism being sequenced. We create one Project per group and organism. Samples contain metadata, and Assemblies contain the actual sequences. We create one Sample and one Assembly object per sequence.
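The object counts described above can be illustrated with a minimal sketch. It is not the actual submission code: the function name `plan_ena_objects` and the tuple layout `(group, organism, sequence_id)` are assumptions made for this example only.

```python
def plan_ena_objects(sequences):
    """Count the ENA objects needed for a batch of sequences.

    `sequences` is a list of (group, organism, sequence_id) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    # One Project per distinct (group, organism) pair.
    projects = {(group, organism) for group, organism, _ in sequences}
    # One Sample and one Assembly per sequence.
    return {
        "projects": len(projects),
        "samples": len(sequences),
        "assemblies": len(sequences),
    }
```

For example, three sequences from two groups working on the same organism would need two Projects, three Samples and three Assemblies.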
If you would like to cite your sequences in a publication, you can use your BioProject accession (this will start with PRJ), BioSample accession (this will start with SAM) and Genome Assembly accession (this will start with GCA).
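If you handle many accessions, the prefixes above are enough to tell the three kinds apart. The sketch below assumes only those prefixes; the helper name `accession_kind` is hypothetical, and the accession strings in the usage note are made-up placeholders, not real records.

```python
# Prefixes as described in the text above.
ACCESSION_PREFIXES = {
    "PRJ": "BioProject",
    "SAM": "BioSample",
    "GCA": "Genome Assembly",
}


def accession_kind(accession):
    """Return the accession kind implied by its prefix, or None if unrecognised."""
    normalized = accession.strip().upper()
    for prefix, kind in ACCESSION_PREFIXES.items():
        if normalized.startswith(prefix):
            return kind
    return None
```

With this helper, a placeholder such as `"PRJEB00000"` is classified as a BioProject, and a string with none of the three prefixes yields `None`.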
To facilitate data standardization, we map our metadata to the ENA virus pathogen reporting standard checklist, using PHA4GE’s official mapping.
ENA Sample-related Fields | Loculus Fields |
---|---|
subject exposure | exposureEvent |
type exposure | exposureEvent |
hospitalisation | hostHealthState==Hospital |
illness symptoms | signsAndSymptoms |
collection date | sampleCollectionDate |
geographic location (country and/or sea) | geoLocCountry |
geographic location (region and locality) | geoLocAdmin1 |
sample capture status | purposeOfSampling |
host disease outcome | hostHealthOutcome |
host common name | hostNameCommon |
host age | hostAge |
host health state | hostHealthState |
host sex | hostGender |
host scientific name | hostNameScientific |
isolate | specimenCollectorSampleId |
collecting institution | sequencedByOrganization, authorAffiliations |
receipt date | received date |
isolation source host-associated | anatomical material, anatomical part, body product |
isolation source non-host-associated | environmental site, environmental material |
authors | authors |
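As an illustration of how the one-to-one rows of this table could be applied, here is a minimal sketch. It covers only a subset of the rows, and it is not the actual Loculus implementation: the function name `to_ena_sample_attributes` and the choice to skip empty values are assumptions.

```python
# Subset of the one-to-one rows from the table above (ENA field -> Loculus field).
SAMPLE_FIELD_MAP = {
    "collection date": "sampleCollectionDate",
    "geographic location (country and/or sea)": "geoLocCountry",
    "geographic location (region and locality)": "geoLocAdmin1",
    "host common name": "hostNameCommon",
    "host scientific name": "hostNameScientific",
    "host sex": "hostGender",
    "isolate": "specimenCollectorSampleId",
}


def to_ena_sample_attributes(loculus_metadata):
    """Translate Loculus metadata keys into ENA sample attribute names.

    Missing or empty values are skipped (an assumption of this sketch).
    """
    attributes = {}
    for ena_field, loculus_field in SAMPLE_FIELD_MAP.items():
        value = loculus_metadata.get(loculus_field)
        if value not in (None, ""):
            attributes[ena_field] = value
    return attributes
```

Rows that combine several Loculus fields or derive a value, such as hospitalisation, would need dedicated handling that this sketch leaves out.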

ENA Assembly-related Fields | Loculus Fields |
---|---|
ASSEMBLY_TYPE | default=ISOLATE |
PROGRAM | sequencingInstrument, default=Unknown |
PLATFORM | sequencingProtocol, default=Unknown |
COVERAGE | depthOfCoverage, default=1 |
MOLECULETYPE | NaN |
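The defaults in the assembly table can be read as a fallback rule: use the Loculus value when present, otherwise the listed default. A minimal sketch of that rule follows. The function name `to_ena_assembly_fields` is hypothetical, empty strings are treated as missing by assumption, and MOLECULETYPE is left out because the table lists no source field for it.

```python
# ENA field -> (Loculus field, default), taken from the table above.
ASSEMBLY_FIELDS = {
    "PROGRAM": ("sequencingInstrument", "Unknown"),
    "PLATFORM": ("sequencingProtocol", "Unknown"),
    "COVERAGE": ("depthOfCoverage", 1),
}


def to_ena_assembly_fields(loculus_metadata):
    """Build ENA assembly fields, falling back to the table defaults."""
    # ASSEMBLY_TYPE has no Loculus source and always takes its default.
    fields = {"ASSEMBLY_TYPE": "ISOLATE"}
    for ena_field, (loculus_field, default) in ASSEMBLY_FIELDS.items():
        value = loculus_metadata.get(loculus_field)
        fields[ena_field] = default if value in (None, "") else value
    return fields
```

Calling the function with an empty record returns only the defaults, while a record that sets `depthOfCoverage` overrides the COVERAGE default.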