Frequently Asked Questions (FAQ)

For questions about how to use Pathoplexus, or what certain terms mean, please see our Documentation.

For questions about our governance, please see our Governance pages.

Questions about Pathoplexus

What makes Pathoplexus different?

Pathoplexus offers flexible data-sharing options: users can choose to share their data openly, or with time-limited protections to help ensure proper attribution and credit. Pathoplexus integrates smoothly with existing INSDC-member databases (NCBI, ENA, and DDBJ), enabling data that’s ‘open’ on Pathoplexus to also appear on INSDC, and INSDC data to be accessed through Pathoplexus. Pathoplexus is built on the latest tools for filtering, searching, and accessing data, making data sharing and analysis more accessible (through both the website and API) and fostering a connected, collaborative global research community.

How can I get involved?

To become part of the scientific community that helps drive Pathoplexus, you can join PHA4GE and be part of the Data Repositories Working Group. You can also contribute code and feature suggestions to Loculus, the custom-built open-source software that powers Pathoplexus.

Who is behind Pathoplexus? Who funds it?

Pathoplexus is a transparent, non-profit association with members from 14 countries in 5 continents, and an Executive Board from around the globe. Pathoplexus’ members and Executive Board are committed to running Pathoplexus according to our Values. Pathoplexus is proud to work closely with PHA4GE, an international group working to establish better standards for public health and bioinformatics. PHA4GE was also key in helping to develop Pathoplexus from the ground up.

At the moment, Pathoplexus is run almost entirely on donations and volunteer efforts, so it is independent of influence by larger players and is truly a community-driven project. We have received the following monetary assistance: support for one software developer working on Loculus via a swissuniversities Open Research Data grant; legal advice for our GDPR statement via PHA4GE; contribution of contracted software developers via Prof Tanja Stadler, ETH Zurich; and donation of AWS compute time via Prof Artem Babaian, University of Toronto. Aside from this, Pathoplexus has been created by, and is powered by, thousands of hours of donated time and effort by members of the bioinformatics community (see our members and development team).

How does Pathoplexus fit in among existing pathogen sequence sharing databases?

Pathoplexus is designed to be complementary to the existing pathogen sequence database ecosystem. Data from Pathoplexus, as long as it is used and acknowledged according to the Data Use Terms, can be analyzed alongside any other dataset (if the other dataset also permits this), and always retains a link to the original source (where applicable), so that mixing data and deduplication are fully supported.

In addition, Pathoplexus is purposefully complementary to INSDC-member databases, as all data in Pathoplexus eventually goes to INSDC. For Open Data submitted to Pathoplexus, this means Pathoplexus can be used as simply another way to send data to INSDC, as it is passed on immediately. For Restricted-Use Data submitted to Pathoplexus, Pathoplexus serves as a ‘temporary protected home’ before it eventually becomes Open.

Pathoplexus sequences are annotated with cross-references to the corresponding INSDC and GISAID accessions if available.
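As an illustration of how these cross-references support deduplication when mixing datasets, here is a minimal Python sketch. The record layout, the `insdc_accession` field name, and the accession values are hypothetical placeholders, not the actual Pathoplexus schema. The sketch also assumes that INSDC accessions may carry a version suffix after a dot, which is ignored when comparing.

```python
def merge_without_duplicates(pathoplexus_records, other_records):
    """Combine two lists of records, keeping one copy per INSDC accession.

    Records are plain dicts. The "insdc_accession" key is a hypothetical
    field name used only for this illustration.
    """
    merged = []
    seen = set()
    for record in list(pathoplexus_records) + list(other_records):
        accession = record.get("insdc_accession")
        if accession is None:
            # No cross-reference available, so the record cannot be matched.
            merged.append(record)
            continue
        # Assumption: drop a trailing ".<version>" so "XX000001.1" matches "XX000001".
        key = accession.split(".")[0]
        if key in seen:
            continue
        seen.add(key)
        merged.append(record)
    return merged


pathoplexus = [
    {"id": "pp_1", "insdc_accession": "XX000001.1"},
    {"id": "pp_2", "insdc_accession": None},
]
other = [
    {"id": "gb_1", "insdc_accession": "XX000001"},
    {"id": "gb_2", "insdc_accession": "XX000002.1"},
]
merged_ids = [r["id"] for r in merge_without_duplicates(pathoplexus, other)]
```

Records listed first take priority, so the order of the two arguments decides which copy of a duplicated sequence is kept.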

What is Pathoplexus doing about issues around pathogen access and benefits?

While developing Pathoplexus, we discussed in depth the issue of countries and regions sharing sequences but not receiving an equitable share of the benefits derived from those sequences (e.g. vaccines). This topic is also currently under debate globally, with efforts to develop pathogen access and benefits sharing (“PABS”) agreements. After much consideration, we don’t feel that a single database is the place to try to fix this inequity - but we do want to be part of the eventual solution. This is why we commit to adhering to future consensus-driven international PABS agreements.

How can I try Pathoplexus out?

If you don’t have sequences to upload for the pathogens we currently support - or just want to try out Pathoplexus before deciding if you want to use it - you can always use our Demo Instance! Our Demo Instance works just like the ‘real’ Pathoplexus, but is wiped regularly and no data is sent onward to INSDC.

It’s perfect for trying out Pathoplexus or testing your API requests. Do note that since it is wiped regularly, you will have to make a new account and group - but it’s ok if these aren’t as detailed as your ‘real’ accounts. Remember that the Demo Instance is public, so don’t upload data that you can’t or don’t want to share. If you’d like to try out Pathoplexus but don’t have any data to hand, you can use our example data for the pathogens we support!

(See our Docs for more information on how to do things like create an account, upload sequences, and more in Pathoplexus!)

Questions about the pathogens we support

I’d like my virus of interest to be on Pathoplexus. How can I ask for it to be added?

If communities that work on a particular virus believe it would be helpful to add this to Pathoplexus, we’d love to hear about it! We’re keen to add viruses while working with those who study those viruses, so that we can ensure it’s of maximum value to the community. We may not have the resources to add additional viruses immediately, so we ask for your patience while we try to get funding to build up and support our development.

However, we still want to build a list of viruses that the community is keen to see on Pathoplexus. Please search our GitHub Issues to see if anyone has already proposed the virus you’d like to suggest - if so, comment to support their proposal! If not, please create a new issue outlining why you think it would be a great addition and, if possible, listing others in the community who support adding that virus!

How do you choose the pathogens to include?

In future, we aim to prioritize viruses that have a high public health interest and currently have a less-than-ideal sequence sharing situation, for any reason. For example, the community may not be sharing much data because of fear of ‘scooping,’ or they may find uploading the data too difficult. Alternatively, the data could currently be fragmented across different places, and Pathoplexus could be a way to bring it together.

We are also keen to add pathogens where there’s support in that pathogen community - where the community feels like having the pathogen on Pathoplexus will be a benefit.

Finally, we will also consider the technical difficulty of including a new virus in prioritization. For example, multi-segmented viruses require more work to ensure we’re matching up segments correctly, and some viruses may be more difficult to write robust quality-control metrics for. However, none of this rules out adding a new virus completely - it may just have to wait a bit longer until we have sufficient resources!

Questions about data

Can I use the data on Pathoplexus?

Yes! Pathoplexus is designed to be used by everyone, and so all data is accessible. However, Pathoplexus does have restrictions on how some data (“Restricted-Use Data”) can be used, particularly in publications and preprints, and has requirements on how all Restricted-Use Data is acknowledged.

You can find out more about these protections and how you can use data by reading our Data Use Terms. We also have summaries on how you can use Open Data and Restricted-Use Data.

How can I contribute data to Pathoplexus?

We’ve tried to make sharing your data as easy and flexible as possible!

You can upload data to any of our supported pathogens by first creating an account and then submitting your sequences on the website or via the API (useful for computational pipelines). At submission, you can choose whether you’d like your data to be protected for up to one year, or open immediately. Once the data is open, it also appears on INSDC-member databases.

Where does my data go when I submit it to Pathoplexus?

When you submit your data to Pathoplexus, it gets securely stored in our database, hosted on AWS in Europe (under GDPR).

If you’ve chosen for your data to be open straight away, it will be submitted to the European Nucleotide Archive (ENA). It can take up to a week for data to appear on ENA, due to processing delays on their side.

It will then be synchronised across the other INSDC-member databases (i.e. GenBank and DDBJ) within a short time, and will continue to be available on Pathoplexus.

If you’ve selected the Restricted-Use data terms, your data will not be submitted to the INSDC until it becomes Open.
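The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `"OPEN"` / `"RESTRICTED"` labels, and the `restricted_until` parameter are hypothetical, and the sketch assumes that data becomes Open on the restriction end date itself.

```python
from datetime import date
from typing import Optional


def eligible_for_insdc(terms: str, restricted_until: Optional[date], today: date) -> bool:
    """Return True if a record may be passed on to INSDC (hypothetical helper).

    Open data is eligible immediately. Restricted-Use data is eligible only
    once its restriction period has ended (assumed: on the end date itself).
    """
    if terms == "OPEN":
        return True
    if terms == "RESTRICTED":
        if restricted_until is None:
            raise ValueError("Restricted-Use data needs a restriction end date")
        return today >= restricted_until
    raise ValueError(f"unknown data use terms: {terms!r}")


today = date(2025, 6, 1)
open_now = eligible_for_insdc("OPEN", None, today)
still_restricted = eligible_for_insdc("RESTRICTED", date(2025, 12, 1), today)
restriction_over = eligible_for_insdc("RESTRICTED", date(2025, 5, 1), today)
```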

Should I submit my data to both INSDC-member databases (GenBank, ENA, etc.) and Pathoplexus?

No, you should not submit your data to both INSDC and Pathoplexus, as it may result in your data being duplicated in both places. If you submit to INSDC, we will pull your data into Pathoplexus, so there’s no need to submit it here! If you submit to Pathoplexus, the data will go to INSDC when you specify (immediately, if you choose to make the data open), so there’s no need to upload it to INSDC yourself - we’ll take care of that!

Since we keep a record of all the data we pass on to INSDC, we can ensure we don’t duplicate it - but we can’t do this if users upload to both places separately!

I originally submitted my data to GISAID. Can I now submit it to Pathoplexus as well?

Yes, you can, as long as you have not shared your sequences with INSDC databases (GenBank, ENA, DDBJ). In contrast to Pathoplexus, GISAID does not submit data to the INSDC on your behalf. So unless you yourself submitted your sequences to the INSDC, you can submit them to Pathoplexus. To ensure data integrity, we encourage you to add your sequences’ GISAID Isolate ID (EPI_ISL) to the gisaidIsolateId metadata field when you submit your sequences to Pathoplexus.
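If you prepare your metadata in a script, a small check like the following Python sketch can catch mistyped IDs before submission. Only the `gisaidIsolateId` field name comes from the text above. The helper name and the other metadata key are hypothetical, and the pattern assumes that GISAID Isolate IDs take the form `EPI_ISL_` followed by digits.

```python
import re

# Assumption: GISAID Isolate IDs look like "EPI_ISL_" followed by digits.
EPI_ISL_PATTERN = re.compile(r"EPI_ISL_\d+")


def with_gisaid_id(metadata: dict, epi_isl: str) -> dict:
    """Return a copy of one metadata row with the gisaidIsolateId field set."""
    epi_isl = epi_isl.strip()
    if not EPI_ISL_PATTERN.fullmatch(epi_isl):
        raise ValueError(f"not a recognisable GISAID Isolate ID: {epi_isl!r}")
    # Copy instead of mutating, so the caller's original row is left untouched.
    return {**metadata, "gisaidIsolateId": epi_isl}


# "sampleName" is a placeholder key, and the ID below is a made-up example.
row = with_gisaid_id({"sampleName": "sample_1"}, " EPI_ISL_123456 ")
```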

How is data use restricted in Pathoplexus?

When you submit your data to Pathoplexus, you have the option to restrict how it can be used for a limited time, or make it fully open straight away. If you choose to keep your data restricted in how it can be used, it will have these protections for up to a year, giving you time to publish your research. After this period, or if you choose to share it openly immediately, your data will be released on international databases (INSDC-member databases).
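To make the “up to a year” limit concrete, here is a minimal Python sketch that computes the latest possible restriction end date for a given submission date. The function names are hypothetical, and how Pathoplexus itself counts the year (for example, for a submission on 29 February) is an assumption here.

```python
from datetime import date


def latest_restriction_end(submitted: date) -> date:
    """Latest restriction end date: one calendar year after submission."""
    try:
        return submitted.replace(year=submitted.year + 1)
    except ValueError:
        # 29 February has no counterpart next year; assume 28 February instead.
        return submitted.replace(year=submitted.year + 1, day=28)


def is_valid_restriction_end(submitted: date, requested_end: date) -> bool:
    """True if the requested end date falls after submission and within the limit."""
    return submitted < requested_end <= latest_restriction_end(submitted)


limit = latest_restriction_end(date(2025, 3, 1))
leap_limit = latest_restriction_end(date(2024, 2, 29))
```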

If you want to use data from Pathoplexus, it’s critical you familiarize yourself with our Data Use Terms, so you know how you can use sequences and how you must acknowledge them.

Where does Pathoplexus get its data?

We get our data in two ways: by ingesting (‘pulling’) open data from INSDC-member databases, and from data that Pathoplexus users upload to us directly.

Pathoplexus ingests data from INSDC-member databases (specifically, from NCBI Datasets) for all the pathogens it supports. We do this automatically at regular intervals, but always preserve the link back to the INSDC source. You can easily tell that a sequence originated from INSDC if the ‘Submitting group’ is Automated Ingest from INSDC/NCBI Virus.

Users can also submit data to us directly, which we eventually pass on to the INSDC network. All Restricted-Use data is directly submitted. You can easily tell whether Open data was submitted to us directly by checking whether the ‘Submitting group’ is anyone other than Automated Ingest from INSDC/NCBI Virus.
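If you work with downloaded metadata, the ‘Submitting group’ rule above can be applied programmatically. The Python sketch below is illustrative only: the `submitting_group` key and the returned labels are hypothetical, while the group name itself is the one quoted above.

```python
INSDC_INGEST_GROUP = "Automated Ingest from INSDC/NCBI Virus"


def origin(record: dict) -> str:
    """Classify a record by its submitting group.

    The "submitting_group" key is a hypothetical field name for this sketch.
    """
    group = (record.get("submitting_group") or "").strip()
    if not group:
        return "unknown"
    if group == INSDC_INGEST_GROUP:
        return "insdc_ingest"
    return "direct_submission"


records = [
    {"submitting_group": "Automated Ingest from INSDC/NCBI Virus"},
    {"submitting_group": "Example Lab Group"},
    {},
]
origins = [origin(r) for r in records]
```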

Questions about our future and plans

I’m a funding agency or organization interested in helping Pathoplexus. Who should I contact?

We are incredibly grateful for support in turning Pathoplexus into a long-term project that can help improve pathogen sequence sharing! We’re currently looking for support in all forms and sizes, and we’d love to chat with you. Please send an email to funding@pathoplexus.org, and we’ll be in touch!

What are Pathoplexus’ current priorities?

Currently we’re in our ‘Initial Phase’. We’re focusing on introducing ourselves to the community, getting feedback on Pathoplexus, and starting conversations about how Pathoplexus could help improve pathogen sequence sharing.

We’re also focused on attracting funding. Pathoplexus has been developed, and is currently running, thanks to volunteer efforts and donations of programmer time and infrastructure. However, to turn Pathoplexus into something that can run for years to come, we need support in infrastructure (to run the computational side of the database), administration (to help manage and run Pathoplexus), and developers (to add features and make Pathoplexus a fully-fledged database).

Pathoplexus hopes to be a long-term player in the pathogen sequence sharing space for as long as it’s needed. However, it’s also been purposefully designed to be able to stop existing without any data being lost. If there comes a day when Pathoplexus isn’t needed (because fear around data sharing has been alleviated via other methods, or because better solutions become available), it is bound to remain in existence long enough to ensure all remaining data is uploaded to INSDC, at which point it can ‘fold gracefully,’ with no data being lost.

What are Pathoplexus’ future plans?

Once we secure funding, some of our top priorities are:

Improve our SeqSet visualization to give even more information about the sequences it includes
Add tracking of DOIs via CrossRef and link this to sequences to show their contribution to all publications
Start working with journals to add ethical sequence use to the consideration of submitted manuscripts
Perfect a system to quickly add new pathogens in crisis (or potential crisis) scenarios
Add more viral pathogens, as asked for by those pathogen communities
Expand our features, additional data we can provide to sequences, and our API
Enable federalization, allowing a ‘Pathoplexus Network’ that exchanges data on the same principles, but allows shared data ‘ownership’ and continued existence if any one ‘node’ goes down

Questions about our code

What is the code underlying Pathoplexus? What is Loculus?

Pathoplexus is an instance of Loculus, a broader software package for sharing pathogen sequence data. This means that Pathoplexus runs on Loculus code, with specific features, personalization, and most importantly, surrounding governance, that make it ‘Pathoplexus.’ All of the Pathoplexus code is open-source and you can view it here.

Loculus was designed at the same time as Pathoplexus, but is intended to be a flexible, customizable generic pathogen sequence-sharing database. For example, a lab might use Loculus to store the samples they sequence locally and be able to easily search and access them, or a university may have a Loculus instance to gather all the sequences they generate together in one place. Alternatively, someone could create another Loculus instance to serve bacterial pathogens, much like Pathoplexus!

All of the Loculus code is also open-source, and you can view it here.

Questions about our website

Where do the images of pathogens on the front page come from?

We’re incredibly grateful to NIAID for providing fantastic images of pathogens. You can check out their incredible Flickr account to see more great images.

The images we use from NIAID are:

The images we use from the CDC are:

Marburg

We are grateful to use this West Nile Virus image from Cynthia Goldsmith at USCDCP.

We are grateful to use this HMPV image from Paul Chan.

Lastly, we are grateful to use this Mpox Virus image from the WHO.