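Columns (viewer statistics: min/max for numeric and string-length columns, distinct-value counts for categorical columns; a trailing "B" abbreviates billions). Each record below lists its values in this column order, one per line; empty values are omitted.

| column | type | min / distinct values | max |
| --- | --- | --- | --- |
| file_id | string (length) | 9 | 98 |
| repo_id | string (class) | 1 value | |
| source_sha | string (class) | 1 value | |
| dataset_id | string (class) | 1 value | |
| source_family | string (class) | 52 values | |
| source_slug | string (class) | 53 values | |
| source_file | string (class) | 53 values | |
| path | string (length) | 9 | 98 |
| role | string (class) | 6 values | |
| shard_index | int64 | -1 | 717 |
| part_index | int64 | -1 | 17 |
| size_bytes | int64 | 467 | 193B |
| compression | string (class) | 4 values | |
| logical_table_size_bytes | int64 | -1 | 710B |
| split_part_count | int64 | -1 | 18 |
| split_chunk_bytes | int64 | -1 | 40B |
| sequence_source_shard_count | int64 | -1 | 717 |
| sequence_source_bytes | int64 | -1 | 75.1B |
| repo_file_count | int64 | 3.23k | 3.23k |
| repo_total_bytes | int64 | 3,156B | 3,156B |
| sequence_shard_count_total | int64 | 3.15k | 3.15k |
| sequence_shard_bytes_total | int64 | 343B | 343B |
| table_repo_file_count_total | int64 | 76 | 76 |
| table_repo_bytes_total | int64 | 2,813B | 2,813B |
| logical_table_count_total | int64 | 28 | 28 |
| logical_table_bytes_total | int64 | 2,381B | 2,381B |
| is_sequence_shard | bool | 2 classes | |
| is_table_file | bool | 2 classes | |
| is_split_part | bool | 2 classes | |
| is_split_manifest | bool | 2 classes | |
| is_original_table_copy | bool | 2 classes | |
| download_pattern | string (class) | 57 values | |
| access_note | string (class) | 1 value | |
| split_bucket | int64 | 1 | 9 |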
.gitattributes
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
.gitattributes
git_attributes
-1
-1
11,094
-1
-1
-1
-1
-1
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
false
false
false
false
false
.gitattributes
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
README.md
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
README.md
readme
-1
-1
3,423
-1
-1
-1
-1
-1
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
false
false
false
false
false
README.md
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
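The repository-level totals repeated in every record can be cross-checked against each other. The sketch below is a worked arithmetic check, not part of the dataset tooling: it assumes that `.gitattributes` and `README.md` are the only files outside the sequence shards and table files, and the constant and function names are made up for illustration. All numbers are copied from the two records above.

```python
# Repository-level totals, copied from the records above.
REPO_FILE_COUNT = 3_226
REPO_TOTAL_BYTES = 3_156_136_546_084
SEQUENCE_SHARD_COUNT_TOTAL = 3_148
SEQUENCE_SHARD_BYTES_TOTAL = 342_815_994_580
TABLE_REPO_FILE_COUNT_TOTAL = 76
TABLE_REPO_BYTES_TOTAL = 2_813_320_536_987

# size_bytes of the two non-payload files (roles git_attributes and readme).
# Assumption: these are the only files outside sequences/ and tables/.
METADATA_FILE_SIZES = {".gitattributes": 11_094, "README.md": 3_423}


def reconcile_totals():
    """Rebuild the repo-wide file count and byte total from their parts."""
    file_count = (
        SEQUENCE_SHARD_COUNT_TOTAL
        + TABLE_REPO_FILE_COUNT_TOTAL
        + len(METADATA_FILE_SIZES)
    )
    total_bytes = (
        SEQUENCE_SHARD_BYTES_TOTAL
        + TABLE_REPO_BYTES_TOTAL
        + sum(METADATA_FILE_SIZES.values())
    )
    return file_count, total_bytes


if __name__ == "__main__":
    file_count, total_bytes = reconcile_totals()
    print(file_count == REPO_FILE_COUNT, total_bytes == REPO_TOTAL_BYTES)
```

Under that assumption both sums match the reported `repo_file_count` and `repo_total_bytes` exactly, which suggests the totals count sequence shards, table files, and the two metadata files and nothing else.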
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000001.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000001.fasta.zst
sequence_shard
1
-1
112,847,416
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
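The access note says to stream payloads with huggingface_hub. A minimal sketch of what that could look like for one sequence shard follows. It assumes the third-party `huggingface_hub` and `zstandard` packages are installed, that shard file names always use a six-digit zero-padded index (as in the `path` values listed here), and that the shards decompress to UTF-8 FASTA text. The helper names `shard_path` and `iter_fasta_lines` are hypothetical, not part of the dataset or of huggingface_hub.

```python
import io

REPO_ID = "LiteFold/Mgnify"
REVISION = "4823aa306c32be5800a6ad6a1a6afd256bece606"  # source_sha from the records


def shard_path(source_slug: str, shard_index: int) -> str:
    """Build the repo-relative path of one sequence shard.

    Assumption: the six-digit zero padding seen in the listed paths
    applies to every shard.
    """
    return f"sequences/{source_slug}/shard-{shard_index:06d}.fasta.zst"


def iter_fasta_lines(path: str, repo_id: str = REPO_ID, revision: str = REVISION):
    """Yield decompressed text lines from a zstd-compressed shard on the Hub."""
    # Imported lazily so shard_path() works without the optional dependencies.
    from huggingface_hub import HfFileSystem
    import zstandard

    fs = HfFileSystem()
    # Dataset repos are addressed as datasets/<repo_id>@<revision>/<path>.
    with fs.open(f"datasets/{repo_id}@{revision}/{path}", "rb") as raw:
        reader = zstandard.ZstdDecompressor().stream_reader(raw)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            yield line.rstrip("\n")
```

For example, `iter_fasta_lines(shard_path("sequence_mgnify_current_release_mgy_clusters.fa.gz", 1))` would read the first shard incrementally instead of downloading all of it first. This needs network access, and each shard is roughly 110 MB compressed.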
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000002.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000002.fasta.zst
sequence_shard
2
-1
112,597,710
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000004.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000004.fasta.zst
sequence_shard
4
-1
112,466,112
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000005.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000005.fasta.zst
sequence_shard
5
-1
112,346,737
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000006.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000006.fasta.zst
sequence_shard
6
-1
112,127,898
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000007.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000007.fasta.zst
sequence_shard
7
-1
112,078,169
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000008.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000008.fasta.zst
sequence_shard
8
-1
111,995,344
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000009.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000009.fasta.zst
sequence_shard
9
-1
112,020,908
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000010.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000010.fasta.zst
sequence_shard
10
-1
111,740,881
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000012.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000012.fasta.zst
sequence_shard
12
-1
111,608,505
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000013.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000013.fasta.zst
sequence_shard
13
-1
111,295,338
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000015.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000015.fasta.zst
sequence_shard
15
-1
111,354,421
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000016.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000016.fasta.zst
sequence_shard
16
-1
111,247,080
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000017.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000017.fasta.zst
sequence_shard
17
-1
111,154,975
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000018.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000018.fasta.zst
sequence_shard
18
-1
110,986,891
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000019.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000019.fasta.zst
sequence_shard
19
-1
110,917,178
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000020.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000020.fasta.zst
sequence_shard
20
-1
110,880,119
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000021.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000021.fasta.zst
sequence_shard
21
-1
110,865,110
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000022.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000022.fasta.zst
sequence_shard
22
-1
110,820,296
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000023.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000023.fasta.zst
sequence_shard
23
-1
110,843,188
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000024.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000024.fasta.zst
sequence_shard
24
-1
110,788,258
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000025.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000025.fasta.zst
sequence_shard
25
-1
110,574,052
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000026.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000026.fasta.zst
sequence_shard
26
-1
110,606,645
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000027.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000027.fasta.zst
sequence_shard
27
-1
110,455,480
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000028.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000028.fasta.zst
sequence_shard
28
-1
110,490,939
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000029.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000029.fasta.zst
sequence_shard
29
-1
110,258,091
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000030.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000030.fasta.zst
sequence_shard
30
-1
110,338,890
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000032.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000032.fasta.zst
sequence_shard
32
-1
110,010,905
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000033.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000033.fasta.zst
sequence_shard
33
-1
110,062,735
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000034.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000034.fasta.zst
sequence_shard
34
-1
109,943,512
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000035.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000035.fasta.zst
sequence_shard
35
-1
109,991,842
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000036.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000036.fasta.zst
sequence_shard
36
-1
109,998,114
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000038.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000038.fasta.zst
sequence_shard
38
-1
110,027,656
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000039.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000039.fasta.zst
sequence_shard
39
-1
109,502,676
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000040.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000040.fasta.zst
sequence_shard
40
-1
109,599,612
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000041.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000041.fasta.zst
sequence_shard
41
-1
109,727,659
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000043.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000043.fasta.zst
sequence_shard
43
-1
109,783,423
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000044.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000044.fasta.zst
sequence_shard
44
-1
109,542,572
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000045.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000045.fasta.zst
sequence_shard
45
-1
109,549,995
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000046.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000046.fasta.zst
sequence_shard
46
-1
109,471,041
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000047.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000047.fasta.zst
sequence_shard
47
-1
109,448,141
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000048.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000048.fasta.zst
sequence_shard
48
-1
109,333,207
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000049.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000049.fasta.zst
sequence_shard
49
-1
109,264,072
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000050.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000050.fasta.zst
sequence_shard
50
-1
109,245,953
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000051.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000051.fasta.zst
sequence_shard
51
-1
109,000,183
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000052.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000052.fasta.zst
sequence_shard
52
-1
109,019,444
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000053.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000053.fasta.zst
sequence_shard
53
-1
109,103,669
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000054.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000054.fasta.zst
sequence_shard
54
-1
109,042,168
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000056.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000056.fasta.zst
sequence_shard
56
-1
109,189,652
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000057.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000057.fasta.zst
sequence_shard
57
-1
108,942,940
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000058.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000058.fasta.zst
sequence_shard
58
-1
108,797,171
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000059.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000059.fasta.zst
sequence_shard
59
-1
108,865,193
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000060.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000060.fasta.zst
sequence_shard
60
-1
108,953,346
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000061.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000061.fasta.zst
sequence_shard
61
-1
108,682,222
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000062.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000062.fasta.zst
sequence_shard
62
-1
108,731,780
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000063.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000063.fasta.zst
sequence_shard
63
-1
108,778,771
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000064.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000064.fasta.zst
sequence_shard
64
-1
108,641,623
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000065.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000065.fasta.zst
sequence_shard
65
-1
108,734,639
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000066.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000066.fasta.zst
sequence_shard
66
-1
108,561,391
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000067.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000067.fasta.zst
sequence_shard
67
-1
108,544,225
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000068.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000068.fasta.zst
sequence_shard
68
-1
108,476,443
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000069.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000069.fasta.zst
sequence_shard
69
-1
108,499,073
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000070.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000070.fasta.zst
sequence_shard
70
-1
108,531,007
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000071.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000071.fasta.zst
sequence_shard
71
-1
108,531,725
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000072.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000072.fasta.zst
sequence_shard
72
-1
108,335,390
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000073.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000073.fasta.zst
sequence_shard
73
-1
108,382,418
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000074.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000074.fasta.zst
sequence_shard
74
-1
108,344,231
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000075.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000075.fasta.zst
sequence_shard
75
-1
108,305,652
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000076.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000076.fasta.zst
sequence_shard
76
-1
108,120,730
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000077.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000077.fasta.zst
sequence_shard
77
-1
108,139,588
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000078.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000078.fasta.zst
sequence_shard
78
-1
108,184,095
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000079.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000079.fasta.zst
sequence_shard
79
-1
108,183,109
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000080.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000080.fasta.zst
sequence_shard
80
-1
108,097,404
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000081.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000081.fasta.zst
sequence_shard
81
-1
108,037,216
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000082.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000082.fasta.zst
sequence_shard
82
-1
107,965,504
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000083.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000083.fasta.zst
sequence_shard
83
-1
107,991,682
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000084.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000084.fasta.zst
sequence_shard
84
-1
108,052,505
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000085.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000085.fasta.zst
sequence_shard
85
-1
107,819,306
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
9
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000086.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000086.fasta.zst
sequence_shard
86
-1
107,971,579
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000088.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000088.fasta.zst
sequence_shard
88
-1
107,759,980
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
6
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000089.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000089.fasta.zst
sequence_shard
89
-1
107,848,919
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000090.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000090.fasta.zst
sequence_shard
90
-1
107,803,148
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000091.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000091.fasta.zst
sequence_shard
91
-1
107,761,043
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000092.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000092.fasta.zst
sequence_shard
92
-1
107,772,540
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
5
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000093.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000093.fasta.zst
sequence_shard
93
-1
107,746,292
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000094.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000094.fasta.zst
sequence_shard
94
-1
107,591,932
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000095.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000095.fasta.zst
sequence_shard
95
-1
107,666,017
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000096.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000096.fasta.zst
sequence_shard
96
-1
107,747,681
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
4
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000097.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000097.fasta.zst
sequence_shard
97
-1
107,530,531
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
2
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000098.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000098.fasta.zst
sequence_shard
98
-1
107,575,005
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000099.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000099.fasta.zst
sequence_shard
99
-1
107,589,327
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000100.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000100.fasta.zst
sequence_shard
100
-1
107,453,507
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000101.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000101.fasta.zst
sequence_shard
101
-1
107,689,722
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000102.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000102.fasta.zst
sequence_shard
102
-1
107,411,332
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000103.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000103.fasta.zst
sequence_shard
103
-1
107,678,748
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
8
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000104.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000104.fasta.zst
sequence_shard
104
-1
107,351,865
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
1
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000105.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000105.fasta.zst
sequence_shard
105
-1
107,640,395
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
3
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000107.fasta.zst
LiteFold/Mgnify
4823aa306c32be5800a6ad6a1a6afd256bece606
mgnify_proteins
mgy_clusters
sequence_mgnify_current_release_mgy_clusters.fa.gz
sequence/mgnify/current_release/mgy_clusters.fa.gz
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-000107.fasta.zst
sequence_shard
107
-1
107,385,924
zstd
-1
-1
-1
717
75,096,691,452
3,226
3,156,136,546,084
3,148
342,815,994,580
76
2,813,320,536,987
28
2,380,980,172,257
true
false
false
false
false
sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst
Default config indexes Mgnify files. Stream raw FASTA/table payloads from sequences/ and tables/ with huggingface_hub.
7

MGnify Protein Catalogues

This repository, LiteFold/Mgnify, contains the MGnify protein catalogue payload plus a compact default index for the Hugging Face Dataset Viewer.

The raw sequence and table payload is TB-scale, so the default config is intentionally a file/shard index rather than a duplicate of every raw row. The raw files remain in sequences/ and tables/; use the index to discover sources, shards, part files, sizes, and download patterns, then stream or download only the payload files you need.

Dataset Summary

Metric                    Value
Default index rows        3,226
Default index columns     34
Repository files indexed  3,226
Repository bytes indexed  3,156,136,546,084
Sequence source files     26
Sequence shards           3,148
Sequence shard bytes      342,815,994,580
Logical table sources     28
Logical table bytes       2,380,980,172,257
Table files in repo       76
Table repo bytes          2,813,320,536,987

The table repo bytes include both top-level table JSONL files and split part files where both are present. The logical table bytes count each upstream table once.
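
The summary figures can be cross-checked against one another. The sketch below is only a sanity check. It assumes that the two non-payload files are the README.md (3,423 bytes) and .gitattributes (11,094 bytes) rows shown in the index preview. The variable names are illustrative and not part of the dataset.

```python
# Figures copied from the Dataset Summary table.
repo_files = 3_226
repo_bytes = 3_156_136_546_084
sequence_shards = 3_148
sequence_shard_bytes = 342_815_994_580
table_files = 76
table_repo_bytes = 2_813_320_536_987
logical_table_bytes = 2_380_980_172_257

# Sizes of the two non-payload rows in the index preview (assumed to be the only ones).
readme_bytes = 3_423
gitattributes_bytes = 11_094

# File counts: shards + table files + the two non-payload files.
assert sequence_shards + table_files + 2 == repo_files

# Byte totals add up the same way.
assert (
    sequence_shard_bytes + table_repo_bytes + readme_bytes + gitattributes_bytes
    == repo_bytes
)

# Bytes counted in the table repo total but not in the logical table total.
extra_table_bytes = table_repo_bytes - logical_table_bytes
print(f"{extra_table_bytes:,}")  # 432,340,364,730
```

The gap of roughly 432 GB is consistent with some tables being stored both as a top-level JSONL file and as split parts, although the summary does not break it down further.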

Default Splits

The default index split is deterministic by file id:

sha256(file_id) % 10

Bucket 0 is test; buckets 1 through 9 are train.
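
A minimal sketch of this rule in Python follows. The README does not say how the SHA-256 digest is turned into an integer, so the code assumes the full hexadecimal digest is read as one integer. The function names `split_bucket` and `split_name` are hypothetical.

```python
import hashlib


def split_bucket(file_id: str) -> int:
    """Map a file id to a bucket in 0..9 from its SHA-256 digest."""
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    # Assumption: the whole digest is read as one integer before the modulo.
    return int(digest, 16) % 10


def split_name(file_id: str) -> str:
    """Bucket 0 is test; buckets 1 through 9 are train."""
    return "test" if split_bucket(file_id) == 0 else "train"


for file_id in ("README.md", ".gitattributes"):
    print(file_id, split_bucket(file_id), split_name(file_id))
```

If the build script converts the digest differently, these buckets will not match the split_bucket column. The column in the index remains the authoritative value.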

Split  Rows
train  2,902
test   324

These are file-index splits, not biological train/test sequence splits. For model training, create sequence-level or cluster-level splits appropriate to your task after loading the relevant MGnify payload.

Loading With datasets

Load the default file/shard index:

from datasets import load_dataset

index = load_dataset("LiteFold/Mgnify")
print(index)
print(index["train"][0])

Load one split directly:

from datasets import load_dataset

train_index = load_dataset("LiteFold/Mgnify", split="train")

Find sequence shards for one source family:

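from datasets import concatenate_datasets, load_dataset

# The train/test split is by file id, so one family's shards span both splits.
splits = load_dataset("LiteFold/Mgnify")
index = concatenate_datasets([splits["train"], splits["test"]])
mgy_clusters = index.filter(
    lambda row: row["role"] == "sequence_shard"
    and row["source_family"] == "mgy_clusters"
)
print(mgy_clusters[0]["download_pattern"])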
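
Before downloading, it can help to total the bytes for the rows you selected. The sketch below works on plain dict rows so it runs without network access. The three rows are abbreviated from the index preview, and `bytes_by_family` is a hypothetical helper name.

```python
from collections import defaultdict

# Three rows abbreviated from the index preview; real rows carry all 34 columns.
rows = [
    {"source_family": "mgy_clusters", "role": "sequence_shard",
     "shard_index": 90, "size_bytes": 107_803_148},
    {"source_family": "mgy_clusters", "role": "sequence_shard",
     "shard_index": 91, "size_bytes": 107_761_043},
    {"source_family": "mgy_clusters", "role": "sequence_shard",
     "shard_index": 92, "size_bytes": 107_772_540},
]


def bytes_by_family(index_rows, role="sequence_shard"):
    """Sum size_bytes per source_family for rows with the given role."""
    totals = defaultdict(int)
    for row in index_rows:
        if row["role"] == role:
            totals[row["source_family"]] += row["size_bytes"]
    return dict(totals)


print(bytes_by_family(rows))  # {'mgy_clusters': 323336731}
```

Iterating a datasets.Dataset yields dict rows, so the same helper should accept a filtered index directly. For a whole source, the sequence_source_bytes column already carries the total, so this is mainly useful for subsets.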

Find split table parts:

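from datasets import concatenate_datasets, load_dataset

# Split parts are spread across train and test, so search both.
splits = load_dataset("LiteFold/Mgnify")
index = concatenate_datasets([splits["train"], splits["test"]])
parts = index.filter(lambda row: row["role"] == "table_split_part")
print(parts[0]["path"], parts[0]["size_bytes"])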

Streaming Raw FASTA Shards

Download one source family with the Hub client:

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="LiteFold/Mgnify",
    repo_type="dataset",
    allow_patterns=[
        "sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/shard-*.fasta.zst"
    ],
)
print(local_dir)

Stream a shard without downloading the whole source family:

from huggingface_hub import HfFileSystem
import zstandard as zstd

fs = HfFileSystem()
path = (
    "datasets/LiteFold/Mgnify/"
    "sequences/sequence_mgnify_current_release_mgy_clusters.fa.gz/"
    "shard-000001.fasta.zst"
)

dctx = zstd.ZstdDecompressor()
with fs.open(path, "rb") as f, dctx.stream_reader(f) as reader:
    chunk = reader.read(1 << 20)
    print(chunk[:200])
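
Once a shard is being decompressed, its text still has to be split into records. The sketch below is a small FASTA parser over any iterable of text lines. It assumes standard FASTA layout (a header line starting with ">" followed by one or more sequence lines), and the sample records are made up rather than taken from a shard.

```python
import io


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for a decompressed shard.
sample = io.StringIO(">example_1 first\nMKT\nAYI\n>example_2 second\nGSH\n")
for name, seq in iter_fasta(sample):
    print(name, seq)
```

With the streaming example above, the zstd reader could be wrapped as io.TextIOWrapper(reader, encoding="utf-8") and passed to iter_fasta. This assumes a python-zstandard version whose stream reader supports that wrapper.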

Downloading Raw Table Parts

For split tables, use the download_pattern column or a direct include pattern:

hf download LiteFold/Mgnify --repo-type dataset \
  --include 'tables/sequence_mgnify_current_release_mgy_proteins_pfam.tsv.gz.jsonl.parts/part-*.jsonl' \
  --local-dir ./mgnify

For unsplit tables:

hf download LiteFold/Mgnify --repo-type dataset \
  --include 'tables/sequence_mgnify_current_release_mgy_seq_metadata_2.tsv.gz.jsonl' \
  --local-dir ./mgnify

The raw table files are not registered as datasets configs because they are multi-TB nested JSONL payloads. Keeping the default config as a compact Parquet index prevents accidental full-repo scans and keeps the Dataset Viewer responsive.
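
After split parts are downloaded, they can be read back as one logical table. The sketch below treats the parts as a single byte stream, because the README does not state whether parts are cut on line boundaries. It assumes the part file names sort correctly in lexicographic order (zero-padded indices). `iter_jsonl_parts` and the `id` field are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def iter_jsonl_parts(parts_dir, pattern="part-*.jsonl"):
    """Yield parsed JSON objects from split parts read in sorted filename order."""
    pending = b""
    for part in sorted(Path(parts_dir).glob(pattern)):
        with open(part, "rb") as handle:
            for line in handle:
                pending += line
                # Only parse once a full line (ending in a newline) is buffered.
                if pending.endswith(b"\n"):
                    if pending.strip():
                        yield json.loads(pending)
                    pending = b""
    # A final record without a trailing newline.
    if pending.strip():
        yield json.loads(pending)


# Demo with two tiny parts where one record straddles the part boundary.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part-00000.jsonl").write_bytes(b'{"id": 1}\n{"id"')
    Path(tmp, "part-00001.jsonl").write_bytes(b': 2}\n{"id": 3}\n')
    records = list(iter_jsonl_parts(tmp))

print(records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Carrying the unfinished line across files costs little and keeps the reader correct whether or not parts end on a newline. Because it is a generator, only one record is held in memory at a time.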

Default Columns

Column                        Description
file_id                       Stable file identifier, currently the repository path.
repo_id                       Hugging Face dataset repository id.
source_sha                    Source commit used to build the index.
dataset_id                    Dataset identifier, always mgnify_proteins.
source_family                 Parsed source family such as mgy_clusters, mgy_proteins_1, or mgy_seq_metadata_2.
source_slug                   Source slug/path component used by the repository.
source_file                   Original MGnify source path when derivable.
path                          File path in the repository.
role                          File role: sequence_shard, table_jsonl, table_split_part, table_split_manifest, readme, or git_attributes.
shard_index                   FASTA shard index, otherwise -1.
part_index                    Split table part index, otherwise -1.
size_bytes                    File size in bytes.
compression                   File/container format.
logical_table_size_bytes      Logical source table size when applicable, otherwise -1.
split_part_count              Number of table split parts when applicable, otherwise -1.
split_chunk_bytes             Target split chunk size when applicable, otherwise -1.
sequence_source_shard_count   Number of shards in the sequence source, otherwise -1.
sequence_source_bytes         Total bytes for that sequence source, otherwise -1.
repo_file_count               Total repository files indexed.
repo_total_bytes              Total indexed repository bytes.
sequence_shard_count_total    Total sequence shards.
sequence_shard_bytes_total    Total sequence shard bytes.
table_repo_file_count_total   Total table files in the repo, including split manifests and parts.
table_repo_bytes_total        Total table repo bytes, including duplicated top-level and split files where both exist.
logical_table_count_total     Logical upstream table count.
logical_table_bytes_total     Logical upstream table bytes.
is_sequence_shard             Whether the row is a FASTA shard.
is_table_file                 Whether the row is a table file or table manifest.
is_split_part                 Whether the row is a split table part.
is_split_manifest             Whether the row is a split manifest.
is_original_table_copy        Whether a top-level table also has split parts.
download_pattern              Glob or exact path for downloading related payload files.
access_note                   Short usage note.
split_bucket                  Deterministic bucket used for the default train/test split.
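
Several integer columns use -1 to mean "not applicable". The sketch below replaces those sentinels with None so they are not mistaken for real sizes or counts. `normalize_row` and `SENTINEL_COLUMNS` are hypothetical names, and the example row is abbreviated from the shard-000090 row in the index preview.

```python
# Integer columns that use -1 to mean "not applicable", per the column table.
SENTINEL_COLUMNS = (
    "shard_index",
    "part_index",
    "logical_table_size_bytes",
    "split_part_count",
    "split_chunk_bytes",
    "sequence_source_shard_count",
    "sequence_source_bytes",
)


def normalize_row(row):
    """Return a copy of an index row with -1 sentinels replaced by None."""
    clean = dict(row)
    for column in SENTINEL_COLUMNS:
        if clean.get(column) == -1:
            clean[column] = None
    return clean


# Values abbreviated from the shard-000090 row in the index preview.
row = {
    "role": "sequence_shard",
    "shard_index": 90,
    "part_index": -1,
    "size_bytes": 107_803_148,
    "split_part_count": -1,
    "sequence_source_shard_count": 717,
}
print(normalize_row(row))
```

The original row is left unchanged, which keeps the helper safe to use inside a map over the index.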

Files

  • data/*.parquet: default file/shard index for Dataset Viewer.
  • metadata/source_files.parquet: full index copy.
  • sequences/*/shard-*.fasta.zst: raw compressed MGnify FASTA shards.
  • tables/**/*.jsonl: raw normalized table payloads and split parts.
  • _MANIFEST.json: index build summary.
  • dataset_summary.json: same summary in a Dataset Viewer-adjacent file.
  • scripts/prepare_mgnify_dataset.py: script used to build the default index.

Source

Derived from LiteFold/Mgnify, originally sourced from EMBL-EBI MGnify.

License

CC BY 4.0.

Citation

If you use the MGnify data, cite MGnify:

Mitchell AL, et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Research, 48(D1):D570-D578, 2020.
