Large molecule dataset
I have been testing a machine learning approach for molecular energy prediction. My current dataset is QM9, which consists of molecules with up to 9 heavy atoms. I was wondering whether anyone knows of the largest molecule datasets available. I will be testing ZINC, which has molecules with up to 38 atoms. Does anyone know of a larger dataset? Thanks!
quantum-chemistry computational-chemistry databases
asked 9 hours ago by Blade
In retrospect, after I posted an answer (molecules encoded as SMILES) and «user1271772» posted another answer about molecules with geometries: does your question about a larger dataset refer to a dataset containing molecules larger than 38 atoms, or to a dataset with more molecules than ZINC?
– Buttonwood, 5 hours ago
Larger than 38 atoms. I was going to say so, but I appreciated your detailed answer very much.
– Blade, 5 hours ago
2 Answers
The ISOL24 database (http://www.thch.uni-bonn.de/tc.old/downloads/GMTKN/GMTKN55/ISOL24.html) contains molecules with up to 81 atoms!
answered 6 hours ago by user1271772 (edited 1 hour ago)
This sounds like you are exploring work related to that of the Lilienfeld group, which hosts a dedicated site about the data sets used in their earlier and ongoing exploration of chemical space, the programs used to work with the data, and their publications.
To go considerably higher in molecule count than QM9, you could go for either
GDB-11, which covers small organic molecules of up to 11 atoms of C, N, O and F and «contains 26.4 million molecules (110.9 million stereoisomers), including three- and four-membered rings and triple bonds», described in J. Chem. Inf. Model. 2007, 47, 342-353 (doi.org/10.1021/ci600423u), or
GDB-13, which covers «small organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules. With 977 468 314 structures, GDB-13 is the largest publicly available small organic molecule database to date». This one was described in J. Am. Chem. Soc. 2009, 131, 8732-8733 (doi.org/10.1021/ja902302h).
Conveniently, you can download both from the Reymond group, including subsets such as «containing only carbon and nitrogen», «chlorine and sulfur», or «fragrance like», in case you don't want to fetch 2 GB of already compressed data. To quote: «All the molecules are stored in dearomatized, canonized SMILES format.»
The even larger GDB-17 («of up to 17 atoms of C, N, O, S, and halogens», with a universe of 166 billion entries, described in J. Chem. Inf. Model. 2012, 52, 2864-2875, doi.org/10.1021/ci300415d, open access) is accessible to the public on this site only as a random subset of 50 million entries, partly because the gzipped archive is about 400 GB. Among the publications citing this work is, for example, one by the Lilienfeld group, again on machine learning (J. Chem. Phys. 143, 084111 (2015), doi.org/10.1063/1.4928757).
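Since these databases are distributed as SMILES, a quick way to check whether a file contains molecules above a given size is to count heavy atoms per line. The following is only a rough sketch, not a full SMILES parser: the function names `heavy_atom_count` and `larger_than` are made up for illustration, and it assumes one molecule per line with the SMILES string as the first whitespace-separated field. A proper cheminformatics toolkit would be more robust.

```python
import re

# Atom tokens in a SMILES string: a bracket atom like [NH4+], a two-letter
# organic-subset atom (Br, Cl), a one-letter organic-subset atom, an aromatic
# atom, or the wildcard "*". Bonds, branches and ring digits are ignored.
_ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")

# Inside a bracket atom: optional isotope number, then the element symbol.
_BRACKET = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")


def heavy_atom_count(smiles):
    """Count non-hydrogen atoms in a SMILES string (heuristic)."""
    count = 0
    for token in _ATOM.findall(smiles):
        if token.startswith("["):
            match = _BRACKET.match(token)
            # Skip malformed brackets and explicit hydrogens such as [H] or [2H].
            if match is None or match.group(1) == "H":
                continue
        count += 1
    return count


def larger_than(lines, n_atoms):
    """Yield SMILES strings with more than n_atoms heavy atoms.

    Assumption: each non-empty line starts with a SMILES string; any
    further whitespace-separated columns are ignored.
    """
    for line in lines:
        fields = line.split()
        if fields and heavy_atom_count(fields[0]) > n_atoms:
            yield fields[0]
```

Because `larger_than` is a generator over lines, it can be fed an open file handle (for example one returned by `gzip.open(path, "rt")` from the standard library) so that a large archive is streamed rather than loaded into memory. Calling `larger_than(handle, 38)` would then keep only molecules beyond the 38-atom limit mentioned in the question, assuming that limit refers to heavy atoms.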
answered 5 hours ago by Buttonwood
I don't think any of these have more than 81 atoms!
– user1271772, 4 hours ago
OK, I see the confusion. @Buttonwood, perhaps you can answer this question: chemistry.stackexchange.com/questions/119804/…
– user1271772, 4 hours ago