freeconnection: Conversational datasets to train a chatbot

Saturday, April 9, 2016

Over the last two months I have read a lot about chatbots, which awakened in me the desire to develop my own. And of course the trendiest approach is deep learning. That's why, as a first step, I decided to collect the available conversation datasets, which are definitely needed for training. Here is the list of English conversation datasets I found (if you know of more, please leave a comment):

Data collected from Twitter (by Chenhao Tan):

Argument trees, "successful persuasion" metadata, and related data from the subreddit ChangeMyView. First release 2016.

Multi-community engagement (users posting, or not posting, in different subreddits since Reddit's inception). Data includes the texts of posts made and associated metadata, such as the subreddit, the "number" of upvotes, and the time stamp. First release 2015.

Cornell natural-experiment tweet pairs: data for investigating whether phrasing affects message propagation, controlling for user and topic. The zip file can be retrieved from the given URL (first release 2014).

Supreme Court dialogs corpus: conversations and metadata (such as vote outcomes) from oral arguments before the US Supreme Court (first release 2012)

Wikipedia editor conversations corpus: zip file can be retrieved from the page I've linked to (first release 2012)

Cornell movie-dialogs corpus: conversations and metadata (IMDB rating, genre, character gender, etc.) from movie scripts (first release 2011). This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters.
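To train a chatbot on this corpus you first need (utterance, reply) pairs. Below is a minimal sketch of how that could be done. It assumes the layout I remember from the corpus README, which is not described in this post: `movie_lines.txt` with five fields (line ID, character ID, movie ID, character name, text), `movie_conversations.txt` with four fields where the last one is a Python-style list of line IDs, and ` +++$+++ ` as the field separator. The function names are my own, so treat them as hypothetical and check the README before relying on any of this.

```python
import ast

# Field separator; an assumption taken from the corpus README, not from this post.
SEP = " +++$+++ "


def parse_lines(rows):
    """Map line ID -> utterance text from rows of movie_lines.txt."""
    id_to_text = {}
    for row in rows:
        parts = row.rstrip("\n").split(SEP)
        # Assumed fields: lineID, characterID, movieID, character name, text
        if len(parts) == 5:
            id_to_text[parts[0]] = parts[4]
    return id_to_text


def parse_conversations(rows):
    """Yield the list of line IDs for each row of movie_conversations.txt."""
    for row in rows:
        parts = row.rstrip("\n").split(SEP)
        # Assumed fields: characterID, characterID, movieID, "['L1', 'L2', ...]"
        if len(parts) == 4:
            yield ast.literal_eval(parts[3])


def make_pairs(id_to_text, conversations):
    """Pair every utterance with the reply that directly follows it."""
    pairs = []
    for line_ids in conversations:
        for prev_id, next_id in zip(line_ids, line_ids[1:]):
            if prev_id in id_to_text and next_id in id_to_text:
                pairs.append((id_to_text[prev_id], id_to_text[next_id]))
    return pairs
```

With the real files you would pass open file objects instead of lists of strings. The files may not be valid UTF-8, so opening them with `encoding="iso-8859-1"` is a common workaround.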

Microsoft Research Social Media Conversation Corpus. A collection of 12,696 Tweet Ids representing 4,232 three-step conversational snippets extracted from Twitter logs. Each row in the dataset represents a single context-message-response triple that has been evaluated by crowdsourced annotators as scoring an average of 4 or higher on a 5-point Likert scale measuring quality of the response in the context.
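Since each row is a context-message-response triple of Tweet IDs, the numbers add up: 4,232 triples × 3 IDs = 12,696 IDs. Here is a small sketch of reading such rows. The tab separator and the `Triple` / `read_triples` names are my assumptions for illustration, not the documented file format. Also note that the corpus contains IDs only, so the tweet texts would have to be fetched separately.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """One three-step conversational snippet, stored as Tweet IDs."""
    context_id: str
    message_id: str
    response_id: str


def read_triples(rows: Iterable[str], sep: str = "\t") -> List[Triple]:
    """Parse rows of three IDs each; the separator is an assumption."""
    triples = []
    for row in rows:
        fields = row.strip().split(sep)
        if len(fields) == 3:  # silently skip headers or malformed rows
            triples.append(Triple(*fields))
    return triples


def unique_tweet_ids(triples: Iterable[Triple]) -> Set[str]:
    """Collect every Tweet ID that would need to be looked up."""
    return {tweet_id for triple in triples for tweet_id in triple}
```

Once the texts are available, each triple gives you one training example: (context, message) as input and the response as target.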

There is also a conversation on Reddit about a Reddit corpus.

The Santa Barbara corpus is an interesting one because it's a transcription of spoken dialogues.

The NPS Chat Corpus is distributed with NLTK, the Python Natural Language Toolkit. According to its description, release 1.0 consists of 10,567 posts out of approximately 500,000 posts that its authors gathered from various online chat services in accordance with their terms of service, and future releases will contain more posts from more domains.
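Because the corpus ships with NLTK, it is probably the easiest one to start with. A minimal sketch, assuming NLTK is installed and the corpus data can be downloaded: `nps_chat.xml_posts()` returns the posts as XML elements, and each post carries a dialogue-act label in its `class` attribute. The helper names `load_nps_posts`, `text_and_label` and `label_counts` are mine, not part of NLTK.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def load_nps_posts():
    """Return the NPS Chat posts as XML elements (needs NLTK and its data)."""
    import nltk  # imported lazily so the helpers below also work without NLTK
    from nltk.corpus import nps_chat

    nltk.download("nps_chat", quiet=True)  # one-time download of the corpus
    return nps_chat.xml_posts()


def text_and_label(posts):
    """Turn <Post class="..."> elements into (text, dialogue_act) pairs."""
    return [((post.text or "").strip(), post.get("class")) for post in posts]


def label_counts(pairs):
    """Count how often each dialogue-act label occurs."""
    return Counter(label for _, label in pairs)


# Tiny hand-made post in the same shape, so the helpers can be tried offline.
sample = [ET.fromstring('<Post class="Greet">hi everyone</Post>')]
print(text_and_label(sample))
```

With the real data, `text_and_label(load_nps_posts())` gives you the whole release as a list of pairs, and `label_counts` shows which dialogue acts dominate.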

The NUS Corpus is a collection of SMS messages. There are both English and Chinese versions of the corpus.

Off-topic: while researching conversation datasets I also found a relatively large collection of public datasets here.

EDIT: you can also check the collection of QA datasets.
Also check out this more comprehensive list of dialogue datasets.
