Quantitative Asset Management


Datasets

The GameStop frenzy on Reddit at the beginning of 2021 impressively showed the world the power of social media over financial markets. Retail traders pushed this influence to a level never seen before. The retail community as a whole has been actively discussing and sharing opinions on the next best trade on social media, and not only since the GameStop case. Make use of this unique information: the emotional value in asset allocation.

Our content-rich datasets are gathered from myriad sources such as Reddit, TikTok, Discord, Twitter, and many more. They are analyzed and interpreted by our artificial intelligence and deep learning systems to provide you with highly accurate, ticker-mapped, clean, real-time, high-volume, and high-velocity data.

Our message collectors trace social media squawk in English, German, Mandarin, Swedish, and many more languages. We collect our data from several thousand different sources worldwide, with a specific focus on social media. This user-generated content comprises a substantial part of communication in social media networks today. We identify this emotionally expressed content and classify it as “Emotional Data.”

We cover more than 60,000 equities from North America, Europe, Asia, and Australia. Our historical data spans more than 12 years, one of the longest histories available in the field of sentiment analysis and Natural Language Processing (NLP). The data is well structured and easy to integrate via API and standard formats such as JSON or CSV.

Our offering ranges from raw (crawled) messages, through aggregated data on different time scales ready for internal calculations, to fully processed AI alpha signals.

60k+ equities

13+ years of historical data

200M+ diversified sources

6B+ historical messages

Datasets & White Papers

TikTok Data Monitoring

TikTok data matters. With 60% of its users belonging to Generation Z (Dean, 2023), TikTok ranks as the sixth most popular social media platform and leads the pack as the most engaging one (Dean, 2023). This report examines TikTok data collected from comments and videos over the last five years to highlight its potential for financial markets, e.g., from an investment or regulatory perspective. The findings reveal trends and patterns and highlight companies that excel on this short-form video platform. Our results show that TikTok comments cover not only lifestyle consumer products but extend to all relevant economic sectors. For trading, we found that a simple backtesting model informed by TikTok data produced a positive Jensen's alpha in 58% of cases.
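
Jensen's alpha is the intercept of the CAPM regression of a portfolio's excess returns on the market's excess returns, i.e. α = E[R_p − R_f] − β·E[R_m − R_f]. The following is a minimal sketch of how that metric can be estimated by ordinary least squares from per-period returns. It illustrates the metric only, not the report's backtesting model, which is not described here; the helper name jensens_alpha and the input layout (aligned lists of periodic returns as decimals) are assumptions.

```python
from statistics import fmean
from typing import Sequence, Tuple


def jensens_alpha(portfolio: Sequence[float], market: Sequence[float],
                  risk_free: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: estimate Jensen's alpha and CAPM beta by OLS.

    Inputs are aligned per-period returns (e.g. daily, as decimals).
    Model: R_p - R_f = alpha + beta * (R_m - R_f) + error.
    """
    y = [p - rf for p, rf in zip(portfolio, risk_free)]  # portfolio excess returns
    x = [m - rf for m, rf in zip(market, risk_free)]     # market excess returns
    mx, my = fmean(x), fmean(y)
    # OLS slope = covariance / variance (the 1/(n-1) factors cancel).
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx  # OLS intercept = per-period Jensen's alpha
    return alpha, beta


# Usage: a synthetic portfolio built to earn 0.2% per period above its CAPM prediction.
market = [0.010, -0.020, 0.015, 0.030, -0.005]
rf = [0.001] * len(market)
portfolio = [r + 0.002 + 1.5 * (m - r) for m, r in zip(market, rf)]
alpha, beta = jensens_alpha(portfolio, market, rf)
print(f"alpha={alpha:.4f}, beta={beta:.2f}")  # alpha=0.0020, beta=1.50
```

A positive alpha means the strategy earned more over the sample than its market exposure (beta) alone would predict; in practice its statistical significance would also be checked.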

Discord Data Monitoring

This report examines data collected from over 30k public Discord servers about meme stocks and cryptocurrencies, covering the period from January 1, 2018 to November 21, 2022. Does this data have the potential to trigger movements in financial markets? To answer this question, the report focuses on Granger causality analysis and estimates the associated effect sizes (partial eta squared, ηp²). Among the main findings: discussions on Discord have stronger effects on stocks than on crypto assets, and at least a small positive correlation between positive sentiment (pos) and the close rate was necessary, but not sufficient, for a significant Granger-causal effect.
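
Granger causality asks whether past values of one series (for example, a sentiment measure) improve the prediction of another (for example, the close rate) beyond that series' own past. The following is a minimal sketch, assuming a bivariate model with p lags fitted by ordinary least squares. It compares a restricted regression of y_t on its own lags with an unrestricted one that adds lags of x_t, and it reports the F statistic together with partial eta squared, ηp² = (RSS_r − RSS_u) / RSS_r, the share of the restricted model's residual variance explained by the added lags. The function names, the lag order, and this effect-size formula are illustrative assumptions, not the report's documented specification. The sketch also omits the p-value, which requires the F distribution; statsmodels' grangercausalitytests reports it directly.

```python
from typing import List, Sequence, Tuple


def _ols_rss(X: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an OLS fit of y on the columns of X.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting; adequate for a few regressors.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f_test(x: Sequence[float], y: Sequence[float],
                   p: int) -> Tuple[float, float]:
    """Hypothetical helper: do p lags of x help predict y beyond y's own p lags?

    Returns (F statistic, partial eta squared). No p-value is computed.
    """
    rows_r, rows_u, target = [], [], []
    for t in range(p, len(y)):
        own = [y[t - i] for i in range(1, p + 1)]    # y_{t-1} .. y_{t-p}
        other = [x[t - i] for i in range(1, p + 1)]  # x_{t-1} .. x_{t-p}
        rows_r.append([1.0] + own)          # restricted: intercept + own lags
        rows_u.append([1.0] + own + other)  # unrestricted: adds lags of x
        target.append(y[t])
    rss_r = _ols_rss(rows_r, target)
    rss_u = _ols_rss(rows_u, target)
    n = len(target)
    df1, df2 = p, n - 2 * p - 1  # p restrictions; n minus 2p+1 parameters
    f_stat = ((rss_r - rss_u) / df1) / (rss_u / df2)
    eta_p2 = (rss_r - rss_u) / rss_r  # SS_effect / (SS_effect + SS_error)
    return f_stat, eta_p2
```

For example, granger_f_test(sentiment, close, 2) would test whether two lags of a sentiment series help predict the close rate; the F statistic is then compared against the F(p, n − 2p − 1) critical value, where n is the number of usable observations after dropping the first p.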

Key benefits

Comprehensive API functionality with dozens of different views on the data

Standard outputs: CSV and JSON

Coverage of the most important sources, such as Reddit, Discord, and Telegram

Support for multiple programming languages: Python, Java, PHP

Raw data downloads for internal use

Aggregated data on different time scales (10 minutes, hourly, 24 hours)
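
Aggregating raw messages onto these time scales amounts to bucketing timestamps into fixed-width windows per ticker. The following is a minimal sketch, assuming each raw message has been reduced to a timezone-aware UTC timestamp, a ticker, and a numeric sentiment score. This message layout and the helper aggregate are hypothetical and do not reflect the actual API output. Bucket widths of 600, 3600, and 86400 seconds correspond to the 10-minute, hourly, and 24-hour views. Because buckets are aligned to the Unix epoch, 24-hour buckets start at midnight UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw-message layout: (timezone-aware UTC timestamp, ticker, sentiment score).
Message = Tuple[datetime, str, float]
Key = Tuple[str, datetime]

TEN_MINUTES, HOURLY, DAILY = 600, 3600, 86400  # bucket widths in seconds


def aggregate(messages: Iterable[Message],
              bucket_seconds: int) -> Dict[Key, Tuple[int, float]]:
    """Hypothetical helper: per ticker and fixed-width time bucket, return
    (message count, mean sentiment). Buckets are aligned to the Unix epoch."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for ts, ticker, score in messages:
        epoch = int(ts.timestamp())  # aware datetime -> seconds since epoch
        start = datetime.fromtimestamp(epoch - epoch % bucket_seconds,
                                       tz=timezone.utc)
        groups[(ticker, start)].append(score)
    return {key: (len(scores), fmean(scores))
            for key, scores in sorted(groups.items())}


# Usage with a few made-up messages.
utc = timezone.utc
msgs = [
    (datetime(2022, 1, 3, 9, 1, tzinfo=utc), "GME", 0.5),
    (datetime(2022, 1, 3, 9, 8, tzinfo=utc), "GME", -0.1),
    (datetime(2022, 1, 3, 9, 12, tzinfo=utc), "GME", 0.3),
    (datetime(2022, 1, 3, 9, 5, tzinfo=utc), "AMC", 1.0),
]
for (ticker, start), (count, mean) in aggregate(msgs, TEN_MINUTES).items():
    print(ticker, start.isoformat(), count, round(mean, 2))
```

Returning counts alongside mean sentiment keeps message volume visible, since a mean over very few messages is much noisier than one over many.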

Interested in different views on the data to optimize your internal trading model?