The League of Innovative Schools is a network of around 100 forward-thinking leaders from school districts across the U.S. The League exists to help districts spread innovative educational approaches and to provide learning opportunities for education professionals inside and outside the network.
A major platform for League sharing and learning is Twitter. The superintendents and educators who engage in League conversations on Twitter tag their posts with #DPLIS (short for Digital Promise League of Innovative Schools). In addition to everyday activity on Twitter, the League hosts a monthly Twitter Chat where education stakeholders can pose and answer each other’s questions.
It is important to understand the reach of this Twitter activity. Do the Twitter activity and information sharing happen only between Digital Promise and district leaders, or do they engage other education stakeholders? If they do engage other stakeholders, who are those stakeholders?
To understand more about the stakeholders engaged with the League of Innovative Schools on Twitter, I went through the following steps:
1. Used Python (and the Tweepy library) to download data using the Twitter Search API
2. Cleaned the data and used R to generate summary statistics
3. Used R to do basic text analysis on the Twitter biography information
Gathering #DPLIS Tweets using the Twitter Search API
To pull data from Twitter, I generated an access token through the Twitter application interface. With this token, I used the Tweepy library in Python to pull data on the #DPLIS hashtag. The resulting dataset included the user name, bio, number of followers, location, date and time stamp, and text for each Tweet or Retweet that used #DPLIS. Because of limits set by Twitter, the data covered only the Tweets from the last 10 days (November 29 to December 9, 2017).
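The collection step can be sketched roughly as follows. This is a minimal illustration, not the script used for this analysis: the function names and the output field names are hypothetical, the credentials are placeholders, and the search method is named `search` in older Tweepy releases and `search_tweets` in newer ones, so the sketch tries both. Access to Twitter's search endpoint has also changed since 2017, so the network call may not work with current accounts.

```python
def tweet_to_row(status):
    """Flatten one Tweepy Status-like object into the fields used in this analysis."""
    return {
        "user_name": status.user.screen_name,
        "bio": status.user.description,
        "followers": status.user.followers_count,
        "location": status.user.location,
        "created_at": status.created_at,
        "text": status.text,
        # Tweepy only sets `retweeted_status` on Retweets.
        "is_retweet": hasattr(status, "retweeted_status"),
    }


def fetch_hashtag_rows(consumer_key, consumer_secret,
                       access_token, access_secret, query="#DPLIS"):
    """Search recent Tweets for `query` and return one dict per Tweet.

    Hypothetical helper: requires the tweepy package and valid credentials.
    """
    import tweepy  # imported here so tweet_to_row works without tweepy installed

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Older Tweepy: api.search; newer Tweepy: api.search_tweets.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=query, count=100)
    return [tweet_to_row(status) for status in cursor.items()]
```

Keeping `tweet_to_row` separate from the network call makes it possible to check the field mapping on a stand-in object before spending any API quota.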
Summary Statistics About the Twitter Dataset
After pulling the data from Twitter, I used R to further clean and analyze it. The dataset included a total of 554 Tweets, 263 of which were Retweets. I also broke the Tweets down into three groups: Tweets from the Digital Promise account (83), Tweets from Digital Promise employees (26), and Tweets by other stakeholders (445). This data is visualized below.
The map below marks the location of each user who engaged with the League of Innovative Schools on Twitter during this time period. The data comes from the location users entered in their Twitter profiles:
Breaking the Tweets down by day, I found that a majority occurred on Thursday, November 30, 2017. The monthly #DPLIS Twitter Chat took place that day and accounted for 84% of the Tweets during the ten-day period.
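As a rough Python sketch of this grouping (the original step was done in R), the snippet below assigns each author to one of the three groups and converts the reported counts into shares. The function names and group labels are hypothetical, and the account handles would have to be supplied by hand.

```python
def classify_author(user_name, org_account, employee_accounts):
    """Return which of the three groups a Tweet's author belongs to."""
    name = user_name.lower()
    if name == org_account.lower():
        return "organization"
    if name in {handle.lower() for handle in employee_accounts}:
        return "employee"
    return "other"


def group_shares(counts):
    """Convert raw Tweet counts per group into percentages of the total."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Counts reported above: 83 + 26 + 445 = 554 Tweets.
reported = {"organization": 83, "employee": 26, "other": 445}
shares = group_shares(reported)
print(shares)  # {'organization': 15.0, 'employee': 4.7, 'other': 80.3}
```

In other words, about four out of five Tweets in the sample came from accounts outside Digital Promise.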
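A per-day breakdown like this one can be sketched in Python as shown below (the original was computed in R). The function names are hypothetical, and the input is assumed to be a list of `datetime` objects, one per Tweet. Twitter reports timestamps in UTC, so an evening chat in a U.S. time zone may need a time-zone conversion before counting by calendar day.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps):
    """Count Tweets per calendar day."""
    return Counter(ts.date() for ts in timestamps)


def peak_day(timestamps):
    """Return the busiest day and its share of all Tweets (0 to 1)."""
    per_day = tweets_per_day(timestamps)
    day, count = per_day.most_common(1)[0]
    return day, count / len(timestamps)


# Toy input, not the real dataset.
sample = [
    datetime(2017, 11, 30, 20, 0),
    datetime(2017, 11, 30, 20, 5),
    datetime(2017, 12, 1, 9, 0),
]
day, share = peak_day(sample)
print(day, round(share, 2))
```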
Text Analysis of Twitter Bios
For this part of the analysis, I looked only at stakeholders who were not from Digital Promise. After removing the Tweets from the Digital Promise account and employee accounts, I aggregated the data to one line per user. This resulted in a list of 109 unique stakeholders who engaged with the League network on Twitter, 98 of whom had account descriptions (bios).
Using these users and their bios, I performed a simple text analysis to find the most frequently occurring words in the users’ descriptions. First, I created a list of the words used in the descriptions. Then I cleaned the list by converting all words to lowercase, removing numbers, punctuation, and extra spaces, and dropping stop words (e.g., as, to, the).
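The filtering and aggregation can be sketched in Python as follows (the original was done in R). The function name, the `user_name` and `bio` field names, and the list of excluded accounts are assumptions made for illustration.

```python
def one_row_per_user(rows, excluded_accounts=()):
    """Drop excluded accounts, then collapse Tweets to one record per user."""
    excluded = {name.lower() for name in excluded_accounts}
    users = {}
    for row in rows:
        name = row["user_name"]
        if name.lower() in excluded:
            continue
        entry = users.setdefault(
            name, {"user_name": name, "bio": row.get("bio") or "", "tweets": 0}
        )
        entry["tweets"] += 1
    return list(users.values())


def with_bios(users):
    """Keep only users whose bio is not empty."""
    return [user for user in users if user["bio"].strip()]
```

Counting `len(one_row_per_user(...))` and `len(with_bios(...))` gives the two figures of the kind reported above: unique stakeholders, and those with a bio.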
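The cleaning steps can be illustrated with a short Python sketch (the original was done in R). The stop-word list here is a small hypothetical sample; a real analysis would use a fuller list from a text-mining library. The function names are also hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list only; not the list used in the original analysis.
STOP_WORDS = {"a", "an", "and", "as", "at", "for", "i", "in", "is",
              "my", "of", "on", "the", "to", "with"}


def clean_words(bio):
    """Lowercase a bio, strip numbers and punctuation, and drop stop words."""
    text = bio.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation (including # and @)
    # split() with no arguments also collapses runs of whitespace.
    return [word for word in text.split() if word not in STOP_WORDS]


def top_words(bios, n=10):
    """Return the n most frequent cleaned words across all bios."""
    counts = Counter()
    for bio in bios:
        counts.update(clean_words(bio))
    return counts.most_common(n)


print(clean_words("Superintendent of the District, 20 years as an #EdTech leader!"))
# ['superintendent', 'district', 'years', 'edtech', 'leader']
```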
The top 10 most frequently occurring words from the user descriptions are summarized in the table below:
Using the full word list, I developed a list of professions/stakeholders and created variables to tag users who had these 'profession' words in their Twitter descriptions (e.g., superintendent, educator, leader, principal, coordinator, director). After creating these variables, I spot-checked the descriptions against their tags and generated a final breakdown of the stakeholders engaged with the League of Innovative Schools on Twitter.
If I were to continue developing my Twitter text analysis, my next step would be to look at the words that come before and after each ‘profession’ word and to tag word groups that may change how a stakeholder is classified (e.g., teacher vs. former teacher). After that, if the dataset were large enough, I would train a model to identify stakeholders, holding out part of the data as a test set to evaluate it.
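A minimal Python sketch of this tagging step is shown below (the original was done in R). The profession words are the examples named above rather than the full list, and the function name is hypothetical.

```python
import re

# Example profession words from the text; the full list was longer.
PROFESSION_WORDS = ("superintendent", "educator", "leader",
                    "principal", "coordinator", "director")


def tag_professions(bio):
    """Return the profession words that appear as whole words in a bio."""
    words = set(re.findall(r"[a-z]+", bio.lower()))
    return [word for word in PROFESSION_WORDS if word in words]


print(tag_professions("Principal & instructional leader; proud dad"))
# ['leader', 'principal']
```

Matching on whole words avoids false hits inside longer words, but it also misses plurals such as "educators", which is one reason spot-checking the tags matters.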
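As a rough illustration of the first next step, the Python sketch below checks the word immediately before each profession word for a qualifier. This is a hypothetical starting point, not something implemented in this analysis: the qualifier list, the short profession list, and the function name are all assumptions.

```python
import re

# Hypothetical word lists for illustration.
PROFESSION_WORDS = {"teacher", "principal", "superintendent"}
QUALIFIERS = {"former", "retired", "future", "aspiring"}


def tag_with_context(bio):
    """Return (profession, qualifier) pairs; qualifier is None when absent."""
    tokens = re.findall(r"[a-z]+", bio.lower())
    tags = []
    for i, token in enumerate(tokens):
        if token in PROFESSION_WORDS:
            previous = tokens[i - 1] if i > 0 else None
            qualifier = previous if previous in QUALIFIERS else None
            tags.append((token, qualifier))
    return tags


print(tag_with_context("Former teacher, now principal"))
# [('teacher', 'former'), ('principal', None)]
```

Looking only one word back is the simplest version of this idea; phrases such as "teacher for 20 years, now retired" would need a wider window.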