Personal Archives - Mathew Kiang (.com)

My collaboration network: 2010 to 2022 version

MV Kiang — Fri, 20 Jan 2023 20:36:38 +0000

There was a lot going on last year and I missed my annual tradition of updating my collaboration network at the end of the year. However, thanks to a perpetual illness ravaging the house, I’ve found some sleepless hours to fix my old code and pick up the tradition again.

As is always the case when I do this exercise, I can’t help but reflect back on 2022 (and 2021) with gratitude and appreciation for my amazing collaborators. It was my first year on the tenure track and hours that normally would have been spent pushing research forward were instead spent getting my feet under me, taking on new students, hiring new postdocs, building up some community partnerships, shuffling through administrative tasks, etc. There are a lot of things I enjoy about the job but by far the most enjoyable thing is working on things I think are interesting and important with people I like.

Below is a plot of collaborations (circles) over time (x-axis) by collaborator (y-axis). The grey ones are collaborators I’ve never actually met in person while the black ones are collaborators I’ve met in person. I had assumed COVID would have impacted collaborations such that there would be many more collaborators I’ve never met in person but that doesn’t seem to be the case.

Here is a set plot of my top ten collaborators (in terms of number of collaborations) and their different sets. Surprisingly, this list has changed pretty substantially since 2020 with a core group of UCSF researchers (Kirsten, Maria, and Yea-Hung) shooting up the list.

Conditional on having more than one collaboration, who are my “most efficient” collaborators in terms of average number of citations per project?

And lastly, do COVID-19-related papers get more citations than non-COVID papers? Below I plot all papers published since 2020 along with their annual citations. The ribbons and bold lines represent the linear fit. As you can see, COVID papers have both a higher intercept (they get more immediate citations) and a slightly higher slope (they are cited more over time) that is probably not statistically significant.
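One way to set up this kind of comparison is a linear model with an interaction between time and a COVID indicator. The sketch below is not the code behind the plot: the data frame `toy_papers`, its column names, and every number in it are made up (and deliberately noise-free) so that the coefficients are easy to read.

```r
# Made-up, noise-free data: NOT the real citation counts.
toy_papers <- data.frame(
  year  = rep(0:2, times = 2),            # years since publication
  covid = rep(c(FALSE, TRUE), each = 3)   # is this a COVID-19 paper?
)
toy_papers$cites <- ifelse(toy_papers$covid,
                           10 + 4 * toy_papers$year,  # COVID: higher intercept and slope
                           2 + 3 * toy_papers$year)   # non-COVID

# year * covid expands to year + covid + year:covid
fit <- lm(cites ~ year * covid, data = toy_papers)
coef(fit)
# (Intercept)    : intercept for non-COVID papers
# year           : slope for non-COVID papers
# covidTRUE      : extra intercept for COVID papers
# year:covidTRUE : extra slope for COVID papers
```

With real data, `summary(fit)` gives a standard error for the `year:covidTRUE` term, which is where one would check whether the difference in slopes is distinguishable from zero.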

The post My collaboration network: 2010 to 2022 version appeared first on Mathew Kiang (.com).

Plots of my biking in 2022

MV Kiang — Sun, 15 Jan 2023 00:04:55 +0000

One of the best things about living in California is having amazing weather nearly all year long (the current 3-week-stretch-of-non-stop-rain aside). So last year, I decided to capitalize on the weather and made a New Year’s resolution to bike outdoors more. Specifically, I wanted to bike 1,500 miles outdoors in addition to my normal indoor biking of 2,500 miles. (Also, with a side quest of 100,000 feet of cumulative elevation gain.)

Below is a plot of my cumulative distance (and elevation) over the course of the year. I just barely got the distance resolutions with 1,551.7 miles outdoors and 2,506.1 indoors. I missed the elevation resolution by about 16,000 feet (ending at 83,899 feet).

Red represents Peloton rides while blue represents outdoor rides. The vertical shaded areas represent periods where I was away from home or unable to bike. In September, I got COVID, which kept me off the bike for weeks and way behind schedule on my miles. Even when I did manage to get back on the bike, I was unable to do particularly long or intense rides for a few more weeks. Brutal.

Anyways, as a San Diego native, I never really understood the hype around San Francisco; it just seemed crowded with dirty beaches. It’s still crowded with dirty beaches but biking has given me a much greater appreciation for the city. It’s surreal to ride between skyscrapers, then along the Bay overlooking Alcatraz, then through the giant pine trees of the Presidio, over the Golden Gate Bridge, and along the cliffs overlooking the Pacific, all in a single ride. Below is a sample of 36 rides around San Francisco.

A really nice aspect of riding in San Francisco is that you don’t really need a plan. There are enough routes that are connected to things worth checking out that you can just wing it as you ride and change your route depending on how your legs feel that day. This made me curious though. Which areas provide the most options? What areas do I tend to always string together? How many more miles (or feet of climbing) does going down a different branch add? To get at some of these questions, I took different parts of San Francisco and outlined them on a map (left) as well as arbitrarily relocated them on a network representation (right). Note that the color gradient is roughly by latitude, but this doesn’t translate to the network representation.

Below, we can then take a subset of rides that start and end in San Francisco (left) and plot them as a network (right) where the nodes still represent geographic areas, the size and transparency of the node represents the number of visits (in-degree), and the edges represent my biking transitions from one area to another (darker means more of those transitions).
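As a rough sketch of how rides can be turned into this kind of network, the snippet below counts area-to-area transitions and then sums them into a weighted in-degree per area. The table `toy_rides`, its columns, and the visit order are hypothetical stand-ins, not the actual ride data or code.

```r
library(dplyr)

# Hypothetical rides: one row per area visited, in riding order.
toy_rides <- tibble(
  ride = c(1, 1, 1, 2, 2, 2, 2),
  area = c("Embarcadero", "Presidio", "Golden Gate Park",
           "Embarcadero", "Presidio", "Hawk Hill", "Presidio")
)

# Edge list: pair each area with the next area visited on the same ride.
transitions <- toy_rides %>%
  group_by(ride) %>%
  mutate(from = area, to = lead(area)) %>%
  ungroup() %>%
  filter(!is.na(to)) %>%                 # the last stop of a ride has no "next"
  count(from, to, name = "n_transitions")

# Weighted in-degree: how many times each area was arrived at.
in_degree <- transitions %>%
  count(to, wt = n_transitions, name = "visits")
```

Here `n_transitions` would drive edge darkness and `visits` would drive node size. Note that this in-degree does not count the area where a ride starts, since nothing transitions into it.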

Here is a popular route called a Butterlap. It’s beautiful and my go-to route when showing out-of-towners around. It lets them see the main San Francisco spots (the Bay Bridge, the Ferry Building, Fisherman’s Wharf, Crissy Field, Alcatraz, Golden Gate Bridge, the Presidio, Land’s End, the Legion of Honor, Ocean Beach, Golden Gate Park, and downtown SF) and is reasonably flat.

If, in the middle of the ride, you decide you have another 10 miles in you, you can quickly convert this to (what I’ve called) the Butter Lake, which involves a loop around Lake Merced to the south of the city.

My favorite route in San Francisco involves crossing the Golden Gate Bridge and going up Hawk Hill. Hawk Hill is *the* classic SF climb and if you go in the morning on weekdays, it’s not uncommon to see pros or teams training. It’s a great climb with stunning views and a fun, fast descent. In our two representations, the route looks like this.

But there are many days when I get to the bottom of Hawk Hill and my legs decide they just don’t have any climbing in them. So instead, you can add 50 miles and do a Paradise + China Camp loop.

Networks are a useful way of identifying and visualizing these types of decision points. Some more network-based bike metrics to come once I gather more data — any excuse for a few more rides.


It finally happened — I got COVID

MV Kiang — Wed, 07 Dec 2022 23:02:20 +0000

Last September, I got COVID. It was wildly unpleasant with serious brain fog that lasted for several weeks even after the other symptoms went away. That said, this did give me the opportunity to make some more plots based on my own data. Below, I show a few metrics of my vital signs (respiratory rate, heart rate, heart rate variability, and body temperature deviation) relative to my exposure (vertical dotted line) for six weeks before and after. The thicker grey lines in the background are the pre- and post-exposure averages for those six weeks.

As you can see, for a few things, even six weeks after exposure, I did not return to my pre-exposure baseline. My respiratory rate was slightly lower, my average heart rate was (and actually still remains) slightly elevated, and my heart rate variability is still lower (higher is better). My temperature is more or less the same.

All of this resulted in decreased physical activity, which I plot below.

I eventually went back to my baseline level of physical activity for all different metrics, but as you can see in the MET minutes metrics, there was a fairly long period of inactivity where it felt like my heart was not ready for intense exercise.

So, COVID-19: 1/10 — would not recommend.


It’s official — I’m a tenure-track assistant professor

MV Kiang — Mon, 24 Jan 2022 22:05:02 +0000


My collaboration network for 2010 to 2020 (+ other plots)

MV Kiang — Thu, 10 Dec 2020 22:36:31 +0000

In what has become a bit of an annual tradition, here is my collaboration network for 2010 to 2020. This year was rough. Of the two first-author papers published this year, one was pre-pandemic. I think it’s fair to say this wasn’t the level of productivity I was expecting of myself. Hopefully, a few projects still in the pipeline will come out early next year.

All that said, I’m thankful for a strong network of kind collaborators who picked up my slack when necessary, checked in on me even when we didn’t have an active project, and understood when childcare issues caused last minute Zoom cancellations.

You’ll have plenty of time to work with famous, smart, and/or fun people — 2020 was a good reminder of the importance of working with kind people.

The first time I made this plot, I noted how many components I had and how disjointed the collaboration networks were. Since then, there’s now (1) a dominant connected component, (2) my NYU component (top middle) that will likely always be disconnected, (3) my Health Policy and Management cluster (top left), which has a reasonable chance of connecting with the rest of the group now that Sara is at Stanford, and (4) a single paper with Alex (middle right), which will almost certainly join the rest of the group at some point. It’s also interesting to note the trajectories of papers (in terms of citations) in the lower left. A couple papers seem to get some traction, but for the most part, my papers tend to add citations at around 5-10 cites per year.

Below is a plot of collaborations (circles) over time (x-axis) by collaborator (y-axis). I’ve worked fairly consistently with two people, Nancy and Jarvis, for six years, which is pretty wild. Most of my collaborations are bursty with a rush of papers and then long dormant periods but a handful are pretty regular with ~1 paper per year. Most of my collaborators are one-time collaborators.

Another thing we can look at is which of my collaborators also collaborate together (conditional on me being on the paper). Below, I show the top ten (in terms of number of collaborations) collaborators with a horizontal bar chart for the number of times we have worked together. The lower right plot shows dots and lines of intersecting collaborators along with how often this subset of collaborators appears in my collaborations (vertical bars).

For example, there are six papers with Jason, Pam, Nancy, and Jarvis and an additional two with the same group minus Jason. Sara is the outlier here with 5 collaborations, none of which involve another top ten collaborator.

Lastly, for kicks I wanted to see who my “most efficient” collaborator is. That is, conditional on more than one project together, who has the highest average number of citations per project?

The answer (two yellow dots in upper left) is Nishant and Rafa at about 150 citations per project. (The upper right is Jarvis followed by Nancy.)
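The calculation itself is a short group-and-summarize. Below is a minimal sketch on a hypothetical table with one row per paper and collaborator pair; `toy_collabs`, its column names, and its numbers are invented for illustration and are not the Google Scholar data.

```r
library(dplyr)

# Hypothetical data: one row per (paper, collaborator) pair.
toy_collabs <- tibble(
  paper        = c("p1", "p1", "p2", "p2", "p3", "p4"),
  collaborator = c("A",  "B",  "A",  "C",  "A",  "C"),
  citations    = c(100,  100,  20,   20,   30,   10)
)

efficiency <- toy_collabs %>%
  group_by(collaborator) %>%
  summarize(n_projects = n_distinct(paper),
            mean_cites = mean(citations)) %>%
  filter(n_projects > 1) %>%       # conditional on more than one project
  arrange(desc(mean_cites))
```

In this toy example, B drops out (one project) and A ranks above C.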

Code is here. Note there are five files and you need to change my_id on line 21 of 01_pull_data.R to your Google Scholar ID.


Applying an intro-level networks concept to deleting tweets

MV Kiang — Fri, 16 Oct 2020 23:34:12 +0000

There are a few services out there that will delete your old tweets for you, but I wanted to delete tweets with a bit more control. For example, there are some tweets I need to keep up for whatever reason (e.g., I need it for verification) or a few jokes I’m proud of and don’t want to delete.

If you just want the R code to delete some tweets based on age and likes, here it is (noting that it is based on Chris Albon’s Python script). In this post, I go over a bit of code about what I thought was an interesting problem: given a list of tweets, how can we identify and group threads?

Below, I plot all my tweets over time (x-axis) by the number of “likes” (y-axis) and I highlight in red tweets that are threaded together. Ignore the boxes for now.

Pulling the data using rtweet, you end up with a dataframe that looks something like this (only with many many more rows and columns):

```
> before_df %>% select(status_id, created_at, screen_name, text, reply_to_status_id)
# A tibble: 510 x 5
   status_id    created_at          screen_name text                           reply_to_status…
 1 13167994911… 2020-10-15 17:53:58 mathewkiang "@JosephPalamar Haha — the on… 131679883199713…
 2 13167817880… 2020-10-15 16:43:37 mathewkiang "@khayeswilson Ah \"self-liki… 131678029839121…
 3 13167812579… 2020-10-15 16:41:31 mathewkiang "@khayeswilson AH! This is pe… 131678029839121…
 4 13167755278… 2020-10-15 16:18:44 mathewkiang "I've been coding up a script… NA              
 5 13165350914… 2020-10-15 00:23:20 mathewkiang "https://t.co/7PtlUKWTeU http… NA              
 6 13161332751… 2020-10-13 21:46:40 mathewkiang "Data: Full-time academic job… NA              
 7 13144052234… 2020-10-09 03:20:00 mathewkiang "@simonw Thanks for the info!… 131439639207741…
 8 13143912914… 2020-10-09 02:24:38 mathewkiang "@simonw Does this include da… 131439055526721…
 9 13142896495… 2020-10-08 19:40:45 mathewkiang "Me: This paper has been out … NA              
10 13136475049… 2020-10-07 01:09:06 mathewkiang "@Doc_Courtney If by “passing… 131364337679803…
# … with 500 more rows
```

To select the tweets you want to delete, it is straightforward to make a rule like: (1) delete all tweets created more than two years ago with fewer than 100 likes (left-most grey box in the plot) or (2) delete all tweets created more than 90 days ago with fewer than 25 likes (bottom grey box in the plot). You could even create a function where the number of likes required grows exponentially with the age of the tweet. And obviously, you can create a list of tweets (status_ids) that you want to keep.
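As a minimal sketch, the two rules plus a keep-list could look like the snippet below. It runs on a made-up data frame; the `favorite_count` column name is an assumption about how likes are stored, and `toy_df`, `now`, and `keep_ids` are hypothetical names introduced for the example.

```r
library(dplyr)

# Made-up stand-in for the tweet data frame.
toy_df <- tibble(
  status_id      = c("1", "2", "3", "4"),
  created_at     = as.POSIXct(c("2017-01-01", "2020-06-01",
                                "2020-09-20", "2018-03-01"), tz = "UTC"),
  favorite_count = c(150, 10, 2, 40)
)

now      <- as.POSIXct("2020-10-16", tz = "UTC")  # fixed "today" for reproducibility
keep_ids <- c("4")                                # tweets to keep no matter what

to_delete <- toy_df %>%
  mutate(age_days = as.numeric(difftime(now, created_at, units = "days"))) %>%
  filter(
    (age_days > 365 * 2 & favorite_count < 100) |  # rule 1: old and under 100 likes
      (age_days > 90 & favorite_count < 25),       # rule 2: 90+ days and under 25 likes
    !status_id %in% keep_ids                       # never delete the keepers
  )
```

Only tweet "2" ends up in `to_delete`: tweet "1" has enough likes, tweet "3" is too recent, and tweet "4" matches rule 1 but is on the keep-list.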

However, this assumes all tweets are independent. Things get a bit more complicated if you want to score a whole set of tweets by a single tweet in the set. If, for example, you string together a Twitter thread, you may want to delete or save the entire thread based only on the first tweet since deleting the “unliked” tweets will break up the thread. Twitter doesn’t provide a column that links threads together through a unified ID.

After chatting with Malcolm Barrett about it for a bit, I realized this is a fairly simple network problem. If you imagine the data frame above, where every row is a tweet, as an edge list between vertex status_id and reply_to_status_id, then you can remove all isolates to get trees of threads (most would just be chains). The key code is here but to sketch out the broad points:

Take before_df and (a) filter out isolates (i.e., non-threaded tweets) by making sure each tweet is referred to by another tweet or refers to another tweet within the data frame and (b) remove replies to other people by dropping tweets that start with “@”. Because it’s an edge list, we will rename the columns to “from” and “to” and if there is no terminating vertex (i.e., it’s the first tweet in a thread), we will create a self-loop.
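```
library(dplyr)

# Save the edge list as thread_df so the later steps can use it
thread_df <- before_df %>%
        # keep tweets that are replied to or that reply to a tweet in the data
        filter(status_id %in% reply_to_status_id | 
                   reply_to_status_id %in% status_id,
               substr(text, 1, 1) != "@") %>% 
        select(from = status_id, to = reply_to_status_id) %>%
        # first tweet of a thread has no parent, so make a self-loop
        mutate(to = ifelse(is.na(to), from, to))
```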

Now just convert this edge list into a graph and extract all the components using igraph:

```
library(igraph)

thread_assignments <- thread_df %>%
        graph_from_data_frame(directed = TRUE) %>%
        components()
```
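To see what this step returns, here is a small self-contained sketch on a made-up edge list (the IDs `t1`, `u1`, and so on are hypothetical). For a directed graph, `components()` uses weak connectivity by default, so edge direction does not matter for grouping.

```r
library(igraph)

# Made-up edge list: t1 starts a thread (self-loop), t2 replies to t1,
# t3 replies to t2; u1 starts a second thread and u2 replies to it.
toy_edges <- data.frame(
  from = c("t1", "t2", "t3", "u1", "u2"),
  to   = c("t1", "t1", "t2", "u1", "u1")
)

toy_components <- components(graph_from_data_frame(toy_edges, directed = TRUE))

toy_components$no          # number of components, i.e. threads
toy_components$membership  # named vector: tweet ID -> component ID
```

The names of the `membership` vector are the tweet IDs, which is what makes it easy to turn into a lookup table.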

Now you have a mapping of every threaded tweet ID to a component ID. Below, I just take this mapping and then create a new component ID that is the same as the starting tweet of the thread.

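```
# Map each threaded tweet to the first tweet of its thread
id_mapping <- thread_df %>%
        select(status_id = from) %>%
        left_join(tibble(
            status_id = names(thread_assignments$membership),
            membership = thread_assignments$membership
        ), by = "status_id") %>%
        group_by(membership) %>%
        # IDs are strings, so min() is alphabetical; for IDs with the same
        # number of digits this is the earliest tweet in the thread
        mutate(new_status_id = min(status_id)) %>%
        ungroup()
```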

That’s it! With this mapping, you can left_join() the original data frame and perform manipulations on the thread as a group of tweets rather than each tweet individually. Anyways, check out the gist to see how I deleted the tweets and implemented this. I just thought it was a nice, clean application of an introductory-level network concept to an applied data cleaning problem.
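As an illustration of that last join, here is a minimal sketch that scores every tweet by the most-liked tweet in its thread. The data frames `toy_tweets` and `toy_mapping` are made up, and `favorite_count` is an assumed column name for likes; the gist may do this differently.

```r
library(dplyr)

# Made-up tweets: a1, a2, a3 form one thread; b1 stands alone.
toy_tweets <- tibble(
  status_id      = c("a1", "a2", "a3", "b1"),
  favorite_count = c(120, 3, 1, 5)
)

# Made-up mapping with the same key columns as id_mapping.
toy_mapping <- tibble(
  status_id     = c("a1", "a2", "a3"),
  new_status_id = c("a1", "a1", "a1")
)

thread_scores <- toy_tweets %>%
  left_join(toy_mapping, by = "status_id") %>%
  # non-threaded tweets have no match, so treat each as its own thread
  mutate(thread_id = coalesce(new_status_id, status_id)) %>%
  group_by(thread_id) %>%
  mutate(thread_likes = max(favorite_count)) %>%
  ungroup()
```

Any deletion rule can then be applied to `thread_likes` instead of `favorite_count`, so a thread is kept or deleted as a unit.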

After deleting old and boring tweets and keeping tweets I liked (taking into account groups), I’m left with the black points above. The grey points were tweets that I ended up deleting.

(Disclaimer: There’s almost certainly a better way to do this — I just don’t know it.)


I wrote a simulation paper about playing Candy Land with toddlers

MV Kiang — Tue, 08 Sep 2020 01:38:40 +0000


Our new paper about opioid prescribing patterns in the US

MV Kiang — Tue, 04 Feb 2020 17:24:56 +0000

Some notes about a new (open access) paper with Keith Humphreys, Mark Cullen, and Sanjay Basu — “Opioid prescribing patterns among medical providers in the United States, 2003-17: retrospective, observational study” — just published in BMJ.

- Reproducible code is available on the official paper GitHub repository. I don’t plan to make any changes, but just in case, the first release is the version of the code used to produce the manuscript.
- We’ve created a handful of Shiny apps for interested readers. What we present in the paper is only a tiny sliver of all the results; additional results by drug, sensitivity analyses, outcome, and geography are available through the apps.
- Despite our best efforts, we were not able to share the aggregated (proprietary) data. However, we’ve created a mechanism for other researchers to access the data through Stanford’s Center for Population Health Science’s Data Core. (Approval from an Optum representative is still required; see the GitHub repo for details.)
- In addition, we’ve created a mechanism for researchers to get access to the raw data for a fee to reproduce our entire analysis from start to finish.
- As with most BMJ papers, the peer review reports are open and publicly accessible.


Collaboration network from 2010 to 2019

MV Kiang — Sat, 07 Dec 2019 07:19:45 +0000

I have been trying to wrap my head around working with temporal networks — not just simple edge activation that changes over time but also evolving node attributes and nodes that may appear and disappear at random. What better way than to work with a small concrete example I’m already very familiar with?

Here is an update to a post I made, a little over a year ago, about my collaboration network. Each paper or project (blue) is connected to a collaborator (red). The size of the blue node is the cumulative citations that paper has received since publication to the current year (upper left).

Code available at this gist (note that it’s two files).


I presented some joint work at PAA last week. Slides, code, and more here

MV Kiang — Fri, 04 May 2018 13:35:00 +0000
