preface
documentation for the code behind pomudex in case something happens or someone is curious.
I'd write it on the site, but building a markdown engine seemed like a lot.
how to use
/schedule/{group}
- press `Current Settings` at any point to save your changes
- `unlabeled checkbox` for more settings
- `unlabeled slider 1` is the time filter. options are:
  - `upcoming`: live or next hour
  - `today`: next 24 hours
  - `three days`: next 72 hours
  - `scheduled`: generally the ones marked more than a week out are for freechat or schedules
  - `all`: unscheduled waiting rooms and no shows (late for over an hour)
- `unlabeled slider 2` is the thumbnail slider. on dense view, it's behind `unlabeled checkbox 1`. options are:
  - `default`: 120px wide
  - `mq`: 320px wide
  - `hq`: 480px wide
  - `sd`: 640px wide
  - `maxres`: 1280px wide
  - (I don't know why youtube labels sd as higher quality than hq)
- `filter=` is a url param for a toggleable positive filter based on the vtuber org structure, comma separated
  - you might find it preset on the secret shortcut urls
  - you can use multiple tags, eg `filter=PRISM+Project,Petalight,Aura`
  - the full list of filter keywords is available on `/channels`
- `hide=` is a url param for a non-toggleable negative filter based on the vtuber org structure, comma separated
  - it works the same as `filter=`, so you can hide branches/individuals that aren't to your taste
- after a week of inactivity, the frequency of checks goes from every 10 minutes to every 6 hours
  - this can be an issue if someone goes on hiatus and doesn't give much notice for their return
/multi
- drag a url onto the honse or use the `unlabeled text box`
  - if it contains a valid youtube id or twitch url, it will open an embed
  - if you refresh, it reloads the selected active videos
- the +/- buttons will change the width of embeds
- `unlabeled checkbox 1` opens more settings
  - there's a slider for custom embed width
  - a few diagnostic measures for the width and loaded streams
  - an open all streams button
  - a few different templates for viewing the loaded streams
    - if using mado or dex view, the profile picture search is WIP and will fall back to a picture of a cat
  - a test button that outputs the current loaded streams to console
- `unlabeled checkbox 2` to toggle spinning profile pictures
- `unlabeled checkbox 3` to make your next click remove the stream (shortcut: shift-click)
- `unlabeled checkbox 4` to make your next click open chat (shortcut: alt-click)
- `id=` is a url param for preloading embeds, comma separated
/channels/{group}
- see which channels are supported under this {group}
- `hide=` also works here
  - with how fast affiliations change, accuracy is not guaranteed
/channels/{channel}
- contains unarchived data from May 2024 onwards
- `unlabeled text box` allows for filtering
- `slider` allows filtering by uploads or streams
- there's no pagination, so the page might be large for older channels
tools
hardware
- any device that can run python
  - I just let my PC run overnight, but 3am windows updates caused issues
  - raspberry pi has been low voltage and reliable after turning off wifi powersaver
- crontab has also been way nicer than
software
- config file with youtube ids
  - youtube channel > description > scroll down to share channel > copy channel id
  - optional: I used a yaml format to maintain org>gen>member relationships
- back-end: git, python, requests, youtube, twitch
  - youtube: default limit is 10k queries/day or about 7 queries/min
    - `/search` is super expensive and not recommended
  - you can install the cli to make auth/calls easier but I opted for fewer installs and use `requests`
- front-end: sveltekit
back-end
server v1
- call `/channels`
  - grab channel names and thumbnails from channel ids (up to 50)
  - videoCount seems useful at first, but it only shows completed uploads, ie not waiting rooms
- call `/playlistItems`
  - grab video ids of uploads by channel id
    - `'playlistId': channel.replace('UC', 'UU', 1),`
  - this is the bulk of your quota since you have to call this for every channel
  - it will throw an error if there are no uploads, eg predebut or terminated
- call `/videos`
  - grab title and time data from video ids (up to 50)
  - for filtering, waiting rooms and premieres have `activeLiveChatId` in `liveStreamingDetails`
  - optional: save the data so you can skip old ids next query
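the v1 flow above can be sketched with `requests`. the endpoints and params are the standard YouTube Data API v3 ones; `KEY` is a placeholder, and the helper names are mine, not the site's actual code:

```python
import requests

API = "https://www.googleapis.com/youtube/v3"
KEY = "YOUR_API_KEY"  # placeholder, not a real key

def uploads_playlist(channel_id: str) -> str:
    # a channel's uploads playlist is its channel id with UC swapped for UU
    return channel_id.replace("UC", "UU", 1)

def get_channels(channel_ids):
    # names + thumbnails for up to 50 channel ids per call
    r = requests.get(f"{API}/channels", params={
        "part": "snippet", "id": ",".join(channel_ids[:50]), "key": KEY})
    return r.json().get("items", [])

def get_upload_ids(channel_id):
    # the bulk of the quota: one call per channel
    # errors out when the uploads playlist is empty (predebut/terminated)
    r = requests.get(f"{API}/playlistItems", params={
        "part": "contentDetails", "playlistId": uploads_playlist(channel_id),
        "maxResults": 50, "key": KEY})
    r.raise_for_status()
    return [it["contentDetails"]["videoId"] for it in r.json()["items"]]

def get_videos(video_ids):
    # title + time data for up to 50 video ids per call
    r = requests.get(f"{API}/videos", params={
        "part": "snippet,liveStreamingDetails",
        "id": ",".join(video_ids[:50]), "key": KEY})
    return r.json().get("items", [])

def is_waiting_room_or_premiere(video: dict) -> bool:
    # waiting rooms and premieres carry activeLiveChatId in liveStreamingDetails
    return "activeLiveChatId" in video.get("liveStreamingDetails", {})
```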
server v2
this is a complicated mess, but the main improvement is less API load: by checking fewer youtube channel and video ids, capacity goes up. this is accomplished by saving the previous results and referencing them on each minimal scrape of API data. since the final API call isn't as immediately exportable as in v1, a lot more care has to go into retrieving old data, cleaning up old waiting rooms, adding new data to your files, and generating the list of waiting rooms, all without overwriting the data. by shelling out the extra storage for the vod list, a catalog of all uploads also becomes possible.
- parse a map of org > gen > member > platform into platform arrays
- pull up the previous archive of overall data
  - something that lists the last upload timestamp of each channel
- call `/channels`
  - save any channel name updates
- loop through overall channels and look at last activity
  - if none, ie channel wiped or pre-debut, skip it
  - if inactive for more than a week, also skip it
    - a few times a day, I scan everything, just in case
- for active channels, call `/playlistItems`
  - just for 1 video. this lightens the load for the `/videos` call later
- loop through individual channel archives
  - if there's no archive, download the channel via `/playlistItems` and write it somewhere
  - while the archive is open, add any older waiting rooms to the pile for `/videos`
    - defined as not private, has a `scheduledStartTime`, no `actualEndTime`, and not a past premiere
  - if the 1 video from `/playlistItems` is already in the archive, we're up to date and don't need to look into it further
  - if not, we have to check if they uploaded more than that 1 video. call `/playlistItems` again for more
  - for any new videos, add them to the pile for `/videos`
- call `/videos` on the pile of new uploads and old waiting rooms
  - if nothing came back, the video is private. make a list of these for tagging
  - if there was a new upload after the last noted channel activity, update the last activity timestamp
  - if it's a premiere, tag it
    - premieres are defined as `duration != P0D` and no `actualEndTime`
- make a map of channel > videos from these confirmed waiting rooms
  - maybe not needed, but it's nicely filtered at this point
- for any confirmed waiting rooms, we can add them to the final output file
- loop through the channels again
  - for that list of privated videos, update the individual archive so you can skip it next run
  - add any new waiting rooms to the individual archive and save it
- grab twitch streams with `/streams` and user thumbnails with `/users`
  - write it to the waiting rooms file and overall channels file; the key can be just the `user_login`
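the two definitions above (old waiting room, premiere) can be written as small predicates over a `/videos` item. the field locations (`status`, `contentDetails`, `liveStreamingDetails`) follow the standard API response; interpreting "past premiere" as a premiere whose scheduled time has already passed is my assumption, not confirmed by the author:

```python
def is_premiere(video: dict) -> bool:
    # a premiere has a real duration (not P0D) but no actualEndTime yet
    duration = video.get("contentDetails", {}).get("duration", "P0D")
    ended = "actualEndTime" in video.get("liveStreamingDetails", {})
    return duration != "P0D" and not ended

def is_old_waiting_room(video: dict, now_utc: str) -> bool:
    # not private, scheduled, not ended, and not a "past premiere";
    # iso-8601 UTC strings in the same format compare fine lexicographically
    live = video.get("liveStreamingDetails", {})
    if video.get("status", {}).get("privacyStatus") == "private":
        return False
    if "scheduledStartTime" not in live or "actualEndTime" in live:
        return False
    duration = video.get("contentDetails", {}).get("duration", "P0D")
    if duration != "P0D" and live["scheduledStartTime"] < now_utc:
        return False  # past premiere
    return True
```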
database
- write the list of waiting rooms somewhere
  - you can save everything to a relational db and filter/sort after the fact
  - a non-relational db also works well with video ids
  - I just push json files to github and have the front-end find it
    - pros: free, reliable, file diff, version control
    - cons: no security, no queries, dubious practice, 5 minute cache
  - future: maybe self-hosted mariaDB
    - not needed for current waiting rooms
    - maybe needed for storing vod list + community posts
- snapshot 4/30/24
  - channels: 337 listed -> 274 active
  - waiting rooms: 0 new + 548 old -> 548 active
  - archive: 93,455 videos
    - per channel: 0 min, 2400 max
    - 190 median, 277 average
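the json-files-on-github approach needs nothing more than dumping the output and committing it. the file name and record shape below are illustrative guesses, not the site's actual schema:

```python
import json

# illustrative shape: channel id -> list of waiting-room videos
# (placeholder ids and field names, not the site's real data)
waiting_rooms = {
    "UCxxxxxxxxxxxxxxxxxxxxxx": [
        {"id": "videoid1234",
         "title": "waiting room",
         "scheduledStartTime": "2024-05-01T12:00:00Z"},
    ],
}

# write it out; a cron job can follow with git commit + git push,
# and the front-end fetches the raw file from the repo
with open("waiting_rooms.json", "w", encoding="utf-8") as f:
    json.dump(waiting_rooms, f, indent=2, ensure_ascii=False)
```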
front-end
hosting
| host | domain | notes |
|---|---|---|
| cloudflare | pages.dev | "pomu" was available |
| firebase | web.app | "Site ID is has to be at least 6 characters" |
| github | github.io | limit 1 page per account, "Username pomu is not available" |
| vercel | vercel.app | found this one after, UI is nice |
filtering
for most cases, appending the channel name to the video title and searching that string for the org works. it's less useful when the org dissolves but the group stays relatively connected, eg Tsunderia or PRISM. one solution is to reference a file that includes org>gen>member mappings. in my case, sveltekit has a server component that fetches both the list of waiting rooms and the map of vtuber groups. from the `filter=` param, it generates a list of channels that match the org/gen/talent name, and on the page it displays videos where the channel id matches that list.
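the resolution step can be sketched like this (in python for consistency with the back-end; the real site does it in the sveltekit server component, and the map contents here are made up):

```python
# hypothetical org > gen > member > channel-id map; real data comes from the yaml
ORG_MAP = {
    "PRISM Project": {"gen1": {"MemberA": "UCaaa", "MemberB": "UCbbb"}},
    "Petalight": {"gen1": {"MemberC": "UCccc"}},
}

def matching_channels(tags: list[str], org_map: dict) -> set[str]:
    # a channel matches if its org, gen, or member name appears in filter=
    wanted = set()
    for org, gens in org_map.items():
        for gen, members in gens.items():
            for member, channel_id in members.items():
                if {org, gen, member} & set(tags):
                    wanted.add(channel_id)
    return wanted

def visible(videos: list[dict], tags: list[str]) -> list[dict]:
    # keep only videos whose channel id resolved from the filter tags
    allowed = matching_channels(tags, ORG_MAP)
    return [v for v in videos if v["channelId"] in allowed]
```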
postscript
warning: yappage ahead
this project came about when I saw a bunch of dead waiting rooms on mado. the web.app domain also looked funny, so I dug around and it turns out there's a ton of resources for hosting websites (for free!!). I could just do it myself -- how hard could it be? I've scraped webpages before.
two months later, it's basically done! it took about two weekends to get the script working for just personal linkdumping and to eventually spin up the front-end for initial feedback. then a weekend and a half to optimize it for the current load and structural filtering capability. I wonder if I could have done it faster if I didn't have work and life stuff, but two months of trial and error helped catch the bulk of issues, I think. so many times I pushed a late night fix, only for it to reveal a break somewhere else, or for a totally new input to show up and obliterate my pipeline.
now I think the basic infrastructure is done. there are bells and whistles to add, like nice css and saved profiles and qol buttons and new features, but for now I'm able to deliver a page without dropping half the waiting rooms on the floor. graduations and debuts will also happen, meaning a whole job of maintaining the links, but I think I can manage it as long as I'm into vtubers. if the day ever comes and I'm not able to maintain the page any more, I leave the blueprints here for another anon to find and maybe improve. until then, my fairy's in her minecraft, all's right with the world.
FAQ
| Q | A |
|---|---|
| will you share the code? | the repo will probably stay private. there are exposed keys and general opsec issues with the repo, and on top of that, swapping to a new repo would require releasing the pomu domain. |
| my oshi isn't listed | submit here! general feedback also accepted |