简单用 checkee.info 的数据画了画最近两年美签 check 时长的趋势

美签经常被 check 的朋友想必很熟悉 https://www.checkee.info (一个报 check DP 的网站) :troll:

最近在纠结要不要回国签证,看到不少讨论说十月政府关门导致 check 时间大幅上涨,但 checkee.info 自带的分析不太容易看 week-over-week, month-over-month 的趋势,遂简单 vibe 了几个 script 扒拉下数据画了点图。

本帖使用的一些 definition/assumption:

  • Check date 是指面签时间,Complete date 是指 check 完成的时间,Waiting time 指两者的 diff(都同 checkee.info 的定义),还在 pending 的 case 没有 complete date 故不体现在 waiting time 的折线图里
  • 舍弃 check 时长(waiting time)小于 7d 的数据,个人猜测大都是直接 approve 后正常的处理时间
  • x 轴是 complete date 的图都只考虑 completed case 不考虑 pending case
  • 数据范围是 2024-01-01 到 2025-12-02,数据更新至 2025-12-02
  • 在此范围内一共有 3800 条数据,其中 H1 约 1100 条,F1/J1 加起来约 1800 条
  • 并非所有人都会把自己的 DP 贴到这个网站上,因此以下的分析只能代表 checkee.info 网站上的 DP。本站也有 H1B 面签的 DP 帖 开一个新政后H1B面签的dp楼

7 天趋势 (week-over-week)

所有签证类型


H1


F1 & J1


30 天趋势 (month-over-month)

所有签证类型


H1


F1 & J1


个人观察

  • 观察近期趋势的话单独某个签证类型的数据量太小,可能看所有签证类型的比较有意义
  • 从 complete date 的图来看,最近 check 完成的 case (2025/11 中旬以后) check 时长确实有一个不小的上升,似乎有下降趋势但不知接下来的圣诞假期是否会再度上涨
  • 从 check date 的图来看 2025/09 的 case 并没有见到太过异常的等待时间,可能跟还有不少 pending case 有关
  • 若是年底回去签证建议留足时间 (还是祝福潭友能秒过),不知道一月底可能的二次关门是否会再度影响

Appendix

本帖思路就是把 case details (e.g. Check Reporter) 爬下来放到 csv 文件里,做一下 filter 并 plot 出来。爬的脚本和数据不确定是否有版权或隐私问题就不贴了,让 LLM vibe 一个应该花不了几分钟,留给读者作为练习 :troll:。用于 visualize 的脚本贴在此处供参考,如果潭友还需要画别的图又懒得自己扒数据欢迎评论我尽量画一画

visualize.py
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import numpy as np
import os
import re

# --- Configuration ---
# Visa Groups to analyze
VISA_GROUPS_TO_ANALYZE = [
    {'H1'},          # Only H1
    {'F1', 'J1'},
    'ALL'            # All visa types
]

OUTPUT_DIR = 'plots'
MIN_WAITING_DAYS = 7        # Filter out complete cases with fewer than this many days
START_DATE = '2024-01-01'   # Discard records with dates before this
END_DATE = '2026-01-01'     # Discard records with dates after this
INTERVAL_DAYS = 30           # Grouping interval in days (e.g., 7 for weekly, 30 for monthly)

def load_and_clean_data(filepath):
    """
    Loads data from Excel (.xlsx) or CSV. 
    Prioritizes Excel reading based on user instruction.
    """
    if not os.path.exists(filepath):
        print(f"Error: File '{filepath}' not found.")
        return None

    df = None
    
    try:
        # Check extension to decide loader, but default to Excel if ambiguous or xlsx
        if filepath.endswith('.xlsx'):
            print(f"Attempting to read '{filepath}' as Excel file...")
            df = pd.read_excel(filepath, engine='openpyxl')
        else:
            # Fallback for CSV
            print(f"Attempting to read '{filepath}' as CSV...")
            for encoding in ['utf-8', 'gb18030', 'latin1']:
                try:
                    df = pd.read_csv(filepath, encoding=encoding)
                    break
                except:
                    continue
    except Exception as e:
        print(f"Error loading file: {e}")
        return None

    if df is None:
        print("Failed to load data into DataFrame.")
        return None

    # --- Data Cleaning ---
    df.columns = df.columns.str.strip()
    
    required_cols = ['Check Date', 'Complete Date', 'Visa Type']
    missing_cols = [c for c in required_cols if c not in df.columns]
    if missing_cols:
        print(f"Error: Missing required columns: {missing_cols}")
        return None

    df['Check Date'] = df['Check Date'].astype(str).str.strip()
    df['Complete Date'] = df['Complete Date'].astype(str).str.strip()

    # Determine Status
    df['Case Status'] = 'Complete'
    df.loc[df['Complete Date'].str.contains('0000-00-00', na=False), 'Case Status'] = 'Pending'
    
    # Parse Dates
    df['Check Date Obj'] = pd.to_datetime(df['Check Date'], errors='coerce')
    df['Complete Date Obj'] = pd.to_datetime(df['Complete Date'], errors='coerce')

    # Drop invalid check dates
    df = df.dropna(subset=['Check Date Obj'])

    # --- 1. Date Filtering (Start & End Date) ---
    # Discard if Check Date OR Complete Date is outside the [START_DATE, END_DATE] range
    start_dt = pd.to_datetime(START_DATE)
    end_dt = pd.to_datetime(END_DATE)
    
    # Condition: Check Date must be within range
    check_date_valid = (df['Check Date Obj'] >= start_dt) & (df['Check Date Obj'] <= end_dt)
    
    # Condition: If complete, Complete Date must be within range
    # (Pending cases have NaT/invalid Complete Date, so we ignore them in this specific check)
    complete_date_valid = (df['Complete Date Obj'] >= start_dt) & (df['Complete Date Obj'] <= end_dt)
    
    # Logic: Keep if (Valid Check Date) AND (If Complete -> Valid Complete Date)
    # Keep = (Check in Range) AND (Status==Pending OR Complete in Range)
    final_date_mask = check_date_valid & ((df['Case Status'] == 'Pending') | complete_date_valid)
    
    before_len = len(df)
    df = df[final_date_mask].copy()
    dropped_dates = before_len - len(df)
    if dropped_dates > 0:
        print(f"Dropped {dropped_dates} records outside date range {START_DATE} to {END_DATE}.")

    # Calculate Waiting Days (Complete cases only)
    df['Waiting Days'] = (df['Complete Date Obj'] - df['Check Date Obj']).dt.days

    # Filter out negative waiting days
    df.loc[(df['Case Status'] == 'Complete') & (df['Waiting Days'] < 0), 'Waiting Days'] = np.nan

    # --- 2. Filter Short Waits ---
    short_wait_mask = (df['Case Status'] == 'Complete') & (df['Waiting Days'] < MIN_WAITING_DAYS)
    dropped_short_waits = short_wait_mask.sum()
    
    if dropped_short_waits > 0:
        print(f"Dropped {dropped_short_waits} complete cases with waiting days < {MIN_WAITING_DAYS}.")
        df = df[~short_wait_mask]

    return df

def sanitize_filename(name):
    s = str(name).strip().replace(' ', '_').replace(',', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)

def plot_combined_graph(df, date_col_name, group_name, visa_label):
    """
    Generates a combined dual-axis graph:
    - Bar (Left Axis): Count of Pending vs Complete cases
    - Line (Right Axis): Average Waiting Days
    Groups by INTERVAL_DAYS.
    """
    # Create a local copy to manipulate index
    temp_df = df.copy()
    
    # Drop NaT in the grouping column so resampling works
    temp_df = temp_df.dropna(subset=[date_col_name])
    
    if temp_df.empty:
        return

    # Set index to date column for resampling
    temp_df.set_index(date_col_name, inplace=True)
    
    # Define Resampler
    grouper = temp_df.groupby([pd.Grouper(freq=f'{INTERVAL_DAYS}D'), 'Case Status'])
    
    # Aggregation
    # We want count of rows for bar, and mean of waiting days for line
    agg_df = grouper.agg(
        Count=('Visa Type', 'size'),
        AvgWait=('Waiting Days', 'mean')
    ).unstack(fill_value=0) # Unstack 'Case Status' to columns

    # agg_df structure: MultiIndex columns: (Count, Complete), (Count, Pending), (AvgWait, Complete), ...
    
    # Flatten columns for easier access
    # e.g. Count_Complete, Count_Pending, AvgWait_Complete
    agg_df.columns = ['_'.join(col).strip() for col in agg_df.columns.values]
    
    # Ensure columns exist even if 0
    for col in ['Count_Complete', 'Count_Pending', 'AvgWait_Complete']:
        if col not in agg_df.columns:
            agg_df[col] = 0

    # If grouping by Complete Date, 'Pending' doesn't technically exist on that timeline 
    # (since they have no complete date). So Count_Pending will likely be 0 or irrelevant.
    if 'Complete' in group_name:
        bars_to_plot = ['Count_Complete']
        bar_colors = ['green']
        bar_labels = ['Complete']
    else:
        bars_to_plot = ['Count_Complete', 'Count_Pending']
        bar_colors = ['green', 'orange']
        bar_labels = ['Complete', 'Pending']

    # --- Plotting ---
    fig, ax1 = plt.subplots(figsize=(14, 7))
    
    # Plot Bars (Case Counts) on Left Axis
    # Align='center', width needs to be slightly less than interval to show gaps
    width_days = max(1, INTERVAL_DAYS * 0.8)
    
    # Pandas plot handles dates on x-axis well
    ax1.bar(agg_df.index, agg_df[bars_to_plot[0]], width=width_days, label=bar_labels[0], color=bar_colors[0], alpha=0.6)
    
    if len(bars_to_plot) > 1:
        # Stack the second bar
        ax1.bar(agg_df.index, agg_df[bars_to_plot[1]], width=width_days, bottom=agg_df[bars_to_plot[0]], 
                label=bar_labels[1], color=bar_colors[1], alpha=0.6)

    # Bar Annotations (Counts)
    # Calculate totals for top-of-bar annotation
    totals = agg_df[bars_to_plot].sum(axis=1)
    for date, total in totals.items():
        if total > 0:
            ax1.annotate(f'{int(total)}', 
                         (date, total), 
                         xytext=(0, 5), textcoords='offset points', 
                         ha='center', fontsize=8, color='black')

    ax1.set_xlabel(f'{group_name} (Interval: {INTERVAL_DAYS} Days)')
    ax1.set_ylabel('Number of Cases (Bars)', color='black')
    ax1.tick_params(axis='y', labelcolor='black')
    ax1.legend(loc='upper left')

    # Plot Line (Wait Time) on Right Axis
    ax2 = ax1.twinx()
    
    # Filter out 0 wait times (which happen if no completed cases in that bin) for cleaner line
    line_data = agg_df['AvgWait_Complete'].replace(0, np.nan)
    
    ax2.plot(agg_df.index, line_data, color='blue', marker='o', linewidth=2, label='Avg Waiting Days')
    
    # Line Annotations
    for date, val in line_data.items():
        if pd.notna(val):
            ax2.annotate(f'{val:.1f}', 
                         (date, val), 
                         xytext=(0, -15), textcoords='offset points', 
                         ha='center', fontsize=9, color='blue', weight='bold')

    ax2.set_ylabel('Avg Waiting Days (Line)', color='blue')
    ax2.tick_params(axis='y', labelcolor='blue')
    ax2.legend(loc='upper right')
    
    plt.title(f'Cases & Waiting Time by {group_name}\n({visa_label}, >{START_DATE}, Interval={INTERVAL_DAYS}d)')
    
    # Format X-Axis Date
    ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    
    # Logic to maximize label density without overlapping too much
    # For the default 6-year range, MonthLocator(interval=1) gives ~72 labels which fits nicely with 90deg rotation
    # We use a safety check for very long ranges
    data_span_days = (agg_df.index.max() - agg_df.index.min()).days
    if data_span_days < 3000: # Approx 8 years
        ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
    else:
        # Fallback for extremely long periods
        ax1.xaxis.set_major_locator(mdates.YearLocator())
        
    # Rotate 90 degrees to fit dense labels
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=90, ha='center', fontsize=8)
    
    plt.grid(True, linestyle='--', alpha=0.3)
    plt.tight_layout()

    # Save
    safe_label = sanitize_filename(visa_label)
    safe_group = sanitize_filename(group_name)
    filename = f"combined_{safe_label}_{safe_group}_{INTERVAL_DAYS}d.png"
    filepath = os.path.join(OUTPUT_DIR, filename)
    
    plt.savefig(filepath)
    plt.close()
    print(f"  Saved plot: {filepath}")

def filter_data_by_visa(df, visa_group):
    if visa_group == 'ALL' or visa_group is None:
        return df, "All Visa Types"
    
    df['Visa Type Clean'] = df['Visa Type'].astype(str).str.strip()
    mask = df['Visa Type Clean'].isin(visa_group)
    filtered_df = df[mask].copy()
    label = ", ".join(visa_group)
    return filtered_df, label

def main():
    file_path = 'checkee_data.xlsx'
    
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)
        print(f"Created output directory: {OUTPUT_DIR}")
    
    print(f"Looking for file: {file_path}")
    df = load_and_clean_data(file_path)
    
    if df is None:
        fallback_path = 'checkee_data.xlsx - Sheet1.csv'
        if os.path.exists(fallback_path) and not os.path.exists(file_path):
             print(f"XLSX not found, trying fallback CSV: {fallback_path}")
             df = load_and_clean_data(fallback_path)

    if df is None:
        return

    print(f"Total valid records loaded: {len(df)}")
    print("-" * 30)

    for group in VISA_GROUPS_TO_ANALYZE:
        current_df, label = filter_data_by_visa(df, group)
        
        print(f"Processing Group: {label} (Records: {len(current_df)})")
        
        if len(current_df) == 0:
            print(f"  No data for {label}, skipping.")
            continue

        # 1. Combined Graph by Check Date
        plot_combined_graph(current_df, 'Check Date Obj', 'Check Date', label)

        # 2. Combined Graph by Complete Date
        plot_combined_graph(current_df, 'Complete Date Obj', 'Complete Date', label)
        
        print("-" * 30)

if __name__ == "__main__":
    main()```

16 个赞

感谢楼主!

老伯老师能带我婚绿吗 :mobaidalao:

前排围观,感谢:folded_hands:

最新dp有很多假check/social check

check低于3周的要排除

去年8月开始有一些很极端的100天check

xhs群现在有不少等了3个月还在等的

1 个赞

感谢分享,确实这种长 check 是我比较担心的,看起来近期 clear 的 case 里有不少这种情况把数值拉高了

但 pending case 分析不太好做,只能各种论坛/群组看 DP,很多人 clear 后并不会更新 checkee.info (上面图里 2024 年初还有不少显示 pending 的)

这个就见仁见智了,个人是倾向于把不送 DC 的馆内 check 也算在里面,毕竟都没有当场 approve,因此主楼只舍弃了小于 7 天的 case。把 threshold 拉高只看"真" check 也有道理

看起来有点noisy,有没有考虑把每周数据聚合以后画violin plot

感觉样本量太小了

老伯老师能带我混绿吗

最近也在研究能不能把checkee的数据给visualise。版主这个template很好

之前纠结要不要12/10回去签证,然后看到case差不多是60天,改到了02/07,现在看来越来越慢了,到了80天 :melting_face:

目前看是11月底-12月中明显加快了一些,12月底到1月中动静非常慢,可能02/07的appointment又要黄 :cry:

Checkee的数据有很多的bias,有的人是issue才写绿色需要一个个找,也有一些是假check。目前出现了一种很神奇的现象,不少人在check一个月后refused的状态会更新一次,然后就不动了直到3个月。感觉是其中的一个SAO backlog非常严重或者就是不干活

之前政府关闭停了一个多月,那段时间所有DC check都没办法clear

对,事实是如此。但对于要不要办visa这个问题来说,是这些backlog会一直存在还是会像今年夏天8-9月的时候暑假期间的产生的大量backlog被逐步清除 :yaoming:

目前來看

  1. 10-12月有大量回国签证被check的case,说明处于寒假高峰期。
  2. 12月中到一月中由于假期速度明显变慢,从本来因为47天shutdown导致的20天变60天的backlog变成了当下80天的backlog。
  3. 12月中H1B突然加了社媒check,可能会增加local staff的工作量,导致clearance received不等于ready to drop passport
  4. 1月份的申请量也许会下降,2月份农历新年内地处理速度会变慢但不影响DC check。

结论是如果没有做好2个月不在的准备,最好别出去签。如果真的一定要签,那等到3-4月会更好

祝好运呀~希望早日clear

不戳。如果数据点带error bar就更好了。

据说最近上海广州h1b面签availability骤降,未来三个月都很难找到slot. 不知道是不是领馆因为backlog刻意减少新case

上海是因为使馆在装修 available slot 少了很多

hmm ok 不过广州最近也只能约到5月的,不知为何

国内很多中介在那抢 :yaoming:

最近check都是两个月起步,啥时候回归30天左右了麻烦好心人at我一下我就去约面试 :yaoming:

中介从来都有,slot难找最近特别严重。而且仅限h签。b签似乎还比较正常

1 个赞