1. Introduction
As a globally circulated film crossing cultural boundaries, Brokeback Mountain challenges both stereotypical male portrayals in traditional Western cinema and dominant masculinity norms rooted in heteronormativity and emotional restraint. Within global media flows, the film is constantly re-encoded and reinterpreted, generating diverse gender discourse across cultural communities and highlighting the fluid and contested nature of gender identity [1]. While previous studies have examined its impact on homosexuality, affective aesthetics, and social acceptance, there is a lack of systematic analysis of how masculinity is negotiated and how frontier myths are deconstructed in transnational contexts [2]. Particularly, traditional gender studies often rely on small-sample or qualitative methods and have not effectively utilized AI technologies for large-scale multimodal analysis. This study, guided by gender theory and supported by natural language processing and multimodal sentiment models, builds a cross-cultural comment corpus to explore the mechanisms of masculinity negotiation and the cultural variation in discursive reconstructions of masculinity across linguistic and cultural boundaries.
2. Literature review
2.1. Masculinity and the frontier myth
Western films have long constructed an ideal male figure defined by heterosexuality, solitude, toughness, and restraint, a hegemonic masculinity shaped by history and media rather than cultural inevitability [3]. According to Connell's theory, dominant masculine norms suppress alternative male expressions, especially non-heterosexual or emotional identities in public spaces. As a symbol of American national narratives, the Western genre reinforces gender structures alongside territorial conquest. Although set in the West, Brokeback Mountain subverts this tradition by focusing on the protagonists’ emotional struggle and intimacy under social repression, thereby internally dissolving hegemonic masculinity [4]. This narrative strategy challenges the gender logic of frontier heroes and initiates a redefinition of masculinity across cultures in the process of global transmission.
2.2. Gender reconstruction in transnational discourse
In the decentralized transnational circulation of culture, cultural texts are open to shifts and re-encoding of meaning, and gender discourse is understood in diverse ways. As an example of transnational communication, Brokeback Mountain has been read in Western contexts as a challenge to patriarchal and heterosexual norms, and in East Asia in relation to family morality and moral responsibility, which illustrates the symbolic drift of gender identity [5]. Transnational communication theory emphasizes that meaning is unfixed and is produced through the interplay between audiences and their cultural environments. Viewers read gendered sequences through the values and emotional frames of their own environments. This negotiation manifests both the regulatory power of culture and the contest between regional norms and the universal discourse of gender equality [6]. Gender discourse analysis therefore needs to account for divergent interpretations and symbolic recasting across cultural environments in order to capture the nuanced dynamics of gender in transnational settings.
2.3. Computational approaches in gender studies
Conventional approaches to gender research, such as close reading and qualitative analysis, provide in-depth understanding of meaning and culture but do not scale to large, cross-modal data. The advent of AI has introduced new approaches to gender studies. Natural language processing, sentiment analysis, and visual recognition are increasingly used to derive emotional cues, semantic tags, and symbolic structures [7]. Language models such as Bidirectional Encoder Representations from Transformers (BERT) and Text Convolutional Neural Network (TextCNN) handle multilingual context and sentiment polarity, while image models such as Vision Transformer (ViT) capture gendered meaning in composition, proximity, and faces. These approaches improve efficiency and make it possible to combine qualitative and quantitative analysis, but they raise concerns about model bias and cultural misinterpretation [8]. Integrating AI with gender theory therefore requires both technical rigor and theoretical grounding, so that the resulting techno-cultural, multidisciplinary research paradigm retains humanistic sensitivity.
3. Methodology
3.1. Dataset construction and preprocessing
The data sources for the study are IMDb, Reddit, and Douban, and the timeframe runs from the release of Brokeback Mountain in 2005 to the end of 2024, so that the data offer both cross-cultural breadth and longitudinal depth, as shown in Table 1. The raw data undergo noise filtering and language standardization. For text, regular-expression cleaning and a tokenizer (spaCy for English, jieba for Chinese) handle word segmentation and stop-word filtering; for images, OpenCV is used for edge enhancement and rescaling to a uniform size, which provides a uniform input format for subsequent modeling [9].
| Platform | Data Type | Text Samples | Image Samples | Language | Crawling Method |
|---|---|---|---|---|---|
| IMDb | Film reviews | 22,306 | 560 | English | API and scraping |
| Reddit | Threads and posts | 18,412 | 370 | English | Pushshift API |
| Douban | Reviews and posts | 20,089 | 640 | Chinese | requests and XPath |
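The text-cleaning step described in Section 3.1 can be illustrated with a minimal sketch. It is not the study's pipeline: it uses only the Python standard library as a stand-in for spaCy and jieba, handles English only, and the stop-word list, the regular expressions, and the function name `clean_comment` are assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would rely on
# the stop-word resources shipped with spaCy (English) or used with jieba (Chinese).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

URL_RE = re.compile(r"https?://\S+")   # noise: links
TAG_RE = re.compile(r"<[^>]+>")        # noise: leftover HTML tags
TOKEN_RE = re.compile(r"[a-z']+")      # crude English word tokenizer


def clean_comment(text):
    """Strip URLs and HTML tags, lowercase, tokenize, and drop stop words."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


tokens = clean_comment(
    "The <b>repression</b> in Ennis is heartbreaking: https://example.com/x"
)
print(tokens)
```

A regular-expression tokenizer of this kind does not work for Chinese, which has no whitespace word boundaries; that is the reason a dedicated segmenter such as jieba is needed for the Douban data.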
3.2. Multimodal sentiment recognition model
After data normalization is complete, a multimodal sentiment recognition model is constructed to address the complex emotional and gendered expressions embedded in user comments on Brokeback Mountain across different platforms [10].
The text channel combines BERT with TextCNN to extract sentiment features from the comments. Each comment is represented as an input sequence T = {t1, t2, ..., tn}; BERT outputs the contextual embedding vectors E = {e1, e2, ..., en}, which are then passed to TextCNN for local emotion feature extraction; see Equation (1).
This allows the model to detect the emotional tone linked to masculinity repression, identity anxiety, or heteronormative resistance embedded in the text.
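The local feature extraction performed by a TextCNN layer can be sketched as a convolution over consecutive token embeddings followed by ReLU and max-over-time pooling. The sketch below is a plain-Python illustration under stated assumptions, not the study's implementation: the toy embeddings stand in for BERT output, and the filter weights and the function name `conv1d_max_pool` are hypothetical.

```python
def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over the token embeddings, apply ReLU, then take the
    maximum activation over all positions (max-over-time pooling).

    embeddings: list of n token vectors of length d (stand-in for BERT output E).
    kernel: list of k weight vectors of length d, i.e. a filter spanning k tokens.
    Assumes n >= k.
    """
    k = len(kernel)
    activations = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        score = bias + sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)


# Toy 2-dimensional embeddings for four tokens (hypothetical values).
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Two filters of width 2; each one contributes one component of the text vector h.
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
h = [conv1d_max_pool(E, f) for f in filters]
print(h)
```

Because pooling keeps only the strongest response of each filter, the resulting vector h has one entry per filter regardless of comment length.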
The image channel uses a Vision Transformer (ViT) to identify gendered visual elements in user-uploaded stills or accompanying images, such as body distance, expression masking, and character composition tension. Each accompanying image is divided into visual patches, and ViT's multi-head attention mechanism models the contextual structure of the image to output the gendered image semantic vector; see Equation (2).
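The patch-splitting step that precedes the attention layers can be sketched as follows. This is an illustration only: the image is a small grid of numbers rather than real pixel data, the patch size is arbitrary, and the function name `split_into_patches` is hypothetical.

```python
def split_into_patches(image, patch):
    """Split an H x W grid (a list of rows) into non-overlapping patch x patch
    blocks, each flattened row by row, returned in raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            patches.append(block)
    return patches


# A 4 x 4 toy "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
print(patches)
```

In a ViT, each flattened patch is then linearly projected and combined with a position embedding before entering the attention layers; those steps are omitted here.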
The text vector h and the image vector Z are fused into the final representation f = [h∥Z], which is fed into a multilayer perceptron (MLP) for sentiment polarity classification (positive, neutral, negative) and intensity prediction.
The model is optimized with a joint loss function, which significantly improves the recognition of ambiguous emotion categories such as “repressed but rational” and “intimate but conflicting”, and provides a culturally sensitive emotional basis for the subsequent gender-semantic modeling.
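The fusion f = [h∥Z] followed by an MLP classifier can be sketched in a few lines of plain Python. The sketch covers polarity classification only and leaves out intensity prediction and the joint loss; the layer sizes, all weight values, and the function names are assumptions chosen to keep the example small, not the trained model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify(h, z, w1, b1, w2, b2):
    """Concatenate text and image vectors, then apply a one-hidden-layer MLP."""
    f = h + z                                           # f = [h || Z]
    hidden = [max(0.0, v) for v in linear(w1, b1, f)]   # ReLU
    return softmax(linear(w2, b2, hidden))


LABELS = ["positive", "neutral", "negative"]

h = [2.0, 0.0]   # toy text vector
z = [0.5]        # toy image vector
w1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]     # 2 hidden units, 3 fused inputs
b1 = [0.0, 0.0]
w2 = [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]   # 3 classes, 2 hidden units
b2 = [0.0, 0.0, 0.0]

probs = classify(h, z, w1, b1, w2, b2)
prediction = LABELS[probs.index(max(probs))]
print(prediction, probs)
```

Concatenation keeps the two modalities in separate coordinates of f, so the first dense layer is free to learn how much weight each modality receives.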
3.3. Gender-semantic labeling and discourse classification
After extracting affective features, a gender semantic labeling system was developed to identify key discursive elements related to masculinity negotiation in the review texts. In this study, gender is conceptualized as a continuous process of construction through emotional projection and symbolic expression within specific cultural contexts [11]. Based on theoretical assumptions and corpus analysis, 12 gender-related label categories were defined, including “emotional repression,” “social and moral judgment,” “same-sex identity ambiguity,” “family responsibility dilemma,” and “gender identity.” A multi-label classification method was employed for automated recognition.
To model the multi-faceted nature of gender discourse, the task is formulated as a multilabel classification problem using the fused vector f from the previous module. The probability output for label assignment is optimized through a sigmoid-based multilabel loss; see Equation (3).
Here, each y_i indicates whether a given comment reflects a particular gender discourse theme. This allows the model to recognize that a single comment may express conflicting attitudes, such as empathy with repressed masculinity while also upholding traditional family expectations.
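A sigmoid-based multilabel loss is commonly implemented as binary cross-entropy averaged over the labels, with each label scored independently. The sketch below assumes that standard form; it is not confirmed to be the exact loss used in the study, and the logits, targets, and the 0.5 decision threshold are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent labels.

    logits: raw score per label; targets: 1 if the theme is present, else 0.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for score, y in zip(logits, targets):
        p = sigmoid(score)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


# Three of the label categories, with hypothetical scores for one comment.
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 1]

loss = multilabel_bce(logits, targets)
predicted = [int(sigmoid(s) >= 0.5) for s in logits]
print(round(loss, 4), predicted)
```

Because each label has its own sigmoid rather than sharing a softmax, several labels can be active for the same comment, which is what lets a single comment carry both an empathetic and a traditionalist theme.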
To uncover latent thematic structures beyond the predefined categories, Latent Dirichlet Allocation (LDA) is introduced to analyze the distributional patterns of semantic tokens within the comment corpora. Given the document set D = {d1, d2, ..., dn}, the topic-word and document-topic distributions are estimated by maximizing the joint likelihood; see Equation (4).
This model helps reveal emergent discourse clusters such as the West-as-freedom metaphor, shame under cultural surveillance, or romantic resistance to heteronormativity. These unsupervised themes are then cross-referenced with the manually curated label system to enhance interpretive precision.
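For reference, the joint distribution of the standard LDA generative model is given below. The notation is the conventional one and is an assumption here rather than the study's own: K topics, Dirichlet priors α and β, document-topic distributions θ_d, topic-word distributions φ_k, N_d tokens in document d, and topic assignment z_{d,i} for the word w_{d,i}.

```latex
p(\mathbf{w}, \mathbf{z}, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d \in D} \Bigl[ p(\theta_d \mid \alpha)
    \prod_{i=1}^{N_d} p(z_{d,i} \mid \theta_d)\, p(w_{d,i} \mid \phi_{z_{d,i}}) \Bigr]
```

In practice θ and φ are estimated with approximate inference such as variational Bayes or collapsed Gibbs sampling, since the exact posterior is intractable.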
4. Results
4.1. Sentiment polarity and gender theme distribution
Through the multimodal sentiment recognition model applied to user comments across the three platforms, significant cross-cultural patterns of emotional distribution are identified. As shown in Figure 1, comments in English-language contexts display a marked tendency toward negative sentiment. On IMDb and Reddit, negative sentiment accounts for 52.8%, while positive sentiment comprises only 31.4% and neutral sentiment 15.8%. This phenomenon is closely related to Western users’ deep emotional resonance with the male characters’ identity dilemmas in Brokeback Mountain. Expressions such as "repression", "anguish", and "societal rejection" frequently appear in comments, reflecting a strong critical awareness of the constraints imposed by traditional masculinity. In contrast, the Chinese-language platform Douban exhibits a distinctly different pattern. The proportion of positive sentiment rises significantly to 45.3%, with neutral sentiment at 35.7% and negative sentiment reduced to 19.0%. This divergence stems from Chinese users’ inclination to interpret character behavior through an ethical-moral lens. Terms like "family responsibility", "moral conflict", and "social acceptance" become central themes in the commentary. Semantic network clustering analysis further validates this cultural divergence. In English-language contexts, "masculinity-shame-freedom" forms the dominant discourse cluster, indicating a focus on individual liberation, while in Chinese-language contexts, "family-duty-acceptance" constitutes a frequent co-occurrence pattern, reflecting the influence of collectivist culture in regulating gender expression.
This cross-cultural emotional polarization reveals the mechanism of localized reconstruction in the global circulation of gender discourse. It indicates that the pathways to understanding masculinity are culturally specific, with Western contexts emphasizing individual emotional emancipation and Eastern contexts stressing social ethical balance.

4.2. Multimodal cross-validation and model accuracy
Using five-fold cross-validation, systematic performance evaluations are conducted on four different model configurations. As shown in Table 2, the late fusion model achieves the best performance across all evaluation metrics, with an accuracy of 89.4%, an F1 score of 0.87, and a standard deviation of only ±1.2, demonstrating excellent stability and generalization capability. In contrast, unimodal models exhibit clear limitations. Although the TextCNN text-only model performs well in recognizing emotional vocabulary with an accuracy of 82.1%, it tends to misclassify when dealing with metaphorical expressions and cultural references. The image-only ViT model achieves an accuracy of only 75.6%, mainly due to its difficulty in interpreting the emotional orientation of visual symbols when lacking contextual information.
| Model Configuration | Accuracy (%) | Precision | Recall | F1 Score | Cross-Validation Std Dev | Training Time (minutes) |
|---|---|---|---|---|---|---|
| TextCNN (Text-only) | 82.1 | 0.79 | 0.81 | 0.80 | ±2.3 | 24.5 |
| ViT (Image-only) | 75.6 | 0.72 | 0.76 | 0.74 | ±3.1 | 31.2 |
| Early Fusion Model | 85.3 | 0.83 | 0.84 | 0.83 | ±1.8 | 42.7 |
| Late Fusion Model | 89.4 | 0.88 | 0.86 | 0.87 | ±1.2 | 38.9 |
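As a consistency check on Table 2, the F1 scores can be recomputed from the reported precision and recall using the harmonic mean, F1 = 2PR / (P + R). The sketch assumes the reported F1 was derived from the tabulated precision and recall; if the study averaged per-class or per-fold F1 values instead, small deviations would be expected.

```python
# (precision, recall, reported F1) copied from Table 2.
reported = {
    "TextCNN (Text-only)": (0.79, 0.81, 0.80),
    "ViT (Image-only)": (0.72, 0.76, 0.74),
    "Early Fusion Model": (0.83, 0.84, 0.83),
    "Late Fusion Model": (0.88, 0.86, 0.87),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()
}
consistent = all(
    recomputed[name] == f1 for name, (_, _, f1) in reported.items()
)
print(recomputed, consistent)
```

Under that assumption, all four recomputed values match the table to two decimal places.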
Further analysis reveals that the late fusion strategy handles semantic inconsistencies between modalities more effectively and is particularly strong at identifying complex semantic categories such as "implicit emotional expression" and "conflicted gender cognition". Performance decomposition at the label level shows that directly gendered labels such as "homosexual identity", "emotional repression", and "patriarchal conflict" are recognized with relatively high accuracy, whereas abstract labels like "moral dilemma" and "substitutive roles" yield lower classification results. This is primarily attributed to their blurred semantic boundaries and cultural specificity. Overall, the multimodal fusion approach significantly improves the precision and robustness of automatic gender discourse identification.
5. Discussion
This research examines how masculinity is culturally negotiated and how frontier myths are deconstructed through user-generated affective expression and gendered semiosis across transnational media environments. In Western contexts, individualistic narratives enable a deconstruction of the traditional emotional repression attached to male identity, and users are more likely to express empathy with, and to question, the characters' identity dilemmas. In Eastern contexts, by contrast, gender expression tends to be situated within expectations of familial obligation and moral duty, and users judge characters from normative ethical perspectives. These cross-cultural differences suggest that gender discourse is an interwoven product of language, imagery, affect, and culturally held assumptions rather than the outcome of interpreting a single text. The research also supports the use of AI-driven approaches to capture semantic indeterminacy and affective complexity in gender narratives, clarifies the methodological strengths and weaknesses of such approaches in cultural discourse analysis, and contributes theoretically to future multimodal and intercultural gender studies.
6. Conclusion
This research takes Brokeback Mountain as a case to build a cross-cultural multimodal corpus and combines natural language processing and image recognition methods to examine systematically how gender discourse is reconstructed in the global circulation of media. The results indicate that masculinity is highly culturally adaptable: its semantic form is continually reshaped by emotional expression and local cultural influences, which challenges the view of gender as an invariant cultural signifier. In addition, the emotion-semantic recognition model introduced in this work extends the capacity to detect metaphorical, ambivalent, and affectively nuanced gender discourse, illustrating the practicality of artificial intelligence in gender studies. In future work, the framework may be extended to further multilingual and multicultural settings to investigate heterogeneous patterns of gender expression and to foster gender equality through technological and theoretical innovation alike.